Read and format project data
# Include and execute your code here
import pandas as pd
df = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")
Course DS 250
Brayden McAllister
Paste your elevator pitch here: a short (4-5 sentence) paragraph that describes key insights taken from metrics in the project results. Think top or most important results.
Highlight the Questions and Tasks
Fix all of the varied missing data types in the data to be consistent (all missing values should be displayed as “NaN”). In your report include one record example (one row) from your new data, in the raw JSON format. Your example should display the “NaN” for at least one missing value.
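One way to approach this task is sketched below on toy records. The column names and the missing-value markers (`"n/a"`, an empty string, and `-999`) are assumptions made for illustration; check which markers the real data actually uses before relying on this list.

```python
import json

import numpy as np
import pandas as pd

# Toy records standing in for the project data. Column names and the
# sentinel values ("n/a", "", -999) are assumptions, not read from the dataset.
raw = pd.DataFrame(
    {
        "airport_code": ["ATL", "SLC", ""],
        "month": ["January", "n/a", "March"],
        "num_of_delays_late_aircraft": [-999, 270, 95],
    }
)

# Map every assumed missing-value marker to NaN in one pass.
clean = raw.replace(["n/a", "", -999], np.nan)

# Take one row as a plain dict. json.dumps writes a float NaN as the bare
# token NaN, whereas DataFrame.to_json would write null instead.
record = clean.to_dict(orient="records")[0]
print(json.dumps(record, indent=2))
```

Using `json.dumps` rather than `DataFrame.to_json` is a deliberate choice here, because the task asks for the literal text "NaN" to appear in the record.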
type your results and analysis here
include figures in chunks and discuss your findings in the figure.
::: {#cell-Q1 chart .cell execution_count=4}
My useless chart
:::
::: {#cell-Q1 table .cell .tbl-cap-location-top tbl-cap=‘Not much of a table’ execution_count=5}
| | year | AK | AR |
---|---|---|---|
96 | 2006 | 21.0 | 183.0 |
97 | 2007 | 28.0 | 153.0 |
98 | 2008 | 36.0 | 212.0 |
99 | 2009 | 34.0 | 179.0 |
100 | 2010 | 22.0 | 196.0 |
101 | 2011 | 41.0 | 148.0 |
102 | 2012 | 28.0 | 140.0 |
103 | 2013 | 26.0 | 134.0 |
104 | 2014 | 20.0 | 114.0 |
105 | 2015 | 28.0 | 121.0 |
:::
Which airport has the worst delays? Discuss the metric you chose, and why you chose it to determine the “worst” airport. Your answer should include a summary table that lists (for each airport) the total number of flights, total number of delayed flights, proportion of delayed flights, and average delay time in hours.
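The summary table described above could be built roughly as follows. This is a minimal sketch on made-up numbers: the column names are assumed, and "average delay time" is interpreted here as delay minutes per delayed flight, converted to hours, which is only one reasonable reading of the prompt.

```python
import pandas as pd

# Toy monthly records; column names and values are assumed for illustration.
flights = pd.DataFrame(
    {
        "airport_code": ["ATL", "ATL", "SLC", "SLC"],
        "num_of_flights_total": [1000, 3000, 500, 1500],
        "num_of_delays_total": [200, 600, 50, 150],
        "minutes_delayed_total": [12000, 36000, 3000, 3000],
    }
)

summary = (
    flights.groupby("airport_code")
    .agg(
        total_flights=("num_of_flights_total", "sum"),
        total_delays=("num_of_delays_total", "sum"),
        total_delay_minutes=("minutes_delayed_total", "sum"),
    )
    .assign(
        # Share of all flights that were delayed.
        prop_delayed=lambda d: d["total_delays"] / d["total_flights"],
        # Average length of a delay, per delayed flight, in hours.
        avg_delay_hours=lambda d: d["total_delay_minutes"] / d["total_delays"] / 60,
    )
    .sort_values("prop_delayed", ascending=False)
)
print(summary)
```

Sorting by `prop_delayed` treats the proportion of delayed flights as the "worst airport" metric; sorting by `avg_delay_hours` instead would answer a different question (how long delays last once they happen).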
type your results and analysis here
include figures in chunks and discuss your findings in the figure.
::: {#cell-Q2 chart .cell execution_count=7}
My useless chart
:::
::: {#cell-Q2 table .cell .tbl-cap-location-top tbl-cap=‘Not much of a table’ execution_count=8}
| | year | AK | AR |
---|---|---|---|
96 | 2006 | 21.0 | 183.0 |
97 | 2007 | 28.0 | 153.0 |
98 | 2008 | 36.0 | 212.0 |
99 | 2009 | 34.0 | 179.0 |
100 | 2010 | 22.0 | 196.0 |
101 | 2011 | 41.0 | 148.0 |
102 | 2012 | 28.0 | 140.0 |
103 | 2013 | 26.0 | 134.0 |
104 | 2014 | 20.0 | 114.0 |
105 | 2015 | 28.0 | 121.0 |
:::
What is the best month to fly if you want to avoid delays of any length? Discuss the metric you chose and why you chose it to calculate your answer. Include one chart to help support your answer, with the x-axis ordered by month. (To answer this question, you will need to remove any rows that are missing the Month variable.)
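A possible starting point for this question is sketched below, assuming month names are stored as full English strings and using the proportion of delayed flights as the metric. The column names and values are illustrative only.

```python
import pandas as pd

MONTH_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Toy records, deliberately out of calendar order and with one missing month.
flights = pd.DataFrame(
    {
        "month": ["February", "January", None, "January", "February"],
        "num_of_flights_total": [800, 1000, 900, 1000, 1200],
        "num_of_delays_total": [120, 250, 300, 150, 180],
    }
)

# Remove rows that are missing the month, as the task requires.
known = flights.dropna(subset=["month"]).copy()

# An ordered categorical makes groupby (and most plotting libraries)
# sort by calendar order instead of alphabetically.
known["month"] = pd.Categorical(known["month"], categories=MONTH_ORDER, ordered=True)

by_month = known.groupby("month", observed=True)[
    ["num_of_flights_total", "num_of_delays_total"]
].sum()
by_month["prop_delayed"] = (
    by_month["num_of_delays_total"] / by_month["num_of_flights_total"]
)

best_month = by_month["prop_delayed"].idxmin()
print(by_month)
print("Lowest share of delayed flights:", best_month)
```

The `by_month` index is already in calendar order, so it can be passed directly as the x-axis of the chart.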
type your results and analysis here
include figures in chunks and discuss your findings in the figure.
::: {#cell-Q3 chart .cell execution_count=10}
My useless chart
:::
::: {#cell-Q3 table .cell .tbl-cap-location-top tbl-cap=‘Not much of a table’ execution_count=11}
| | year | AK | AR |
---|---|---|---|
96 | 2006 | 21.0 | 183.0 |
97 | 2007 | 28.0 | 153.0 |
98 | 2008 | 36.0 | 212.0 |
99 | 2009 | 34.0 | 179.0 |
100 | 2010 | 22.0 | 196.0 |
101 | 2011 | 41.0 | 148.0 |
102 | 2012 | 28.0 | 140.0 |
103 | 2013 | 26.0 | 134.0 |
104 | 2014 | 20.0 | 114.0 |
105 | 2015 | 28.0 | 121.0 |
:::
According to the BTS website, the “Weather” category only accounts for severe weather delays. Mild weather delays are not counted in the “Weather” category, but are actually included in both the “NAS” and “Late-Arriving Aircraft” categories. Your job is to create a new column that calculates the total number of flights delayed by weather (both severe and mild). You will need to replace all the missing values in the Late Aircraft variable with the mean. Show your work by printing the first 5 rows of data in a table. Use these three rules for your calculations:
type your results and analysis here
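The mechanics of the weather calculation can be sketched as follows. The three rules are not reproduced in this section, so the shares passed in below (`late_share` and `nas_share`) are placeholders chosen only to make the arithmetic easy to check; substitute the values from the assignment's rules. The column names and the helper `add_weather_total` are likewise hypothetical.

```python
import numpy as np
import pandas as pd


def add_weather_total(df, late_share, nas_share):
    """Return a copy of df with a combined weather-delay column.

    late_share and nas_share are the fractions of late-aircraft and NAS
    delays attributed to mild weather. They are parameters because the
    actual rule values are not stated here.
    """
    out = df.copy()
    late = out["num_of_delays_late_aircraft"]
    # Replace missing Late Aircraft values with the column mean.
    out["num_of_delays_late_aircraft"] = late.fillna(late.mean())
    out["weather_total"] = (
        out["num_of_delays_weather"]
        + late_share * out["num_of_delays_late_aircraft"]
        + nas_share * out["num_of_delays_nas"]
    )
    return out


# Toy data with one missing Late Aircraft value (assumed column names).
flights = pd.DataFrame(
    {
        "num_of_delays_weather": [10, 20, 30],
        "num_of_delays_late_aircraft": [100, np.nan, 200],
        "num_of_delays_nas": [40, 60, 80],
    }
)

# Placeholder shares, not the assignment's values.
result = add_weather_total(flights, late_share=0.5, nas_share=0.5)
print(result.head(5))
```

If a rule depends on the month, `nas_share` can be passed as a Series aligned with the rows instead of a single number, since the multiplication is element-wise either way.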
Using the new weather variable calculated above, create a barplot showing the proportion of all flights that are delayed by weather at each airport. Discuss what you learn from this graph.
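A rough sketch of the per-airport proportion and bar plot follows. It assumes a column named `weather_total` holding the combined weather delays, plus assumed airport and flight-count columns. Matplotlib is used only as an example plotting library and is imported inside the plotting function, so the proportion can be computed without it.

```python
import pandas as pd


def weather_share_by_airport(df):
    """Proportion of all flights delayed by weather, per airport."""
    totals = df.groupby("airport_code")[
        ["weather_total", "num_of_flights_total"]
    ].sum()
    share = totals["weather_total"] / totals["num_of_flights_total"]
    return share.sort_values(ascending=False)


def plot_weather_share(share):
    """Draw the proportions as a bar chart (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(share.index, share.values)
    ax.set_xlabel("Airport")
    ax.set_ylabel("Proportion of flights delayed by weather")
    return fig


# Toy data; column names and values are assumptions for illustration.
flights = pd.DataFrame(
    {
        "airport_code": ["ATL", "ATL", "SLC", "SLC"],
        "weather_total": [80.0, 120.0, 20.0, 30.0],
        "num_of_flights_total": [1000, 1000, 500, 500],
    }
)

share = weather_share_by_airport(flights)
print(share)
```

Summing delays and flights per airport before dividing weights each month by its traffic; averaging monthly proportions instead would give quiet months the same weight as busy ones.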
type your results and analysis here