Assignment 2: All about data, urban legends and statistics
For all problems below, always show your work. I am less interested in whether or not you obtain the "right answer" than in the process you used to try to solve the problem.
- The current (2015) population of Eugene is approximately 163,500 people. All of your friends insist that Eugene's population is growing out of control and that there will be 300,000 people by the year 2035.
Okay, get the data by searching for "population of eugene oregon".
a) If the population growth rate remains at 4% per year, approximately when will the city of Eugene have 500,000 people?
b) If the Eugene school district is mandated to have 1 elementary school per 10,000 people, at the 4% growth rate, how many schools will have to be built between 2015 and 2030?
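One way to set up the growth reasoning is sketched below. It assumes growth compounds once per year at a fixed rate, so that P(t) = P0 * (1 + r)^t. The function names are made up for illustration, and how you round partial schools is your own judgment call.

```python
import math

def years_to_reach(p0, target, rate):
    """Years for p0 to grow to target at a fixed annual rate (annual compounding assumed)."""
    return math.log(target / p0) / math.log(1 + rate)

def projected_population(p0, rate, years):
    """Population after the given number of years at a fixed annual rate."""
    return p0 * (1 + rate) ** years

p0 = 163_500   # 2015 population from the problem statement
rate = 0.04    # 4% per year, as stated in the problem

t = years_to_reach(p0, 500_000, rate)
print(f"500,000 is reached about {t:.1f} years after 2015")

p2030 = projected_population(p0, rate, 2030 - 2015)
# One school per 10,000 people; only the added population needs new schools.
extra_schools = (p2030 - p0) / 10_000
print(f"2030 population ~ {p2030:,.0f}; additional schools ~ {extra_schools:.1f}")
```

If the growth rate you find in the real data differs from 4%, change `rate` and see how sensitive the answers are.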
- The following is a list of total points in 15 randomly selected NFL football games:
23
31
17
25
12
18
48
35
27
33
41
63
24
22
48
Calculate the average and standard deviation of these scores any way you like. This tool is very easy to use: you can just cut and paste the data into that page. Here is another tool, but it wants data separated by commas.
Your urban legend friend (someone who believes anecdotal evidence much more than quantitative reasoning) insists that scoring more than 45 points in an NFL game is highly improbable (meaning it happens less than 1% of the time).
a) Based on this data, what would you tell your friend?
b) Based on this data, what should be the level of the 1% event?
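If you would rather compute than use an online tool, here is a minimal sketch using Python's standard library. It assumes the scores are roughly normally distributed (an assumption you should question with only 15 games) and uses the sample standard deviation.

```python
from statistics import NormalDist, mean, stdev

scores = [23, 31, 17, 25, 12, 18, 48, 35, 27, 33, 41, 63, 24, 22, 48]

avg = mean(scores)
sd = stdev(scores)  # sample standard deviation (n - 1 in the denominator)

# Assumption: total points follow a normal distribution with this mean and sd.
model = NormalDist(mu=avg, sigma=sd)

p_over_45 = 1 - model.cdf(45)            # chance of a total above 45 points
one_percent_level = model.inv_cdf(0.99)  # total exceeded only 1% of the time

print(f"mean = {avg:.2f}, standard deviation = {sd:.2f}")
print(f"P(total > 45) = {p_over_45:.3f}")
print(f"1% level = {one_percent_level:.1f} points")
```

You can also check the claim directly against the data by counting how many of the 15 games exceed 45 points.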
- The Columbia River near Pasco has had about 100 years' worth of flood measurements made. Over that period the average flood level is about 9 feet above flood stage. The standard deviation about that average is 2.5 feet. From that, estimate the flood level above flood stage for a 100-year flood and a 1000-year flood.
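A hedged sketch of one way to reason about this: read a "T-year flood" as the level exceeded with probability 1/T in any given year, and assume flood levels are normally distributed about the average. Both are modeling assumptions, and the rule of thumb given in lecture may use slightly different numbers. The function name `flood_level` is made up for illustration.

```python
from statistics import NormalDist

def flood_level(mean_ft, sd_ft, return_period_years):
    """Level exceeded with probability 1/return_period in a year (normal model assumed)."""
    exceedance = 1 / return_period_years
    return NormalDist(mean_ft, sd_ft).inv_cdf(1 - exceedance)

for period in (100, 1000):
    level = flood_level(9.0, 2.5, period)
    print(f"{period}-year flood: about {level:.1f} feet above flood stage")
```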
Important note: 10 cents = $0.10 - don't mix up your units for questions 4, 5 and 6.
A survey of 25 gas stations in the Eugene area showed an average price for unleaded gas of $2.75 per gallon with a standard deviation (dispersion) of 10 cents.
A survey of 25 stations in the Portland area had an average of $2.90 per gallon with a standard deviation of 10 cents.
A survey of 50 stations in the Seattle area had an average of $2.85 per gallon with a standard deviation of 21 cents.
- What is the probability that in Eugene, Portland and Seattle you will find a gas station that charges $3.00 per gallon?
- Which location has the highest probability of paying $3.25 at a gas station?
- Use the Z-test as described in the lecture material to determine if the difference in mean gas prices between Eugene and Portland is statistically significant.
- Suggest a reason that the standard deviation is so much higher in Seattle.
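For the Z-test question, the sketch below uses the common two-sample form Z = (mean2 - mean1) / sqrt(sd1^2/n1 + sd2^2/n2). This is an assumption on my part about which form you want; check it against the lecture material before relying on it. All prices are in dollars, so 10 cents is entered as 0.10.

```python
import math
from statistics import NormalDist

def z_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Z statistic from summary statistics (standard-error form assumed)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean2 - mean1) / standard_error

# Eugene vs. Portland, all values in dollars
z = z_statistic(2.75, 0.10, 25, 2.90, 0.10, 25)

# Two-sided probability of a difference this large arising by chance
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Z = {z:.2f}, two-sided p = {p_two_sided:.2g}")
```

A value of |Z| well above about 2 to 3 is usually read as a significant difference.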
- The following data represent the average GPA of UO students as a function of year.
1972 2.43
1973 2.44
1974 2.45
1975 2.44
1976 2.47
1977 2.48
1978 2.5
1979 2.52
1980 2.56
1981 2.58
1982 2.58
1983 2.58
1984 2.57
1985 2.6
1986 2.6
1987 2.59
1988 2.58
1989 2.6
1990 2.64
1991 2.66
1992 2.74
1993 2.76
1994 2.79
1995 2.78
1996 2.8
1997 2.82
1998 2.84
1999 2.79
2000 2.82
2001 2.85
2002 2.87
2003 2.89
2004 2.91
2005 2.9
2006 2.93
2007 2.94
2008 2.96
2009 2.97
2010 3.01
2011 3.03
2012 3.07
2013 3.13
2014 3.09
2015 3.21
2016 3.17
- Use the Z-test as described in the lecture material: divide the data into two halves and calculate the Z-statistic.
- The standard deviation around the 1972 GPA was 0.7 and the standard deviation around the 2016 GPA was 0.6. Determine the percentage of students whose GPA was above 3.5 in 1972 compared to 2016.
- Using these results (and not your personal opinion), argue whether or not grade inflation is significant at the UO. (Note: this is a real-life question, as I was recently on a committee that decided GPA inflation was not "real" or a problem at the UO. I gave them the data argument but, as noted previously, quantitative reasoning skills are now mostly so bad that data arguments are not deemed relevant.)
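When you only have the raw yearly values, the Z-statistic for two halves can be computed as sketched below. With 45 years the split cannot be even, so this sketch arbitrarily puts the first 22 years in one half and the last 23 in the other. That choice, and the standard-error form of Z, are assumptions to check against the lecture material.

```python
import math
from statistics import mean, stdev

# Average UO GPA for 1972 through 2016, in order, from the table above
gpa = [2.43, 2.44, 2.45, 2.44, 2.47, 2.48, 2.50, 2.52, 2.56, 2.58,
       2.58, 2.58, 2.57, 2.60, 2.60, 2.59, 2.58, 2.60, 2.64, 2.66,
       2.74, 2.76, 2.79, 2.78, 2.80, 2.82, 2.84, 2.79, 2.82, 2.85,
       2.87, 2.89, 2.91, 2.90, 2.93, 2.94, 2.96, 2.97, 3.01, 3.03,
       3.07, 3.13, 3.09, 3.21, 3.17]

split = len(gpa) // 2              # 22: first half is 1972-1993
first, second = gpa[:split], gpa[split:]

# Standard error of the difference between the two half-sample means
standard_error = math.sqrt(stdev(first)**2 / len(first)
                           + stdev(second)**2 / len(second))
z = (mean(second) - mean(first)) / standard_error

print(f"first half mean = {mean(first):.3f}, second half mean = {mean(second):.3f}")
print(f"Z = {z:.1f}")
```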
Another data graphing exercise:
NASA has recently released the composite land+ocean surface temperature data set that is calibrated all the way back to 1880.
- You can cut and paste from the spreadsheet into the plotting tool, like you did in the first homework assignment.
By so doing, put the year in column A and the temperature anomaly in column B, produce the plot, and insert the plot here.
Note: the temperature anomaly refers to a given year's temperature with respect to a temperature baseline that is chosen to be 1960 to 1990 for this data set. If one changes the baseline, the anomaly data will change slightly, but the overall form of the curve will remain unaffected.
- Comment on the overall form of this graph and any features that might be present (note this means you have to look at the graph and think about it).
- Is the trend significant? Well, let's use the Z-test in the following way:
a) Determine if the average value of the temperature anomaly during the period 1990-2016 is significantly different from the value determined for the time period 1900-1989.
b) Based on this determination, compose a Tweet to you know who about whether or not Global Warming is "Real".
- Ignoring the last 3 years, indicate on the graph some type of extrapolation (you can type the numbers into column C for some years and they will appear as orange). For instance, adding the value of 1.0 for 2050 would produce a point that looks like this:
This is a way of representing an extrapolation. So just enter numbers for the appropriate years in column C to produce a set of orange dots that can be your extrapolation from the current data.
- Now include the last three years of data into your extrapolation and show how much larger the 2050 temperature anomaly will be.
- The Paris Accord of November 2015, when translated into units of temperature anomaly, means that the world needs to avoid an anomaly value of 1.75. Based on your extrapolations, argue whether or not this goal is likely to be met.
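If you want numbers to type into column C rather than eyeballing them, one option is a straight-line least-squares fit, sketched below. A straight line is only one possible choice of extrapolation, and the values in `years` and `anomaly` are made-up placeholders, NOT the NASA data. Substitute the years and anomalies you actually want to fit.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Placeholder values for illustration only; replace with the real data set.
years = [1980, 1990, 2000, 2010]
anomaly = [0.2, 0.4, 0.6, 0.8]

slope, intercept = linear_fit(years, anomaly)
projected_2050 = slope * 2050 + intercept

print(f"slope = {slope:.3f} per year, projected 2050 anomaly = {projected_2050:.2f}")
```

Running the fit once without the last three years and once with them gives the two 2050 values to compare.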