9 Sleeping in college
A recent article in a college newspaper stated that college students get an average of 5.5 hours of sleep each night. A student who was skeptical about this value decided to conduct a survey by randomly sampling 25 students. On average, the sampled students slept 6.25 hours per night. Identify which value represents the sample mean and which value represents the claimed population mean.
Answer
Population mean = 5.5. Sample mean = 6.25.
10 Parameters and statistics
Identify which value represents the sample mean and which value represents the claimed population mean.
American households spent an average of about $52 in 2007 on Halloween merchandise such as costumes, decorations and candy. To see if this number had changed, researchers conducted a new survey in 2008 before industry numbers were reported. The survey included 1,500 households and found that average Halloween spending was $58 per household.
The average GPA of students in 2001 at a private university was 3.37. A survey on a sample of 203 students from this university yielded an average GPA of 3.59 in Spring semester of 2012.
11 Make-up exam
In a class of 25 students, 24 of them took an exam in class and 1 student took a make-up exam the following day. The professor graded the first batch of 24 exams and found an average score of 74 points with a standard deviation of 8.9 points. The student who took the make-up the following day scored 64 points on the exam.
- Does the new student's score increase or decrease the average score? Answer
Decrease: the new score is smaller than the mean of the 24 previous scores.
- What is the new average? Answer
Calculate a weighted mean. Use a weight of 24 for the old mean and 1 for the new mean: \((24\times 74 + 1\times64)/(24+1) = 73.6\text{.}\) There are other ways to solve this exercise that do not use a weighted mean.
- Does the new student's score increase or decrease the standard deviation of the scores? Answer
The new score is more than 1 standard deviation away from the previous mean, so increase.
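The weighted-mean and standard-deviation answers above can be checked numerically. The following is a minimal sketch, assuming the reported 8.9 is the sample standard deviation (divisor \(n - 1\)); the function name `add_observation` is hypothetical and chosen only for this illustration.

```python
import math

def add_observation(n, mean, sd, x_new):
    """Update a sample mean and sample SD (divisor n - 1) after adding one value."""
    new_n = n + 1
    # Weighted mean: weight n for the old mean, weight 1 for the new value.
    new_mean = (n * mean + x_new) / new_n
    # Sum of squared deviations of the original n values about their own mean.
    ss_old = (n - 1) * sd ** 2
    # Re-centre the old deviations on the new mean, then add the new point's deviation.
    ss_new = ss_old + n * (mean - new_mean) ** 2 + (x_new - new_mean) ** 2
    new_sd = math.sqrt(ss_new / (new_n - 1))
    return new_mean, new_sd

new_mean, new_sd = add_observation(24, 74, 8.9, 64)
print(round(new_mean, 1), round(new_sd, 2))
```

Under this assumption the mean drops to 73.6 and the standard deviation rises slightly above 8.9, in line with the answers above.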
12 Days off at a mining plant
Workers at a particular mining site receive an average of 35 days paid vacation, which is lower than the national average. The manager of this plant is under pressure from a local union to increase the amount of paid time off. However, he does not want to give more days off to the workers because that would be costly. Instead he decides he should fire 10 employees in such a way as to raise the average number of days off that are reported by his employees. In order to achieve this goal, should he fire employees who have the most days off, the fewest days off, or about the average number of days off?
13 Smoking habits of UK residents, Part I
A survey was conducted to study the smoking habits of UK residents. The histograms below display the distributions of the number of cigarettes smoked on weekdays and weekends, and they exclude data from people who identified themselves as non-smokers. Describe the two distributions and compare them.
Answer
Both distributions are right skewed and bimodal with modes at 10 and 20 cigarettes; note that people may be rounding their answers to half a pack or a whole pack. The median of each distribution is between 10 and 15 cigarettes. The middle 50% of the data (the IQR) appears to be spread equally in each group and have a width of about 10 to 15. There are potential outliers above 40 cigarettes per day. It appears that respondents who smoke only a few cigarettes (0 to 5) smoke more on the weekdays than on weekends.
14 Stats scores, Part I
Below are the final exam scores of twenty introductory statistics students.
79, 83, 57, 82, 94, 83, 72, 74, 73, 71, 66, 89, 78, 81, 78, 81, 88, 69, 77, 79
Draw a histogram of these data and describe the distribution.
15 Smoking habits of UK residents, Part II
A random sample of 5 smokers from the data set discussed in Exercise 2.5.13 is provided below.
gender | age | maritalStatus | grossIncome | smoke | Weekends | Weekdays
Female | 51 | Married | £2,600 to £5,200 | Yes | 20 cig/day | 20 cig/day
Male | 24 | Single | £10,400 to £15,600 | Yes | 20 cig/day | 15 cig/day
Female | 33 | Married | £10,400 to £15,600 | Yes | 20 cig/day | 10 cig/day
Female | 17 | Single | £5,200 to £10,400 | Yes | 20 cig/day | 15 cig/day
Female | 76 | Widowed | £5,200 to £10,400 | Yes | 20 cig/day | 20 cig/day
- Find the mean amount of cigarettes smoked on weekdays and weekends by these 5 respondents. Answer
\(\bar{x}_{amtWeekends} = 20\text{,}\) \(\bar{x}_{amtWeekdays} = 16\text{.}\)
- Find the standard deviation of the amount of cigarettes smoked on weekdays and on weekends by these 5 respondents. Is the variability higher on weekends or on weekdays? Answer
\(s_{amtWeekends} = 0\text{,}\) \(s_{amtWeekdays} = 4.18\text{.}\) In this very small sample, higher on weekdays.
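These values can be reproduced with Python's standard `statistics` module. This is a small sketch using the five respondents listed in the table; `statistics.stdev` uses the sample formula with divisor \(n - 1\), which is assumed to be the formula intended here.

```python
import statistics

# Cigarettes per day for the five sampled respondents, in table order.
weekends = [20, 20, 20, 20, 20]
weekdays = [20, 15, 10, 15, 20]

mean_weekends = statistics.mean(weekends)
mean_weekdays = statistics.mean(weekdays)

# Sample standard deviation (divisor n - 1).
sd_weekends = statistics.stdev(weekends)
sd_weekdays = statistics.stdev(weekdays)

print(mean_weekends, mean_weekdays)
print(round(sd_weekends, 2), round(sd_weekdays, 2))
```

Every weekend value is identical, so the weekend standard deviation is exactly zero.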
16 Factory defective rate
A factory quality control manager decides to investigate the percentage of defective items produced each day. Within a given work week (Monday through Friday) the percentage of defective items produced was 2%, 1.4%, 4%, 3%, 2.2%.
Calculate the mean for these data.
Calculate the standard deviation for these data, showing each step in detail.
17 Medians and IQRs
For each part, compare distributions (1) and (2) based on their medians and IQRs. You do not need to calculate these statistics; simply state how the medians and IQRs compare. Make sure to explain your reasoning.
- (1) 3, 5, 6, 7, 9
  (2) 3, 5, 6, 7, 20
  Answer
Both distributions have the same median and IQR.
- (1) 3, 5, 6, 7, 9
  (2) 3, 5, 7, 8, 9
  Answer
Second distribution has a higher median and IQR.
- (1) 1, 2, 3, 4, 5
  (2) 6, 7, 8, 9, 10
  Answer
Second distribution has higher median. IQRs are equal.
- (1) 0, 10, 50, 60, 100
  (2) 0, 100, 500, 600, 1000
  Answer
Second distribution has higher median and larger IQR.
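Although the exercise does not require calculation, the comparisons can be checked with a short script. This sketch assumes the "inclusive" quartile convention of `statistics.quantiles`, under which the quartiles of a five-value list are its second and fourth ordered values; other quartile conventions can give different numbers. The helper `median_and_iqr` and the part labels are hypothetical names used only here.

```python
import statistics

def median_and_iqr(data):
    """Return (median, IQR) using the inclusive quartile convention."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q2, q3 - q1

# Distributions (1) and (2) for each part, in the order listed above.
parts = {
    "a": ([3, 5, 6, 7, 9], [3, 5, 6, 7, 20]),
    "b": ([3, 5, 6, 7, 9], [3, 5, 7, 8, 9]),
    "c": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "d": ([0, 10, 50, 60, 100], [0, 100, 500, 600, 1000]),
}

results = {
    label: (median_and_iqr(first), median_and_iqr(second))
    for label, (first, second) in parts.items()
}

for label, (first, second) in results.items():
    print(label, "(1) median, IQR:", first, "(2) median, IQR:", second)
```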
18 Means and SDs
For each part, compare distributions (1) and (2) based on their means and standard deviations. You do not need to calculate these statistics; simply state how the means and the standard deviations compare. Make sure to explain your reasoning. Hint: It may be useful to sketch dot plots of the distributions.
- (1) 3, 5, 5, 5, 8, 11, 11, 11, 13
  (2) 3, 5, 5, 5, 8, 11, 11, 11, 20
- (1) -20, 0, 0, 0, 15, 25, 30, 30
  (2) -40, 0, 0, 0, 15, 25, 30, 30
- (1) 0, 2, 4, 6, 8, 10
  (2) 20, 22, 24, 26, 28, 30
- (1) 100, 200, 300, 400, 500
  (2) 0, 50, 300, 550, 600
19 Stats scores, Part II
Create a box plot for the final exam scores of twenty introductory statistics students given in Exercise 2.5.14. The five number summary provided below may be useful.
Min | Q1 | Q2 (Median) | Q3 | Max
57 | 72.5 | 78.5 | 82.5 | 94
Answer
20 Infant mortality
The infant mortality rate is defined as the number of infant deaths per 1,000 live births. This rate is often used as an indicator of the level of health in a country. The relative frequency histogram below shows the distribution of estimated infant death rates in 2012 for 222 countries.
Estimate Q1, the median, and Q3 from the histogram.
Would you expect the mean of this data set to be smaller or larger than the median? Explain your reasoning.
21 Matching histograms and box plots
Describe the distribution in the histograms below and match them to the box plots.
Answer
Descriptions will vary a little.
(a) 2. Unimodal, symmetric, centered at 60, standard deviation of roughly 3.
(b) 3. Symmetric and approximately evenly distributed from 0 to 100.
(c) 1. Right skewed, unimodal, centered at about 1.5, with most observations falling between 0 and 3. A very small fraction of observations exceed a value of 5.
22 Air quality
Daily air quality is measured by the air quality index (AQI) reported by the Environmental Protection Agency. This index reports the pollution level and what associated health effects might be a concern. The index is calculated for five major air pollutants regulated by the Clean Air Act and takes values from 0 to 300, where a higher value indicates lower air quality. AQI was reported for a sample of 91 days in 2011 in Durham, NC. The relative frequency histogram below shows the distribution of the AQI values on these days.
Estimate the median AQI value of this sample.
Would you expect the mean AQI value of this sample to be higher or lower than the median? Explain your reasoning.
Estimate \(Q_1\text{,}\) \(Q_3\text{,}\) and IQR for the distribution.
23 Histograms and box plots
Compare the two plots below. What characteristics of the distribution are apparent in the histogram and not in the box plot? What characteristics are apparent in the box plot but not in the histogram?
Answer
The histogram shows that the distribution is bimodal, which is not apparent in the box plot. The box plot makes it easy to identify more precise values of observations outside of the whiskers.
24 Marathon winners
The histogram and box plots below show the distribution of finishing times for male and female winners of the New York Marathon between 1970 and 1999.
What features of the distribution are apparent in the histogram and not the box plot? What features are apparent in the box plot but not in the histogram?
What may be the reason for the bimodal distribution? Explain.
Compare the distribution of marathon times for men and women based on the box plot shown below.
The time series plot shown below is another way to look at these data. Describe what is visible in this plot but not in the others.
25 ACS, Part II
The hollow histograms below show the distribution of incomes of respondents to the American Community Survey introduced in Exercise 2.5.1.
- Compare the distributions of incomes of males and females. Answer
Both distributions are right skewed; however, the distribution of incomes of males has a much higher median (around $40K) than that of females (around $20K).
- Suggest an alternative visualization for displaying and comparing the distributions of incomes of males and females. Answer
We could also use side-by-side box plots for displaying and easily comparing the distributions of incomes of males and females.
26 AP Stats
The table below shows scores (out of 100) of twenty college students on a college level statistical reasoning test given at the beginning of the semester in their introductory statistics course. Ten of these students have taken AP Stats in high school, and the other ten have not taken AP Stats.
Took AP stats: 52.5, 57.5, 60, 65, 70, 70, 72.5, 77.5, 80, 85
Did not take AP stats: 40, 45, 45, 50, 52.5, 57.5, 57.5, 60, 65, 72.5
Create a relative frequency histogram of all students' scores on the statistical reasoning test.
What percent of all students scored above 50 on this test?
Compare the performances of students who did and did not take AP stats. The side-by-side box plots and the hollow histograms shown below might be helpful for this task.
27 Distributions and appropriate statistics, Part I
For each of the following, describe whether you expect the distribution to be symmetric, right skewed, or left skewed. Also specify whether the mean or median would best represent a typical observation in the data and whether the variability of observations would be best represented using the standard deviation or IQR.
- Number of pets per household. Answer
The distribution of number of pets per household is likely right skewed as there is a natural boundary at 0 and only a few people have many pets. Therefore the center would be best described by the median, and variability would be best described by the IQR.
- Distance to work, i.e. number of miles between work and home. Answer
The distribution of distance to work is likely right skewed as there is a natural boundary at 0 and only a few people live a very long distance from work. Therefore the center would be best described by the median, and variability would be best described by the IQR.
- Heights of adult males. Answer
The distribution of heights of males is likely symmetric. Therefore the center would be best described by the mean, and variability would be best described by the standard deviation.
28 Distributions and appropriate statistics, Part II
For each of the following, describe whether you expect the distribution to be symmetric, right skewed, or left skewed. Also specify whether the mean or median would best represent a typical observation in the data, and whether the variability of observations would be best represented using the standard deviation or IQR.
Housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000.
Housing prices in a country where 25% of the houses cost below $300,000, 50% of the houses cost below $600,000, 75% of the houses cost below $900,000 and very few houses that cost more than $1,200,000.
Number of alcoholic drinks consumed by college students in a given week. Assume that most of these students don't drink since they are under 21 years old, and only a few drink excessively.
Annual salaries of the employees at a Fortune 500 company where only a few high level executives earn much higher salaries than all the other employees.
29 TV watchers
Students in an AP Statistics class were asked how many hours of television they watch per week (including online streaming). This sample yielded an average of 4.71 hours, with a standard deviation of 4.18 hours. Is the distribution of number of hours students watch television weekly symmetric? If not, what shape would you expect this distribution to have? Explain your reasoning.
Answer
No, we would expect this distribution to be right skewed. There are two reasons for this: (1) there is a natural boundary at 0 (it is not possible to watch less than 0 hours of TV), (2) the standard deviation of the distribution is very large compared to the mean.
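One way to make "very large compared to the mean" concrete is to measure how far the boundary at 0 lies below the mean, in standard deviation units. The sketch below uses only the two summary values from the exercise; the variable names are illustrative.

```python
mean_hours = 4.71
sd_hours = 4.18

# Position of the natural boundary (0 hours) in standard deviation units.
z_zero = (0 - mean_hours) / sd_hours

# The value two standard deviations below the mean.
lower_2sd = mean_hours - 2 * sd_hours

print(round(z_zero, 2), round(lower_2sd, 2))
```

The boundary sits only a little more than one standard deviation below the mean, and two standard deviations below the mean is a negative number of hours, which is impossible. The spread therefore has to come mostly from large values above the mean, which is what a right skew looks like.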
30 Exam scores
The average on a history exam (scored out of 100 points) was 85, with a standard deviation of 15. Is the distribution of the scores on this exam symmetric? If not, what shape would you expect this distribution to have? Explain your reasoning.
31 Facebook friends
Facebook data indicate that 50% of Facebook users have 100 or more friends, and that the average friend count of users is 190. What do these findings suggest about the shape of the distribution of number of friends of Facebook users?
Answer
The statement “50% of Facebook users have 100 or more friends” means that the median number of friends is 100, which is lower than the mean number of friends (190). This suggests a right skewed distribution for the number of friends of Facebook users.
32 A new statistic
The statistic \(\frac{\bar{x}}{median}\) can be used as a measure of skewness. Suppose we have a distribution where all observations are greater than 0, \(x_i \gt 0\text{.}\) What is the expected shape of the distribution under the following conditions? Explain your reasoning.
\(\frac{\bar{x}}{median} = 1\)
\(\frac{\bar{x}}{median} \lt 1\)
\(\frac{\bar{x}}{median} \gt 1\)
33 Income at the coffee shop, Part I
The first histogram below shows the distribution of the yearly incomes of 40 patrons at a college coffee shop. Suppose two new people walk into the coffee shop: one making $225,000 and the other $250,000. The second histogram shows the new income distribution. Summary statistics are also provided.
Statistic | (1) | (2)
n | 40 | 42
Min. | 60,680 | 60,680
1st Qu. | 63,620 | 63,710
Median | 65,240 | 65,350
Mean | 65,090 | 73,300
3rd Qu. | 66,160 | 66,540
Max. | 69,890 | 250,000
SD | 2,122 | 37,321
- Would the mean or the median best represent what we might think of as a typical income for the 42 patrons at this coffee shop? What does this say about the robustness of the two measures? Answer
The median is better; the mean is substantially affected by the two extreme observations.
- Would the standard deviation or the IQR best represent the amount of variability in the incomes of the 42 patrons at this coffee shop? What does this say about the robustness of the two measures? Answer
The IQR is better; the standard deviation, like the mean, is substantially affected by the two high salaries.
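The summary statistics in the table are enough to see how differently the robust and non-robust measures react to the two newcomers. The sketch below uses only values from the table and the exercise text; the variable names are illustrative.

```python
# Summary statistics copied from the table in this exercise.
n_old, mean_old = 40, 65_090
newcomers = [225_000, 250_000]

# Mean of all 42 patrons, rebuilt from the old mean and the two new incomes.
mean_new = (n_old * mean_old + sum(newcomers)) / (n_old + len(newcomers))

mean_shift = mean_new - mean_old   # change in the mean
median_shift = 65_350 - 65_240     # change in the median

iqr_old = 66_160 - 63_620          # IQR for histogram (1)
iqr_new = 66_540 - 63_710          # IQR for histogram (2)
sd_ratio = 37_321 / 2_122          # how many times larger the SD became

print(mean_new, mean_shift, median_shift)
print(iqr_old, iqr_new, round(sd_ratio, 1))
```

The rebuilt mean matches the 73,300 shown in the table. The mean moves by several thousand dollars while the median moves by about a hundred, and the standard deviation grows more than seventeenfold while the IQR changes only modestly.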
34 Midrange
The midrange of a distribution is defined as the average of the maximum and the minimum of that distribution. Is this statistic robust to outliers and extreme skew? Explain your reasoning.
35 Commute times, Part I
The histogram below shows the distribution of mean commute times in 3,143 US counties in 2010. Describe the distribution and comment on whether or not a log transformation may be advisable for these data.
Answer
The distribution is unimodal and symmetric with a mean of about 25 minutes and a standard deviation of about 5 minutes. There do not appear to be any counties with unusually high or low mean travel times. Since the distribution is already unimodal and symmetric, a log transformation is not necessary.
36 Hispanic population, Part I
The histogram below shows the distribution of the percentage of the population that is Hispanic in 3,143 counties in the US in 2010. Also shown is a histogram of logs of these values. Describe the distribution and comment on why we might want to use log-transformed values in analyzing or modeling these data.
37 Income at the coffee shop, Part II
Suppose each of the 40 people in the coffee shop in Exercise 2.5.33 got a 5% raise. What would the new mean, median, and the standard deviation of their incomes be?
Answer
mean = \(65,090 \times 1.05 = 68,344.50\text{;}\) median = \(65,240 \times 1.05 = 68,502\text{;}\) sd = \(2,122 \times 1.05 = 2,228.10\)
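The answer relies on the fact that multiplying every observation by the same positive constant multiplies the mean, median, and standard deviation by that constant. A brief sketch follows: the first part applies the factor to the summary values from Exercise 2.5.33, and the second part checks the property on a small list of hypothetical incomes, which are made up for illustration and are not the coffee shop data.

```python
import math
import statistics

raise_factor = 1.05

# Summary statistics for the original 40 patrons (from Exercise 2.5.33).
summary = {"mean": 65_090, "median": 65_240, "sd": 2_122}
after_raise = {name: round(value * raise_factor, 2) for name, value in summary.items()}
print(after_raise)

# Hypothetical incomes, used only to illustrate the scaling property.
incomes = [61_000, 63_500, 65_000, 66_000, 69_500]
raised = [x * raise_factor for x in incomes]

scales_mean = math.isclose(statistics.mean(raised), raise_factor * statistics.mean(incomes))
scales_median = math.isclose(statistics.median(raised), raise_factor * statistics.median(incomes))
scales_sd = math.isclose(statistics.stdev(raised), raise_factor * statistics.stdev(incomes))
print(scales_mean, scales_median, scales_sd)
```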
38 LA weather
The temperatures in June in Los Angeles have a mean of 77°F, with a standard deviation of 5°F. To convert from Fahrenheit to Celsius, we use the following conversion:
\begin{equation*}
x_{C} = (x_{F} - 32) \frac{5}{9}
\end{equation*}
What is the mean temperature in June in LA in degrees Celsius?
What is the standard deviation of temperatures in June in LA in degrees Celsius?
39 Smoking habits of UK residents, Part III
The UK residents in Exercise 2.5.15 smoke on average 16 cigarettes per day on weekdays, with a standard deviation of 4.18. Suppose these residents participated in a smoking cessation program and at the end of the first week of the program reduced their weekday smoking by 3 cigarettes / day. Find the new mean and standard deviation of the number of cigarettes they smoke on weekdays.
Answer
mean = 16 - 3 = 13; sd = 4.18
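Subtracting the same constant from every observation shifts the mean by that constant and leaves the standard deviation unchanged. This can be checked on the five weekday values from Exercise 2.5.15, as in the short sketch below.

```python
import statistics

# Weekday cigarettes per day for the five respondents in Exercise 2.5.15.
weekdays = [20, 15, 10, 15, 20]

# Each respondent smokes 3 fewer cigarettes per day.
reduced = [x - 3 for x in weekdays]

new_mean = statistics.mean(reduced)
old_sd = statistics.stdev(weekdays)
new_sd = statistics.stdev(reduced)

print(new_mean, round(old_sd, 2), round(new_sd, 2))
```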
40 Stats scores, Part III
The introductory statistics students in Exercise 2.5.14 scored on average 77.7 points, with a standard deviation of 8.44. The median score was 78.5, and the IQR was 9.5. Suppose these students completed an extra credit exercise that earned them two additional points on their exams. Calculate the new mean, median, standard deviation, and IQR of their scores.
41 Commute times, Part II
Exercise 2.5.35 displays a histogram of mean commute times in 3,143 US counties in 2010. Describe the spatial distribution of commuting times using the map below.
Answer
Answers will vary. There are pockets of longer travel time around DC, Southeastern NY, Chicago, Minneapolis, Los Angeles, and many other big cities. There is also a large section of shorter average commute times that overlap with farmland in the Midwest. Many farmers' homes are adjacent to their farmland, so their commute would be 0 minutes, which may explain why the average commute time for these counties is relatively low.
42 Hispanic population, Part II
Exercise 2.5.36 displays histograms of the distribution of the percentage of the population that is Hispanic in 3,143 counties in the US in 2010.
What features of this distribution are apparent in the map but not in the histogram?
What features are apparent in the histogram but not the map?
Is one visualization more appropriate or helpful than the other? Explain your reasoning.