## Section 6.5 Exercises

### Exercises

###### 1 Vegetarian college students

Suppose that 8% of college students are vegetarians. Determine if the following statements are true or false, and explain your reasoning.

1. The distribution of the sample proportions of vegetarians in random samples of size 60 is approximately normal since $n \ge 30\text{.}$

Answer: False. The success-failure condition is not satisfied: $np = 60 \times 0.08 = 4.8 \lt 10\text{.}$

2. The distribution of the sample proportions of vegetarian college students in random samples of size 50 is right skewed.

Answer: True. The success-failure condition is not satisfied ($np = 50 \times 0.08 = 4 \lt 10$). In most samples we would expect $\hat{p}$ to be close to 0.08, the true population proportion. While $\hat{p}$ can be much above 0.08, it is bounded below by 0, so its distribution would take on a right skewed shape. Plotting the sampling distribution would confirm this suspicion.

3. A random sample of 125 college students where 12% are vegetarians would be considered unusual.

Answer: False. $SE_{\hat{p}} = \sqrt{0.08 \times 0.92 / 125} = 0.0243\text{,}$ and $\hat{p} = 0.12$ is only $\frac{0.12 - 0.08}{0.0243} = 1.65$ SEs away from the mean, which would not be considered unusual.

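4. A random sample of 250 college students where 12% are vegetarians would be considered unusual.

Answer: True. Here $SE_{\hat{p}} = \sqrt{0.08 \times 0.92 / 250} = 0.0172\text{,}$ so $\hat{p}=0.12$ is $\frac{0.12 - 0.08}{0.0172} = 2.33$ standard errors away from the mean, which is often considered unusual.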

5. The standard error would be reduced by one-half if we increased the sample size from 125 to 250.

Answer: False. Doubling the sample size decreases the SE by a factor of $1/\sqrt{2} \approx 0.71\text{,}$ not by one-half.
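The arithmetic behind parts 1 through 5 can be checked with a short Python sketch. It assumes only the normal-approximation formula $SE = \sqrt{p(1-p)/n}$ and a success-failure threshold of 10. The helper names `se` and `success_failure_ok` are illustrative and do not come from any library. Computed without intermediate rounding, the z-score in part 4 comes out at about 2.33.

```python
import math

P = 0.08  # true proportion of vegetarians, as given in the exercise


def se(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def success_failure_ok(p, n, threshold=10):
    """True when expected successes and failures both reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Parts 1 and 2: the condition fails for n = 50 and n = 60.
for n in (50, 60):
    print(n, "expected successes:", round(n * P, 1), "ok:", success_failure_ok(P, n))

# Parts 3 and 4: how many SEs is p-hat = 0.12 from p = 0.08?
for n in (125, 250):
    z = (0.12 - P) / se(P, n)
    print(n, "SE:", round(se(P, n), 4), "z:", round(z, 2))

# Part 5: doubling n multiplies the SE by 1/sqrt(2), about 0.71.
print("SE ratio:", round(se(P, 250) / se(P, 125), 4))
```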

###### 2 Young Americans, Part I

About 77% of young adults think they can achieve the American dream. Determine if the following statements are true or false, and explain your reasoning. (A. Vaughn, “Poll finds young adults optimistic, but not about money,” Los Angeles Times, 2011.)

1. The distribution of sample proportions of young Americans who think they can achieve the American dream in samples of size 20 is left skewed.

2. The distribution of sample proportions of young Americans who think they can achieve the American dream in random samples of size 40 is approximately normal since $n \ge 30\text{.}$

3. A random sample of 60 young Americans where 85% think they can achieve the American dream would be considered unusual.

4. A random sample of 120 young Americans where 85% think they can achieve the American dream would be considered unusual.

###### 3 Orange tabbies

Suppose that 90% of orange tabby cats are male. Determine if the following statements are true or false, and explain your reasoning.

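1. The distribution of sample proportions of random samples of size 30 is left skewed.

Answer: True. The reasoning follows Exercise 6.5.1(b). Here the success-failure condition fails because $n(1-p) = 30 \times 0.10 = 3 \lt 10\text{.}$ Since $\hat{p}$ is centered near 0.90 and bounded above by 1, its distribution is left skewed.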

2. Using a sample size that is 4 times as large will reduce the standard error of the sample proportion by one-half.

Answer: True. The sample size appears under a square root in the SE formula, so multiplying $n$ by 4 divides the SE by $\sqrt{4} = 2\text{.}$

3. The distribution of sample proportions of random samples of size 140 is approximately normal.

Answer: True. The independence and success-failure conditions are satisfied: $np = 126$ and $n(1-p) = 14\text{,}$ both at least 10.

4. The distribution of sample proportions of random samples of size 280 is approximately normal.

Answer: True. The independence and success-failure conditions are satisfied: $np = 252$ and $n(1-p) = 28\text{,}$ both at least 10.
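A short Python sketch of the checks behind these answers follows. It uses the same normal-approximation formulas as above, and `se` is an illustrative helper name. The sketch prints the expected successes and failures for each sample size and the SE ratio when $n$ is quadrupled.

```python
import math

P = 0.9  # proportion of orange tabbies that are male, as given


def se(p, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


# Expected successes and failures for each sample size in the exercise.
for n in (30, 140, 280):
    successes, failures = n * P, n * (1 - P)
    ok = min(successes, failures) >= 10
    print(n, round(successes, 1), round(failures, 1), "condition met:", ok)

# Part 2: a sample 4 times as large halves the SE.
print(round(se(P, 120) / se(P, 30), 4))
```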

###### 4 Young Americans, Part II

About 25% of young Americans have delayed starting a family due to the continued economic slump. Determine if the following statements are true or false, and explain your reasoning. (Demos.org, “The State of Young America: The Poll,” 2011.)

1. The distribution of sample proportions of young Americans who have delayed starting a family due to the continued economic slump in random samples of size 12 is right skewed.

2. In order for the distribution of sample proportions of young Americans who have delayed starting a family due to the continued economic slump to be approximately normal, we need random samples where the sample size is at least 40.

3. A random sample of 50 young Americans where 20% have delayed starting a family due to the continued economic slump would be considered unusual.

4. A random sample of 150 young Americans where 20% have delayed starting a family due to the continued economic slump would be considered unusual.

5. Tripling the sample size will reduce the standard error of the sample proportion by one-third.

###### 5 Prop 19 in California

In a 2010 Survey USA poll, 70% of the 119 respondents between the ages of 18 and 34 said they would vote in the 2010 general election for Prop 19, which would change California law to legalize marijuana and allow it to be regulated and taxed. At a 95% confidence level, this sample has an 8% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning. (Survey USA, Election Poll #16804, data collected July 8-11, 2010.)

1. We are 95% confident that between 62% and 78% of the California voters in this sample support Prop 19.

Answer: False. A confidence interval is constructed to estimate the population proportion, not the sample proportion.

2. We are 95% confident that between 62% and 78% of all California voters between the ages of 18 and 34 support Prop 19.

Answer: True. The 95% CI is $70\% \pm 8\%\text{,}$ that is, 62% to 78%.

3. If we considered many random samples of 119 California voters between the ages of 18 and 34, and we calculated 95% confidence intervals for each, 95% of them will include the true population proportion of 18-34 year old Californians who support Prop 19.

Answer: True. This follows from the definition of the confidence level.

4. In order to decrease the margin of error to 4%, we would need to quadruple (multiply by 4) the sample size.

Answer: True. Quadrupling the sample size decreases the SE and ME by a factor of $1/\sqrt{4} = 1/2\text{,}$ so the margin of error drops from 8% to 4%.

5. Based on this confidence interval, there is sufficient evidence to conclude that a majority of California voters between the ages of 18 and 34 support Prop 19.

Answer: True. The 95% CI (62% to 78%) lies entirely above 50%.
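The reported 8% margin of error and the quadrupling argument can be reproduced with a minimal Python sketch. It uses the standard library's `statistics.NormalDist` for the 95% critical value. The computed margin is about 0.082, which rounds to the reported 8%.

```python
import math
from statistics import NormalDist

p_hat, n = 0.70, 119  # sample proportion and size from the poll

# Critical value for 95% confidence (about 1.96).
z_star = NormalDist().inv_cdf(0.975)

me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
print("ME:", round(me, 3))
print("95% CI:", round(p_hat - me, 2), round(p_hat + me, 2))

# Part 4: with 4n observations the margin of error is cut in half.
me_quad = z_star * math.sqrt(p_hat * (1 - p_hat) / (4 * n))
print("ME with 4n:", round(me_quad, 3))

# Part 5: is the whole interval above 0.5?
print("entirely above 50%:", p_hat - me > 0.5)
```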

###### 6 2010 Healthcare Law

On June 28, 2012 the U.S. Supreme Court upheld the much debated 2010 healthcare law, declaring it constitutional. A Gallup poll released the day after this decision indicates that 46% of 1,012 Americans agree with this decision. At a 95% confidence level, this sample has a 3% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning. (Gallup, “Americans Issue Split Decision on Healthcare Ruling,” data collected June 28, 2012.)

1. We are 95% confident that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law.

2. We are 95% confident that between 43% and 49% of Americans support the decision of the U.S. Supreme Court on the 2010 healthcare law.

3. If we considered many random samples of 1,012 Americans, and we calculated the sample proportions of those who support the decision of the U.S. Supreme Court, 95% of those sample proportions will be between 43% and 49%.

4. The margin of error at a 90% confidence level would be higher than 3%.

###### 7 Fireworks on July $4^\text{th}$

In late June 2012, Survey USA published results of a survey stating that 56% of the 600 randomly sampled Kansas residents planned to set off fireworks on July $4^\text{th}\text{.}$ Determine the margin of error for the 56% point estimate using a 95% confidence level. (Survey USA, News Poll #19333, data collected on June 27, 2012.)

With a random sample from less than 10% of the population, independence is satisfied. The success-failure condition is also satisfied: $600 \times 0.56 = 336$ and $600 \times 0.44 = 264\text{,}$ both at least 10. $ME = z^{\star} \sqrt{ \frac{\hat{p} (1-\hat{p})} {n} } = 1.96 \sqrt{ \frac{0.56 \times 0.44}{600} } = 0.0397 \approx 4\%\text{.}$
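The margin-of-error calculation can be sketched in a few lines of Python. It uses $z^{\star} = 1.96$ as in the solution and checks the success-failure counts before computing the margin.

```python
import math

p_hat, n, z_star = 0.56, 600, 1.96

# Success-failure check: both counts should be at least 10.
print("successes:", round(n * p_hat), "failures:", round(n * (1 - p_hat)))

me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
print("ME:", round(me, 4))
```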

###### 8 Elderly drivers

In January 2011, The Marist Poll published a report stating that 66% of adults nationally think licensed drivers should be required to retake their road test once they reach 65 years of age. It was also reported that interviews were conducted on 1,018 American adults, and that the margin of error was 3% using a 95% confidence level. (Marist Poll, “Road Rules: Re-Testing Drivers at Age 65?,” March 4, 2011.)

1. Verify the margin of error reported by The Marist Poll.

2. Based on a 95% confidence interval, does the poll provide convincing evidence that more than 70% of the population think that licensed drivers should be required to retake their road test once they turn 65?

###### 9 Life after college

We are interested in estimating the proportion of graduates at a mid-sized university who found a job within one year of completing their undergraduate degree. Suppose we conduct a survey and find out that 348 of the 400 randomly sampled graduates found jobs. The graduating class under consideration included over 4500 students.

1. Describe the population parameter of interest. What is the value of the point estimate of this parameter?

Answer: The parameter is the proportion of graduates from this university who found a job within one year of graduating. The point estimate is $\hat{p} = 348/400 = 0.87\text{.}$

2. Check if the conditions for constructing a confidence interval based on these data are met.

Answer: This is a random sample from less than 10% of the population (400 of over 4,500 graduates), so the observations are independent. The success-failure condition is satisfied: 348 successes and 52 failures, both well above 10.

3. Calculate a 95% confidence interval for the proportion of graduates who found a job within one year of completing their undergraduate degree at this university, and interpret it in the context of the data.

Answer: $0.87 \pm 1.96 \times 0.0168 \to (0.8371, 0.9029)\text{.}$ We are 95% confident that approximately 84% to 90% of graduates from this university found a job within one year of completing their undergraduate degree.

4. What does “95% confidence” mean?

Answer: 95% of such random samples would produce a 95% confidence interval that includes the true proportion of students at this university who found a job within one year of graduating from college.

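5. Now calculate a 99% confidence interval for the same parameter and interpret it in the context of the data.

Answer: $0.87 \pm 2.58 \times 0.0168 \to (0.8267, 0.9133)\text{.}$ We are 99% confident that approximately 83% to 91% of graduates from this university found a job within one year of completing their undergraduate degree.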

6. Compare the widths of the 95% and 99% confidence intervals. Which one is wider? Explain.

Answer: The 99% CI is wider. To be more confident that the interval captures the true proportion, we must cover a wider range of plausible values, which here means using a larger critical value ($z^{\star} = 2.58$ instead of 1.96).
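Both intervals can be reproduced with a small Python sketch. The helper `prop_ci` is an illustrative name for the normal-approximation interval $\hat{p} \pm z^{\star} SE$, and $z^{\star} = 2.58$ is the rounded 99% critical value. The results match the intervals in parts 3 and 5 to within rounding of the last digit.

```python
import math


def prop_ci(successes, n, z_star):
    """Normal-approximation CI for one proportion: p_hat +/- z* SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


ci95 = prop_ci(348, 400, 1.96)
ci99 = prop_ci(348, 400, 2.58)
print("95%:", [round(x, 4) for x in ci95], "width:", round(ci95[1] - ci95[0], 4))
print("99%:", [round(x, 4) for x in ci99], "width:", round(ci99[1] - ci99[0], 4))
```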

###### 10 Life rating in Greece

Greece has faced a severe economic crisis since the end of 2009. A Gallup poll surveyed 1,000 randomly sampled Greeks in 2011 and found that 25% of them said they would rate their lives poorly enough to be considered “suffering”. (Gallup World, “More Than One in 10 ‘Suffering’ Worldwide,” data collected throughout 2011.)

1. Describe the population parameter of interest. What is the value of the point estimate of this parameter?

2. Check if the conditions required for constructing a confidence interval based on these data are met.

3. Construct a 95% confidence interval for the proportion of Greeks who are “suffering”.

4. Without doing any calculations, describe what would happen to the confidence interval if we decided to use a higher confidence level.

5. Without doing any calculations, describe what would happen to the confidence interval if we used a larger sample.

###### 11 Study abroad

A survey of 1,509 high school seniors who took the SAT and who completed an optional web survey between April 25 and April 30, 2007 shows that 55% of high school seniors are fairly certain that they will participate in a study abroad program in college. (studentPOLL, “College-Bound Students' Interests in Study Abroad and Other International Learning Activities,” January 2008.)

1. Is this sample a representative sample from the population of all high school seniors in the US? Explain your reasoning.

Answer: No. The sample only represents students who took the SAT, and respondents chose to complete an optional web survey, so they may differ systematically from other seniors.

2. Let's suppose the conditions for inference are met. Even if your answer to part (a) indicated that this approach would not be reliable, this analysis may still be interesting to carry out (though not report). Construct a 90% confidence interval for the proportion of high school seniors (of those who took the SAT) who are fairly certain they will participate in a study abroad program in college, and interpret this interval in context.

Answer: $0.55 \pm 1.65 \times 0.0128 \to (0.5289, 0.5711)\text{.}$ We are 90% confident that 53% to 57% of high school seniors who took the SAT are fairly certain that they will participate in a study abroad program in college.

3. What does “90% confidence” mean?

Answer: 90% of such random samples would produce a 90% confidence interval that includes the true proportion.

4. Based on this interval, would it be appropriate to claim that the majority of high school seniors are fairly certain that they will participate in a study abroad program in college?

Answer: Yes. The interval lies entirely above 50%.
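The 90% interval in part 2 can be checked with a minimal Python sketch. It uses $z^{\star} = 1.645$ for 90% confidence and works directly from the reported $\hat{p} = 0.55$, since the survey reports a percentage rather than a count.

```python
import math

p_hat, n = 0.55, 1509
z_star = 1.645  # critical value for 90% confidence

se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - z_star * se, p_hat + z_star * se
print("90% CI:", round(lo, 4), round(hi, 4))
print("entirely above 50%:", lo > 0.5)
```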

###### 12 Legalization of marijuana, Part I

The 2010 General Social Survey asked 1,259 US residents: “Do you think the use of marijuana should be made legal, or not?” 48% of the respondents said it should be made legal. (National Opinion Research Center, General Social Survey, 2010.)

1. Is 48% a sample statistic or a population parameter? Explain.

2. Construct a 95% confidence interval for the proportion of US residents who think marijuana should be made legal, and interpret it in the context of the data.

3. A critic points out that this 95% confidence interval is only accurate if the statistic follows a normal distribution, or if the normal model is a good approximation. Is this true for these data? Explain.

4. A news piece on this survey's findings states, “Majority of Americans think marijuana should be legalized.” Based on your confidence interval, is this news piece's statement justified?

###### 13 Public option, Part I

A Washington Post article from 2009 reported that “support for a government-run health-care plan to compete with private insurers has rebounded from its summertime lows and wins clear majority support from the public.” More specifically, the article says “seven in 10 Democrats back the plan, while almost nine in 10 Republicans oppose it. Independents divide 52 percent against, 42 percent in favor of the legislation.” (6% responded with “other”.) There were 819 Democrats, 566 Republicans and 783 Independents surveyed.

1. A political pundit on TV claims that a majority of Independents oppose the health care public option plan. Do these data provide strong evidence to support this statement?

Answer: This is an appropriate setting for a hypothesis test. Let $p$ be the proportion of all Independents who oppose the plan. $H_0: p = 0.50\text{.}$ $H_A: p \gt 0.50\text{.}$ Both independence and the success-failure condition are satisfied ($783 \times 0.5 = 391.5$ expected successes and failures). $Z = \frac{0.52 - 0.50}{\sqrt{0.5 \times 0.5 / 783}} = 1.12$ $\to$ p-value $= 0.1314\text{.}$ Since the p-value $\gt \alpha=0.05\text{,}$ we fail to reject $H_0\text{.}$ The data do not provide strong evidence that more than half of all Independents oppose the public option plan.

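2. Would you expect a confidence interval for the proportion of Independents who oppose the public option plan to include 0.5? Explain.

Answer: Yes. We did not reject $H_0: p = 0.50$ in part (a), so 0.50 remains a plausible value for the proportion, and we would expect a confidence interval to include it.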
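The one-sided test in part 1 can be sketched in Python with the standard library's `statistics.NormalDist`. The standard error is computed under $H_0$, using $p_0 = 0.5$ rather than $\hat{p}$. Without rounding $Z$ first, the p-value is about 0.1315, which matches the reported 0.1314 up to rounding.

```python
import math
from statistics import NormalDist

p0, p_hat, n = 0.50, 0.52, 783

# Under H0 the standard error uses the null value p0, not p_hat.
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# One-sided test (H_A: p > 0.5), so take the upper-tail area.
p_value = 1 - NormalDist().cdf(z)
print("Z:", round(z, 2), "p-value:", round(p_value, 4))
```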

###### 14 The Civil War

A national survey conducted in 2011 among a simple random sample of 1,507 adults shows that 56% of Americans think the Civil War is still relevant to American politics and political life. (Pew Research Center Publications, “Civil War at 150: Still Relevant, Still Divisive,” data collected between March 30 and April 3, 2011.)

1. Conduct a hypothesis test to determine if these data provide strong evidence that the majority of the Americans think the Civil War is still relevant.

2. Interpret the p-value in this context.

3. Calculate a 90% confidence interval for the proportion of Americans who think the Civil War is still relevant. Interpret the interval in this context, and comment on whether or not the confidence interval agrees with the conclusion of the hypothesis test.

###### 15 Browsing on the mobile device

A 2012 survey of 2,254 American adults indicates that 17% of cell phone owners do their browsing on their phone rather than a computer or other device. (Pew Internet, “Cell Internet Use 2012,” data collected between March 15 and April 13, 2012.)

1. According to an online article, a report from a mobile research company indicates that 38 percent of Chinese mobile web users only access the internet through their cell phones (S. Chang, “The Chinese Love to Use Feature Phone to Access the Internet,” M.I.C Gadget, 2012). Conduct a hypothesis test to determine if these data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%.

Answer: $H_0: p = 0.38\text{.}$ $H_A: p \ne 0.38\text{.}$ Independence (random sample, less than 10% of the population) and the success-failure condition are satisfied. $Z = \frac{0.17 - 0.38}{\sqrt{0.38 \times 0.62 / 2254}} = -20.5$ $\to$ p-value $\approx 0\text{.}$ Since the p-value is very small, we reject $H_0\text{.}$ The data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%, and the data indicate that the proportion is lower in the US.

2. Interpret the p-value in this context.

Answer: If in fact 38% of Americans used only their cell phones to access the internet, the probability of obtaining a random sample of 2,254 Americans where 17% or less or 59% or more use only their cell phones to access the internet would be approximately 0.

3. Calculate a 95% confidence interval for the proportion of Americans who access the internet on their cell phones, and interpret the interval in this context.

Answer: (0.1545, 0.1855). We are 95% confident that approximately 15.5% to 18.6% of all Americans primarily use their cell phones to browse the internet.
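The test and the interval can be sketched together in Python. Note that the two calculations use different standard errors. The test uses the null value $p_0 = 0.38$, and the interval uses $\hat{p} = 0.17$. With $|Z|$ above 20, the two-sided p-value is numerically indistinguishable from 0.

```python
import math
from statistics import NormalDist

p0, p_hat, n = 0.38, 0.17, 2254

# Hypothesis test: the SE uses the null value p0.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided, since H_A: p != 0.38
print("Z:", round(z, 1), "p-value:", p_value)

# Confidence interval: the SE uses the sample proportion p_hat.
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print("95% CI:", round(lo, 4), round(hi, 4))
```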

###### 16 Is college worth it? Part I

Among a simple random sample of 331 American adults who do not have a four-year college degree and are not currently enrolled in school, 48% said they decided not to go to college because they could not afford school. (Pew Research Center Publications, “Is College Worth It?,” data collected between March 15-29, 2011.)

1. A newspaper article states that only a minority of the Americans who decide not to go to college do so because they cannot afford it and uses the point estimate from this survey as evidence. Conduct a hypothesis test to determine if these data provide strong evidence supporting this statement.

2. Would you expect a confidence interval for the proportion of American adults who decide not to go to college because they cannot afford it to include 0.5? Explain.

###### 17 Taste test

Some people claim that they can tell the difference between a diet soda and a regular soda in the first sip. A researcher wanting to test this claim randomly sampled 80 such people. He then filled 80 plain white cups with soda, half diet and half regular through random assignment, and asked each person to take one sip from their cup and identify the soda as diet or regular. 53 participants correctly identified the soda.

1. Do these data provide strong evidence that these people are able to detect the difference between diet and regular soda, in other words, are the results significantly better than just random guessing?

Answer: $H_0: p = 0.5\text{.}$ $H_A: p \gt 0.5\text{.}$ Independence (random sample, less than 10% of the population) is satisfied, as is the success-failure condition (using $p_0 = 0.5\text{,}$ we expect 40 successes and 40 failures). With $\hat{p} = 53/80 = 0.6625\text{,}$ $Z = \frac{0.6625 - 0.5}{\sqrt{0.5 \times 0.5 / 80}} = 2.91$ $\to$ p-value $= 0.0018\text{.}$ Since the p-value $\lt 0.05\text{,}$ we reject the null hypothesis. The data provide strong evidence that these people identify the soda correctly at a rate significantly better than random guessing.

2. Interpret the p-value in this context.

Answer: If in fact people cannot tell the difference between diet and regular soda and they randomly guess, the probability of getting a random sample of 80 people where 53 or more identify a soda correctly would be 0.0018.
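A minimal Python sketch of the one-sided test follows. It first checks the expected counts under $H_0$ and then computes the upper-tail p-value with `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

correct, n, p0 = 53, 80, 0.5

# Success-failure check under H0: expected correct and incorrect counts.
print("expected:", n * p0, n * (1 - p0))

p_hat = correct / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)  # upper tail, since H_A: p > 0.5
print("p_hat:", p_hat, "Z:", round(z, 2), "p-value:", round(p_value, 4))
```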

###### 18 Is college worth it? Part II

Exercise 6.5.16 presents the results of a poll where 48% of 331 Americans who decide to not go to college do so because they cannot afford it.

1. Calculate a 90% confidence interval for the proportion of Americans who decide to not go to college because they cannot afford it, and interpret the interval in context.

2. Suppose we wanted the margin of error for the 90% confidence level to be about 1.5%. How large of a survey would you recommend?

###### 19 College smokers

We are interested in estimating the proportion of students at a university who smoke. Out of a random sample of 200 students from this university, 40 students smoke.

1. Calculate a 95% confidence interval for the proportion of students at this university who smoke, and interpret this interval in context. (Reminder: check conditions)

Answer: Independence is satisfied (random sample from less than 10% of the population), as is the success-failure condition (40 smokers, 160 non-smokers). With $\hat{p} = 40/200 = 0.20\text{,}$ the 95% CI is $0.20 \pm 1.96 \sqrt{0.2 \times 0.8 / 200} \to (0.145, 0.255)\text{.}$ We are 95% confident that 14.5% to 25.5% of all students at this university smoke.

2. If we wanted the margin of error to be no larger than 2% at a 95% confidence level for the proportion of students who smoke, how big of a sample would we need?

Answer: We want $z^{\star}SE$ to be no larger than 0.02 for a 95% confidence level. We use $z^{\star}=1.96$ and plug in the point estimate $\hat{p}=0.2$ within the SE formula: $1.96\sqrt{0.2(1-0.2)/n} \leq 0.02\text{.}$ Solving gives $n \geq (1.96/0.02)^2 \times 0.2 \times 0.8 = 1536.6\text{,}$ so the sample size $n$ should be at least 1,537.
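Both parts can be checked with a short Python sketch. The helper `required_n` is an illustrative name. It rearranges $z^{\star}\sqrt{p(1-p)/n} \le m$ into $n \ge (z^{\star}/m)^2\, p(1-p)$ and rounds up, because a sample size must be a whole number.

```python
import math

smokers, n = 40, 200
p_hat = smokers / n

# Part 1: 95% confidence interval.
se = math.sqrt(p_hat * (1 - p_hat) / n)
print("95% CI:", round(p_hat - 1.96 * se, 3), round(p_hat + 1.96 * se, 3))


def required_n(z_star, margin, p):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin."""
    # Rearranging the inequality gives n >= (z* / margin)^2 * p(1 - p).
    return math.ceil((z_star / margin) ** 2 * p * (1 - p))


# Part 2: margin of error no larger than 2% at 95% confidence.
print("n needed:", required_n(1.96, 0.02, p_hat))
```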

###### 20 Legalize Marijuana, Part II

As discussed in Exercise 6.5.12, the 2010 General Social Survey reported a sample where about 48% of US residents thought marijuana should be made legal. If we wanted to limit the margin of error of a 95% confidence interval to 2%, about how many Americans would we need to survey?

###### 21 Public option, Part II

Exercise 6.5.13 presents the results of a poll evaluating support for the health care public option in 2009, reporting that 52% of Independents in the sample opposed the public option. If we wanted to estimate this number to within 1% with 90% confidence, what would be an appropriate sample size?

The margin of error, which is computed as $z^{\star}SE\text{,}$ must be no larger than 0.01 for a 90% confidence level. We use $z^{\star} = 1.65$ for a 90% confidence level, and we can use the point estimate $\hat{p}=0.52$ in the formula for $SE\text{.}$ $1.65\sqrt{0.52(1-0.52)/n} \leq 0.01$ gives $n \geq (1.65/0.01)^2 \times 0.52 \times 0.48 = 6795.4\text{.}$ Therefore, the sample size $n$ must be at least 6,796.
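The same rearrangement gives the sample size here. In this sketch, `required_n` is an illustrative helper, not a library function. For comparison, it also shows the effect of the more precise critical value $z^{\star} = 1.645$, which lowers the requirement slightly to 6,755.

```python
import math


def required_n(z_star, margin, p):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin."""
    return math.ceil((z_star / margin) ** 2 * p * (1 - p))


# Values from the exercise: p_hat = 0.52, margin 1%, 90% confidence.
print(required_n(1.65, 0.01, 0.52))
# The more precise critical value 1.645 gives a slightly smaller n.
print(required_n(1.645, 0.01, 0.52))
```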

###### 22 Acetaminophen and liver damage

It is believed that large doses of acetaminophen (the active ingredient in over the counter pain relievers like Tylenol) may cause damage to the liver. A researcher wants to conduct a study to estimate the proportion of acetaminophen users who have liver damage. For participating in this study, he will pay each subject \$20 and provide a free medical consultation if the patient has liver damage.

1. If he wants to limit the margin of error of his 98% confidence interval to 2%, what is the minimum amount of money he needs to set aside to pay his subjects?

2. The amount you calculated in part (a) is substantially over his budget so he decides to use fewer subjects. How will this affect the width of his confidence interval?

###### 23 Social experiment, Part I

A “social experiment” conducted by a TV program questioned what people do when they see a very obviously bruised woman getting picked on by her boyfriend. On two different occasions at the same restaurant, the same couple was depicted. In one scenario the woman was dressed “provocatively” and in the other scenario the woman was dressed “conservatively”. The table below shows how many restaurant diners were present under each scenario, and whether or not they intervened.

| Intervene | Provocative | Conservative | Total |
|---|---|---|---|
| Yes | 5 | 15 | 20 |
| No | 15 | 10 | 25 |
| Total | 20 | 25 | 45 |

Explain why the sampling distribution of the difference between the proportions of interventions under provocative and conservative scenarios does not follow an approximately normal distribution.

Answer: This is not a randomized experiment, and it is unclear whether people would be affected by the behavior of their peers. That is, independence may not hold. Additionally, there are only 5 interventions under the provocative scenario, so the success-failure condition does not hold. Even if we consider a hypothesis test where we pool the proportions ($\hat{p} = 20/45 \approx 0.44$), the provocative scenario would have only about $20 \times 0.44 = 8.9$ expected interventions, so the success-failure condition would still not be satisfied. Since one condition is questionable and the other is not satisfied, the difference in sample proportions will not follow a nearly normal distribution.
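The pooled success-failure check can be sketched in Python using the counts from the table. The sketch computes the pooled intervention rate and the expected interventions and non-interventions in each scenario.

```python
# Counts from the table: interventions and diners in each scenario.
interventions = {"provocative": 5, "conservative": 15}
diners = {"provocative": 20, "conservative": 25}

# Pooled proportion of interventions across both scenarios: 20 / 45.
pooled = sum(interventions.values()) / sum(diners.values())

for scenario, n in diners.items():
    expected_yes = n * pooled
    expected_no = n * (1 - pooled)
    meets = min(expected_yes, expected_no) >= 10
    print(scenario, round(expected_yes, 1), round(expected_no, 1), "condition met:", meets)
```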

###### 24 Heart transplant success

The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was officially designated a heart transplant candidate, meaning that he was gravely ill and might benefit from a new heart. Patients were randomly assigned into treatment and control groups. Patients in the treatment group received a transplant, and those in the control group did not. The table below displays how many patients survived and died in each group. (B. Turnbull et al., “Survivorship of Heart Transplant Data,” Journal of the American Statistical Association 69 (1974), pp. 74-80.)

| | Control | Treatment |
|---|---|---|
| Alive | 4 | 24 |
| Dead | 30 | 45 |

A hypothesis test would reject the conclusion that the survival rate is the same in each group, and so we might like to calculate a confidence interval. Explain why we cannot construct such an interval using the normal approximation. What might go wrong if we constructed the confidence interval despite this problem?

###### 25 Gender and color preference

A 2001 study asked 1,924 male and 3,666 female undergraduate college students their favorite color. A 95% confidence interval for the difference between the proportions of males and females whose favorite color is black $(p_{male} - p_{female})$ was calculated to be (0.02, 0.06). Based on this information, determine if the following statements are true or false, and explain your reasoning for each statement you identify as false. (L. Ellis and C. Ficek, “Color preferences according to gender and sexual orientation,” Personality and Individual Differences 31.8 (2001), pp. 1375-1379.)

1. We are 95% confident that the true proportion of males whose favorite color is black is 2% lower to 6% higher than the true proportion of females whose favorite color is black.

Answer: False. The entire confidence interval is above 0, so the male proportion is estimated to be 2% to 6% higher, not possibly lower.

2. We are 95% confident that the true proportion of males whose favorite color is black is 2% to 6% higher than the true proportion of females whose favorite color is black.

Answer: True.

3. 95% of random samples will produce 95% confidence intervals that include the true difference between the population proportions of males and females whose favorite color is black.

Answer: True.

4. We can conclude that there is a significant difference between the proportions of males and females whose favorite color is black and that the difference between the two sample proportions is too large to plausibly be due to chance.

Answer: True.

5. The 95% confidence interval for $(p_{female} - p_{male})$ cannot be calculated with only the information given in this exercise.

Answer: False. It is simply the negated and reordered values: (-0.06, -0.02).

###### 26 The Daily Show

A 2010 Pew Research foundation poll indicates that among 1,099 college graduates, 33% watch The Daily Show. Meanwhile, 22% of the 1,110 people with a high school degree but no college degree in the poll watch The Daily Show. A 95% confidence interval for $(p_\text{ college grad } - p_\text{ HS or less } )\text{,}$ where $p$ is the proportion of those who watch The Daily Show, is (0.07, 0.15). Based on this information, determine if the following statements are true or false, and explain your reasoning if you identify the statement as false. (The Pew Research Center, “Americans Spending More Time Following the News,” data collected June 8-28, 2010.)

1. At the 5% significance level, the data provide convincing evidence of a difference between the proportions of college graduates and those with a high school degree or less who watch The Daily Show.

2. We are 95% confident that 7% less to 15% more college graduates watch The Daily Show than those with a high school degree or less.

3. 95% of random samples of 1,099 college graduates and 1,110 people with a high school degree or less will yield differences in sample proportions between 7% and 15%.

4. A 90% confidence interval for $(p_\text{ college grad } - p_\text{ HS or less } )$ would be wider.

5. A 95% confidence interval for $(p_\text{ HS or less } - p_\text{ college grad } )$ is (-0.15,-0.07).

###### 27 Public Option, Part III

Exercise 6.5.13 presents the results of a poll evaluating support for the health care public option plan in 2009. 70% of 819 Democrats and 42% of 783 Independents support the public option.

1. Calculate a 95% confidence interval for the difference between $(p_{D} - p_{I})$ and interpret it in this context. We have already checked conditions for you.

Answer: (0.23, 0.33). We are 95% confident that the proportion of Democrats who support the plan is 23% to 33% higher than the proportion of Independents who do.

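2. True or false: If we had picked a random Democrat and a random Independent at the time of this poll, it is more likely that the Democrat would support the public option than the Independent.

Answer: True. The confidence interval in part (a) lies entirely above 0, so a higher proportion of Democrats than Independents support the public option.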
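The interval in part 1 can be reproduced with a minimal Python sketch. It uses the unpooled standard error $\sqrt{\hat{p}_D(1-\hat{p}_D)/n_D + \hat{p}_I(1-\hat{p}_I)/n_I}$, which is appropriate for a confidence interval because no null value is assumed.

```python
import math

p_d, n_d = 0.70, 819  # Democrats who support the public option
p_i, n_i = 0.42, 783  # Independents who support the public option

# Unpooled SE for a confidence interval on p_D - p_I.
se = math.sqrt(p_d * (1 - p_d) / n_d + p_i * (1 - p_i) / n_i)
diff = p_d - p_i
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print("95% CI for p_D - p_I:", round(lo, 2), round(hi, 2))
```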

###### 28 Sleep deprivation, CA vs. OR, Part I

According to a report on sleep deprivation by the Centers for Disease Control and Prevention, the proportion of California residents who reported insufficient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents. These data are based on simple random samples of 11,545 California and 4,691 Oregon residents. Calculate a 95% confidence interval for the difference between the proportions of Californians and Oregonians who are sleep deprived and interpret it in the context of the data.

###### 29 Offshore drilling, Part I

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Below is the distribution of responses, separated based on whether or not the respondent graduated from college. (Survey USA, Election Poll #16804, data collected July 8-11, 2010.)

| | College grad: Yes | College grad: No |
|---|---|---|
| Support | 154 | 132 |
| Oppose | 180 | 126 |
| Do not know | 104 | 131 |
| Total | 438 | 389 |

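1. What percent of college graduates and what percent of the non-college graduates in this sample do not know enough to have an opinion on drilling for oil and natural gas off the Coast of California?

Answer: College graduates: $104/438 \approx 23.7\%\text{.}$ Non-college graduates: $131/389 \approx 33.7\%\text{.}$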

2. Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates.

Answer: Let $p_{CG}$ and $p_{NCG}$ represent the proportion of college graduates and non-college graduates who responded “do not know”. $H_0: p_{CG} = p_{NCG}\text{.}$ $H_A: p_{CG} \ne p_{NCG}\text{.}$ Independence is satisfied (random sample, less than 10% of the population), and the success-failure condition, which we check using the pooled proportion ($\hat{p} = 235/827 = 0.284$), is also satisfied. With sample proportions $104/438 = 0.237$ and $131/389 = 0.337\text{,}$ $Z \approx -3.16$ $\to$ p-value $\approx 0.0016\text{.}$ Since the p-value is very small, we reject $H_0\text{.}$ The data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. The data also indicate that fewer college grads say they “do not know” than non-college grads (i.e. the data indicate the direction after we reject $H_0$).
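The pooled two-proportion test can be sketched in Python as follows. `pooled_two_prop_z` is an illustrative helper name, and the normal tail comes from the standard library's `statistics.NormalDist`. Computing from unrounded sample proportions gives $Z \approx -3.16$ and a p-value of about 0.0016. Any small difference from a value computed with rounded proportions leaves the p-value far below 0.05, so the conclusion stands.

```python
import math
from statistics import NormalDist


def pooled_two_prop_z(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * NormalDist().cdf(-abs(z))


# "Do not know": 104 of 438 college grads vs 131 of 389 non-college grads.
z, p_value = pooled_two_prop_z(104, 438, 131, 389)
print("Z:", round(z, 2), "p-value:", round(p_value, 4))
```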

###### 30Sleep deprivation, CA vs. OR, Part II

Exercise 6.5.28 provides data on sleep deprivation rates of Californians and Oregonians. The proportion of California residents who reported insufficient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents. These data are based on simple random samples of 11,545 California and 4,691 Oregon residents.

1. Conduct a hypothesis test to determine if these data provide strong evidence the rate of sleep deprivation is different for the two states. (Reminder: check conditions)

2. It is possible the conclusion of the test in part (a) is incorrect. If this is the case, what type of error was made?

###### 31Offshore drilling, Part II

Results of a poll evaluating support for drilling for oil and natural gas off the coast of California were introduced in Exercise 6.5.29.

| Response | College grad: Yes | College grad: No |
|---|---|---|
| Support | 154 | 132 |
| Oppose | 180 | 126 |
| Do not know | 104 | 131 |
| Total | 438 | 389 |

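1. What percent of college graduates and what percent of the non-college graduates in this sample support drilling for oil and natural gas off the Coast of California? Answer

College graduates: $154/438 \approx 35.2\%\text{.}$ Non-college graduates: $132/389 \approx 33.9\%\text{.}$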

2. Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who support off-shore drilling in California is different than that of non-college graduates. Answer

Let $p_{CG}$ and $p_{NCG}$ represent the proportion of college graduates and non-college grads who support offshore drilling. $H_0: p_{CG} = p_{NCG}\text{.}$ $H_A: p_{CG} \ne p_{NCG}\text{.}$ Independence is satisfied (random sample, $\lt 10\%$ of the population), and the success-failure condition, which we would check using the pooled proportion ($\hat{p} = 286/827 = 0.346$), is also satisfied. $Z = 0.39$ $\to$ p-value $=0.6966\text{.}$ Since the p-value $\gt \alpha$ (0.05), we fail to reject $H_0\text{.}$ The data do not provide strong evidence of a difference between the proportions of college graduates and non-college graduates who support off-shore drilling in California.
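Here is a stripped-down version of the same calculation for the support question. It is again a standard-library sketch, and the variable names are our own. Without intermediate rounding it gives $Z \approx 0.37$ and a p-value of about 0.71; the 0.39 and 0.6966 above reflect rounded proportions. The decision not to reject $H_0$ at $\alpha = 0.05$ is the same.

```python
from math import sqrt
from statistics import NormalDist

# Support for offshore drilling (Exercise 31)
x_cg, n_cg = 154, 438     # college graduates
x_ncg, n_ncg = 132, 389   # non-college graduates

p_cg, p_ncg = x_cg / n_cg, x_ncg / n_ncg
p_pool = (x_cg + x_ncg) / (n_cg + n_ncg)     # 286 / 827
se = sqrt(p_pool * (1 - p_pool) * (1 / n_cg + 1 / n_ncg))
z = (p_cg - p_ncg) / se
p_value = 2 * NormalDist().cdf(-abs(z))      # two-sided

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"{p_cg:.1%} vs {p_ncg:.1%}: Z = {z:.2f}, p-value = {p_value:.3f} -> {decision}")
```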

###### 32Full body scan, Part I

A news article reports that “Americans have differing views on two potentially inconvenient and invasive practices that airports could implement to uncover potential terrorist attacks.” This news piece was based on a survey conducted among a random sample of 1,137 adults nationwide, interviewed by telephone November 7-10, 2010, where one of the questions on the survey was “Some airports are now using ‘full-body’ digital x-ray machines to electronically screen passengers in airport security lines. Do you think these new x-ray machines should or should not be used at airports?” Below is a summary of responses based on party affiliation. 20 S. Condon.“Poll: 4 in 5 Support Full-Body Airport Scanners”. In: CBS News (2010).

| Answer | Republican | Democrat | Independent |
|---|---|---|---|
| Should | 264 | 299 | 351 |
| Should not | 38 | 55 | 77 |
| Don't know/No answer | 16 | 15 | 22 |
| Total | 318 | 369 | 450 |

1. Conduct an appropriate hypothesis test evaluating whether there is a difference in the proportion of Republicans and Democrats who think the full-body scans should be applied in airports. Assume that all relevant conditions are met.

2. The conclusion of the test in part (a) may be incorrect, meaning a testing error was made. If an error was made, was it a Type 1 or a Type 2 Error? Explain.

###### 33Sleep deprived transportation workers

The National Sleep Foundation conducted a survey on the sleep habits of randomly sampled transportation workers and a control sample of non-transportation workers. The results of the survey are shown below. 21 National Sleep Foundation, 2012 Sleep in America Poll: Transportation Workers' Sleep, 2012.

| Hours of sleep | Control | Pilots | Truck drivers | Train operators | Bus/taxi/limo drivers |
|---|---|---|---|---|---|
| Less than 6 hours | 35 | 19 | 35 | 29 | 21 |
| 6 to 8 hours | 193 | 132 | 117 | 119 | 131 |
| More than 8 hours | 64 | 51 | 51 | 32 | 58 |
| Total | 292 | 202 | 203 | 180 | 210 |

Conduct a hypothesis test to evaluate if these data provide evidence of a difference between the proportions of truck drivers and non-transportation workers (the control group) who get less than 6 hours of sleep per day, i.e. are considered sleep deprived.

Subscript $_C$ means control group. Subscript $_T$ means truck drivers. $H_0: p_C = p_T\text{.}$ $H_A: p_C \ne p_T\text{.}$ Independence is satisfied (random samples, $\lt 10\%$ of the population), as is the success-failure condition, which we would check using the pooled proportion ($\hat{p} = 70/495 = 0.141$). $Z = -1.65$ $\to$ p-value $=0.0989\text{.}$ Since the p-value is greater than the default significance level $\alpha = 0.05\text{,}$ we fail to reject $H_0\text{.}$ The data do not provide strong evidence that the rates of sleep deprivation are different for non-transportation workers and truck drivers.
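The success-failure check and the test statistic can be sketched together as follows. This is standard-library Python, and the variable names are illustrative. Here the unrounded computation reproduces the answer's $Z = -1.65$ and a p-value of about 0.0989.

```python
from math import sqrt
from statistics import NormalDist

# Less than 6 hours of sleep: control group vs. truck drivers (Exercise 33)
x_c, n_c = 35, 292   # control (non-transportation workers)
x_t, n_t = 35, 203   # truck drivers

p_c, p_t = x_c / n_c, x_t / n_t
p_pool = (x_c + x_t) / (n_c + n_t)                     # 70 / 495

# Success-failure check under H0 uses the pooled proportion for both groups
expected_successes = [n_c * p_pool, n_t * p_pool]
expected_failures = [n_c * (1 - p_pool), n_t * (1 - p_pool)]
conditions_ok = min(expected_successes + expected_failures) >= 10

se = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = (p_c - p_t) / se
p_value = 2 * NormalDist().cdf(-abs(z))                # two-sided
print(f"conditions ok: {conditions_ok}, Z = {z:.2f}, p-value = {p_value:.4f}")
```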

###### 34Prenatal vitamins and Autism

Researchers studying the link between prenatal vitamin use and autism surveyed the mothers of a random sample of children aged 24 - 60 months with autism and conducted another separate random sample for children with typical development. The table below shows the number of mothers in each group who did and did not use prenatal vitamins during the three months before pregnancy (periconceptional period). 22 R.J. Schmidt et al.“Prenatal vitamins, one-carbon metabolism gene variants, and risk for autism”. In: Epidemiology 22.4 (2011), p. 476.

| Periconceptional prenatal vitamin | Autism | Typical development | Total |
|---|---|---|---|
| No vitamin | 111 | 70 | 181 |
| Vitamin | 143 | 159 | 302 |
| Total | 254 | 229 | 483 |

1. State appropriate hypotheses to test for independence of use of prenatal vitamins during the three months before pregnancy and autism.

2. Complete the hypothesis test and state an appropriate conclusion. (Reminder: verify any necessary conditions for the test.)

3. A New York Times article reporting on this study was titled “Prenatal Vitamins May Ward Off Autism”. Do you find the title of this article to be appropriate? Explain your answer. Additionally, propose an alternative title. 23 R.C. Rabin. “Patterns: Prenatal Vitamins May Ward Off Autism”. In: New York Times (2011).

###### 35HIV in sub-Saharan Africa

In July 2008 the US National Institutes of Health announced that it was stopping a clinical study early because of unexpected results. The study population consisted of HIV-infected women in sub-Saharan Africa who had been given a single dose of Nevirapine (a treatment for HIV) while giving birth, to prevent transmission of HIV to the infant. The study was a randomized comparison of continued treatment of a woman (after successful childbirth) with Nevirapine vs. Lopinavir, a second drug used to treat HIV. 240 women participated in the study; 120 were randomized to each of the two treatments. Twenty-four weeks after starting the study treatment, each woman was tested to determine if the HIV infection was becoming worse (an outcome called virologic failure). Twenty-six of the 120 women treated with Nevirapine experienced virologic failure, while 10 of the 120 women treated with the other drug experienced virologic failure. 24 S. Lockman et al. “Response to antiretroviral therapy after a single, peripartum dose of nevirapine”. In: Obstetrical & gynecological survey 62.6 (2007), p. 361.

1. Create a two-way table presenting the results of this study. Answer

Summary of the study:

| Treatment | Virologic failure: Yes | Virologic failure: No | Total |
|---|---|---|---|
| Nevirapine | 26 | 94 | 120 |
| Lopinavir | 10 | 110 | 120 |
| Total | 36 | 204 | 240 |

2. State appropriate hypotheses to test for independence of treatment and virologic failure. Answer

$H_0: p_N = p_L\text{.}$ There is no difference in virologic failure rates between the Nevirapine and Lopinavir groups. $H_A: p_N \ne p_L\text{.}$ There is some difference in virologic failure rates between the Nevirapine and Lopinavir groups.

3. Complete the hypothesis test and state an appropriate conclusion. (Reminder: verify any necessary conditions for the test.) Answer

Random assignment was used, so the observations in each group are independent. If the patients in the study are representative of those in the general population (something impossible to check with the given information), then we can also confidently generalize the findings to the population. The success-failure condition, which we would check using the pooled proportion ($\hat{p} = 36/240 = 0.15$), is satisfied. $Z=2.89$ $\to$ p-value $=0.0039\text{.}$ Since the p-value is low, we reject $H_0\text{.}$ There is strong evidence of a difference in virologic failure rates between the Nevirapine and Lopinavir groups; treatment and virologic failure do not appear to be independent.
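The same pooled z-test can be sketched directly from the two-way table above. The dictionary layout is an assumption made for illustration. With unrounded values, $Z \approx 2.89$ and the two-sided p-value is about 0.0038. The 0.0039 above comes from looking up the rounded $Z = 2.89$.

```python
from math import sqrt
from statistics import NormalDist

# Two-way table from Exercise 35: (virologic failure, no failure) per treatment
table = {"Nevirapine": (26, 94), "Lopinavir": (10, 110)}

(x_n, f_n), (x_l, f_l) = table["Nevirapine"], table["Lopinavir"]
n_n, n_l = x_n + f_n, x_l + f_l                          # 120 women per arm
p_n, p_l = x_n / n_n, x_l / n_l
p_pool = (x_n + x_l) / (n_n + n_l)                       # 36 / 240 = 0.15

# Success-failure condition with the pooled proportion (each count >= 10)
assert min(n_n * p_pool, n_l * p_pool,
           n_n * (1 - p_pool), n_l * (1 - p_pool)) >= 10

se = sqrt(p_pool * (1 - p_pool) * (1 / n_n + 1 / n_l))
z = (p_n - p_l) / se
p_value = 2 * NormalDist().cdf(-abs(z))                  # two-sided
print(f"failure rates {p_n:.3f} vs {p_l:.3f}: Z = {z:.2f}, p-value = {p_value:.4f}")
```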

###### 36Diabetes and unemployment

A 2012 Gallup poll surveyed Americans about their employment status and whether or not they have diabetes. The survey results indicate that 1.5% of the 47,774 employed (full or part time) and 2.5% of the 5,855 unemployed 18-29 year olds have diabetes. 25 Gallup Wellbeing Employed Americans in Better Health Than the Unemployed, data collected Jan. 2, 2011 - May 21, 2012.

1. Create a two-way table presenting the results of this study.

2. State appropriate hypotheses to test for independence of incidence of diabetes and employment status.

3. The sample difference is about 1%. If we completed the hypothesis test, we would find that the p-value is very small (about 0), meaning the difference is statistically significant. Use this result to explain the difference between statistically significant and practically significant findings.

###### 37Active learning

A teacher wanting to increase the active learning component of her course is concerned about student reactions to changes she is planning to make. She conducts a survey in her class, asking students whether they believe more active learning in the classroom (hands-on exercises) instead of traditional lecture will help improve their learning. She does this at the beginning and end of the semester and wants to evaluate whether students' opinions have changed over the semester. Can she use the methods we learned in this chapter for this analysis? Explain your reasoning.

No. The samples at the beginning and at the end of the semester are not independent since the survey is conducted on the same students.

###### 38An apple a day keeps the doctor away

A physical education teacher at a high school wanting to increase awareness on issues of nutrition and health asked her students at the beginning of the semester whether they believed the expression “an apple a day keeps the doctor away”, and 40% of the students responded yes. Throughout the semester she started each class with a brief discussion of a study highlighting positive effects of eating more fruits and vegetables. She conducted the same apple-a-day survey at the end of the semester, and this time 60% of the students responded yes. Can she use the methods we learned in this chapter for this analysis? Explain your reasoning.

###### 39True or false, Part I

Determine if the statements below are true or false. For each false statement, suggest an alternative wording to make it a true statement.

1. The chi-square distribution, just like the normal distribution, has two parameters, mean and standard deviation. Answer

False. The chi-square distribution has one parameter called degrees of freedom.

2. The chi-square distribution is always right skewed, regardless of the value of the degrees of freedom parameter. Answer

True.

3. The chi-square statistic cannot be negative. Answer

True.

4. As the degrees of freedom increases, the shape of the chi-square distribution becomes more skewed. Answer

False. As the degrees of freedom increases, the shape of the chi-square distribution becomes more symmetric.
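Statements 2 and 4 can be checked numerically. A chi-square variable with $df$ degrees of freedom is a sum of $df$ squared independent standard normal variables, so it is never negative. Its skewness is $\sqrt{8/df}$, which is positive for every $df$ (always right skewed) and shrinks toward 0 as $df$ grows (closer to symmetric). The short sketch below just tabulates that formula. The helper name `chi2_skewness` is our own.

```python
from math import sqrt


def chi2_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom: sqrt(8 / df)."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return sqrt(8 / df)


skewness_by_df = {df: chi2_skewness(df) for df in (1, 2, 4, 9, 25, 100)}
for df, skew in skewness_by_df.items():
    print(f"df = {df:>3}: skewness = {skew:.2f}")
```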

###### 40True or false, Part II

Determine if the statements below are true or false. For each false statement, suggest an alternative wording to make it a true statement.

1. As the degrees of freedom increases, the mean of the chi-square distribution increases.

2. If you found $X^2 = 10$ with $df = 5$ you would fail to reject $H_0$ at the 5% significance level.

3. When finding the p-value of a chi-square test, we always shade the tail areas in both tails.

4. As the degrees of freedom increases, the variability of the chi-square distribution decreases.

###### 41Open source textbook

A professor using an open source introductory statistics book predicts that 60% of the students will purchase a hard copy of the book, 25% will print it out from the web, and 15% will read it online. At the end of the semester he asks his students to complete a survey where they indicate what format of the book they used. Of the 126 students, 71 said they bought a hard copy of the book, 30 said they printed it out from the web, and 25 said they read it online.

1. State the hypotheses for testing if the professor's predictions were inaccurate. Answer

$H_0\text{:}$ The distribution of the format of the book used by the students follows the professor's predictions. $H_A\text{:}$ The distribution of the format of the book used by the students does not follow the professor's predictions.

2. How many students did the professor expect to buy the book, print the book, and read the book exclusively online? Answer

$E_{hardcopy} = 126 \times 0.60 = 75.6\text{.}$ $E_{print} = 126 \times 0.25 = 31.5\text{.}$ $E_{online} = 126 \times 0.15 = 18.9\text{.}$

3. This is an appropriate setting for a chi-square test. List the conditions required for a test and verify they are satisfied. Answer

Independence: The sample is not random. However, if the professor has reason to believe that the proportions are stable from one term to the next and students are not affecting each other's study habits, independence is probably reasonable. Sample size: All expected counts are at least 5. Degrees of freedom: $df = k - 1 = 3 - 1 = 2$ is more than 1.

4. Calculate the chi-squared statistic, the degrees of freedom associated with it, and the p-value. Answer

$X^2 = 2.32\text{,}$ $df=2\text{,}$ p-value $\gt 0.3\text{.}$

5. Based on the p-value calculated in part (d), what is the conclusion of the hypothesis test? Interpret your conclusion in this context. Answer

Since the p-value is large, we fail to reject $H_0\text{.}$ The data do not provide strong evidence indicating the professor's predictions were statistically inaccurate.
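The goodness-of-fit calculation in parts 2 through 4 can be sketched in standard-library Python. One assumption keeps it dependency-free: with $df = 2$ the chi-square upper tail has the exact closed form $P(X^2 > x) = e^{-x/2}$, so the code checks that $df = 2$ before using it. For other degrees of freedom, a library routine such as `scipy.stats.chi2.sf(x, df)` would be the usual choice. The sketch reproduces $X^2 \approx 2.32$ and gives a p-value of about 0.31, consistent with p-value $\gt 0.3$.

```python
from math import exp

# Observed formats (Exercise 41) and the professor's predicted proportions
observed = {"hard copy": 71, "print": 30, "online": 25}
predicted = {"hard copy": 0.60, "print": 0.25, "online": 0.15}

n = sum(observed.values())                                # 126 students
expected = {k: n * p for k, p in predicted.items()}       # 75.6, 31.5, 18.9

# Sample-size condition: every expected count is at least 5
assert all(e >= 5 for e in expected.values())
df = len(observed) - 1                                    # k - 1 = 2

x2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# Exact upper tail for df = 2 only: P(X^2 > x) = exp(-x / 2)
assert df == 2
p_value = exp(-x2 / 2)
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.3f}")
```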

###### 42Evolution vs. creationism

A Gallup Poll released in December 2010 asked 1019 adults living in the Continental U.S. about their belief in the origin of humans. These results, along with results from a more comprehensive poll from 2001 (that we will assume to be exactly accurate), are summarized in the table below:  26 Four in 10 Americans Believe in Strict Creationism, December 17, 2010, http://news.gallup.com/poll/145286/Four-Americans-Believe-Strict-Creationism.aspx

| Response | 2010 | 2001 |
|---|---|---|
| Humans evolved, with God guiding (1) | 38% | 37% |
| Humans evolved, but God had no part in process (2) | 16% | 12% |
| God created humans in present form (3) | 40% | 45% |
| Other / No opinion (4) | 6% | 6% |

1. Calculate the actual number of respondents in 2010 that fall in each response category.

2. State hypotheses for the following research question: have beliefs on the origin of human life changed since 2001?

3. Calculate the expected number of respondents in each category under the condition that the null hypothesis from part (b) is true.

4. Conduct a chi-square test and state your conclusion. (Reminder: verify conditions.)

###### 43Rock-paper-scissors

Rock-paper-scissors is a hand game played by two or more people where players choose to sign either “rock”, “paper”, or “scissors” with their hands. For a class project, you want to evaluate whether players choose between these three options randomly, or if certain options are favored above others. You ask two friends to play rock-paper-scissors and count the times each option is played. The following table summarizes the data:

| Rock | Paper | Scissors |
|---|---|---|
| 43 | 21 | 35 |

Use these data to evaluate whether players choose between these three options randomly, or if certain options are favored above others. Make sure to clearly outline each step of your analysis, and interpret your results in context of the data and the research question.

Use a chi-squared goodness of fit test. $H_0\text{:}$ Each option is equally likely. $H_A\text{:}$ Some options are preferred over others. Total sample size: 99. Expected counts: $\frac{1}{3} \times 99 = 33$ for each option. These are all above 5, so conditions are satisfied. $df = 3 - 1 = 2$ and $X^2 = \frac{(43 - 33)^2}{33} + \frac{(21 - 33)^2}{33} + \frac{(35 - 33)^2}{33} = 7.52 \rightarrow 0.02 \lt$ p-value $\lt 0.05\text{.}$ Since the p-value is less than 5%, we reject $H_0\text{.}$ The data provide convincing evidence that some options are preferred over others.
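This sketch runs the same test with a slightly more general tail function. The helper `chi2_sf_even_df` is our own. It relies on the identity $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{i=0}^{m-1} (x/2)^i / i!\text{,}$ which holds only for even degrees of freedom, so it is not a general chi-square implementation. Here $df = 2$, so it reduces to $e^{-x/2}$. The result is $X^2 \approx 7.52$ with a p-value of about 0.023.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper tail P(X^2 > x) for a chi-square distribution with EVEN df.

    Uses the identity P(chi2 with 2m df > x) = P(Poisson(x / 2) <= m - 1).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


counts = {"rock": 43, "paper": 21, "scissors": 35}
n = sum(counts.values())                  # 99 plays
expected = n / len(counts)                # 33 per option under H0
x2 = sum((o - expected) ** 2 / expected for o in counts.values())
df = len(counts) - 1                      # 3 - 1 = 2
p_value = chi2_sf_even_df(x2, df)
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
```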

###### 44Barking deer

Microhabitat factors associated with forage and bed sites of barking deer in Hainan Island, China were examined from 2001 to 2002. In this region woods make up 4.8% of the land, cultivated grass plots make up 14.7%, and deciduous forests make up 39.6%. Of the 426 sites where the deer forage, 4 were categorized as woods, 16 as cultivated grassplot, and 61 as deciduous forests. The table below summarizes these data. 27 Liwei Teng et al. “Forage and bed sites characteristics of Indian muntjac (Muntiacus muntjak) in Hainan Island, China”. In: Ecological Research 19.6 (2004), pp. 675-681.

| Woods | Cultivated grassplot | Deciduous forests | Other | Total |
|---|---|---|---|---|
| 4 | 16 | 61 | 345 | 426 |

1. Write the hypotheses for testing if barking deer prefer to forage in certain habitats over others.

2. What type of test can we use to answer this research question?

3. Check if the assumptions and conditions required for this test are satisfied.

4. Do these data provide convincing evidence that barking deer prefer to forage in certain habitats over others? Conduct an appropriate hypothesis test to answer this research question.

###### 45Quitters

Does being part of a support group affect the ability of people to quit smoking? A county health department enrolled 300 smokers in a randomized experiment. 150 participants were assigned to a group that used a nicotine patch and met weekly with a support group; the other 150 received the patch and did not meet with a support group. At the end of the study, 40 of the participants in the patch plus support group had quit smoking while only 30 smokers had quit in the other group.

1. Create a two-way table presenting the results of this study.

Two-way table:

| Treatment | Quit: Yes | Quit: No | Total |
|---|---|---|---|
| Patch $+$ support group | 40 | 110 | 150 |
| Only patch | 30 | 120 | 150 |
| Total | 70 | 230 | 300 |

2. Answer each of the following questions under the null hypothesis that being part of a support group does not affect the ability of people to quit smoking, and indicate whether the expected values are higher or lower than the observed values.

   1. How many subjects in the “patch + support” group would you expect to quit?

      $E_{\text{row 1, col 1}} = \frac{(\text{row 1 total}) \times (\text{col 1 total})}{\text{table total}} = \frac{150 \times 70}{300} = 35\text{.}$ This is lower than the observed value of 40.

   2. How many subjects in the “patch only” group would you expect to not quit?

      $E_{\text{row 2, col 2}} = \frac{(\text{row 2 total}) \times (\text{col 2 total})}{\text{table total}} = \frac{150 \times 230}{300} = 115\text{.}$ This is lower than the observed value of 120.
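The expected-count rule used above (row total times column total, divided by the table total) applies to every cell at once. Here is a minimal sketch; the helper name `expected_counts` is our own. Because both groups have 150 subjects, each row of expected counts comes out as 35 quitters and 115 non-quitters.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: patch + support group, patch only; columns: quit, did not quit
observed = [[40, 110],
            [30, 120]]
expected = expected_counts(observed)

for label, obs_row, exp_row in zip(["patch + support", "patch only"], observed, expected):
    print(f"{label}: observed {obs_row}, expected {exp_row}")
```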

###### 46Full body scan, Part II

The table below summarizes a data set we first encountered in Exercise 6.5.32 regarding views on full-body scans and political affiliation. The differences in each political group may be due to chance. Complete the following computations under the null hypothesis of independence between an individual's party affiliation and his support of full-body scans. It may be useful to first add on an extra column for row totals before proceeding with the computations.

| Answer | Republican | Democrat | Independent |
|---|---|---|---|
| Should | 264 | 299 | 351 |
| Should not | 38 | 55 | 77 |
| Don't know/No answer | 16 | 15 | 22 |
| Total | 318 | 369 | 450 |

1. How many Republicans would you expect to not support the use of full-body scans?

2. How many Democrats would you expect to support the use of full-body scans?

3. How many Independents would you expect to not know or not answer?

###### 47Offshore drilling, Part III

The table below summarizes a data set we first encountered in Exercise 6.5.29 that examines the responses of a random sample of college graduates and non-graduates on the topic of oil drilling. Complete a chi-square test for these data to check whether there is a statistically significant difference in responses from college graduates and non-graduates.

| Response | College grad: Yes | College grad: No |
|---|---|---|
| Support | 154 | 132 |
| Oppose | 180 | 126 |
| Do not know | 104 | 131 |
| Total | 438 | 389 |

$H_0\text{:}$ The opinion of college grads and non-grads is not different on the topic of drilling for oil and natural gas off the coast of California. $H_A\text{:}$ Opinions regarding drilling for oil and natural gas off the coast of California have an association with earning a college degree.

Expected counts:

| Response | College grad: Yes | College grad: No |
|---|---|---|
| Support | 151.5 | 134.5 |
| Oppose | 162.1 | 143.9 |
| Do not know | 124.5 | 110.5 |

Independence: The samples are both random, unrelated, and from less than 10% of the population, so independence between observations is reasonable.

Sample size: All expected counts are at least 5. Degrees of freedom: $df = (R - 1) \times (C - 1) = (3 - 1) \times (2 - 1) = 2\text{,}$ which is greater than 1. $X^2 = 11.47\text{,}$ $df = 2$ $\to$ $0.001 \lt$ p-value $\lt 0.005\text{.}$

Since the p-value $\lt \alpha\text{,}$ we reject $H_0\text{.}$ There is strong evidence that there is an association between support for off-shore drilling and having a college degree.
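The full chi-square test of independence for this $3 \times 2$ table can be sketched in standard-library Python as below. It assumes, as the answer finds, that $df = 2$, which allows the exact tail formula $e^{-X^2/2}$; the variable names are our own. With unrounded expected counts, $X^2 \approx 11.46$. The 11.47 above uses expected counts rounded to one decimal. The p-value is about 0.0032, inside the stated range of 0.001 to 0.005.

```python
from math import exp

observed = [[154, 132],    # support
            [180, 126],    # oppose
            [104, 131]]    # do not know
# columns: college grad "yes", college grad "no"

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)                                   # 827 respondents
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sample-size condition: every expected count is at least 5
assert min(min(row) for row in expected) >= 5
df = (len(observed) - 1) * (len(observed[0]) - 1)     # (3 - 1) x (2 - 1) = 2

x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))
p_value = exp(-x2 / 2)   # exact upper tail, valid only because df == 2
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
```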

###### 48Coffee and Depression

Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption. 28 M. Lucas et al. “Coffee, caffeine, and risk of depression among women”. In: Archives of internal medicine 171.17 (2011), p. 1571.

| Clinical depression | $\le 1$ cup/week | 2-6 cups/week | 1 cup/day | 2-3 cups/day | $\ge 4$ cups/day | Total |
|---|---|---|---|---|---|---|
| Yes | 670 | 373 | 905 | 564 | 95 | 2,607 |
| No | 11,545 | 6,244 | 16,329 | 11,726 | 2,288 | 48,132 |
| Total | 12,215 | 6,617 | 17,234 | 12,290 | 2,383 | 50,739 |

1. What type of test is appropriate for evaluating if there is an association between coffee intake and depression?

2. Write the hypotheses for the test you identified in part (a).

3. Calculate the overall proportion of women who do and do not suffer from depression.

4. Identify the expected count for cell containing 373, and calculate the contribution of this cell to the test statistic, i.e. $(Observed-Expected)^2/Expected\text{.}$

5. The test statistic is $X^2=20.93\text{.}$ What is the p-value?

6. What is the conclusion of the hypothesis test?

7. One of the authors of this study was quoted in the New York Times as saying it was “too early to recommend that women load up on extra coffee” based on just this study. 29 A. O'Connor. “Coffee Drinking Linked to Less Depression in Women”. In: New York Times (2011). Do you agree with this statement? Explain your reasoning.

###### 49

A December 2010 survey asked 500 randomly sampled Los Angeles residents which shipping carrier they prefer to use for shipping holiday gifts. The table below shows the distribution of responses by age group as well as the expected counts for each cell (shown in parentheses).

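| Shipping method | Age 18-34 | Age 35-54 | Age 55+ | Total |
|---|---|---|---|---|
| USPS | 72 (81) | 97 (102) | 76 (62) | 245 |
| UPS | 52 (53) | 76 (68) | 34 (41) | 162 |
| FedEx | 31 (21) | 24 (27) | 9 (16) | 64 |
| Something else | 7 (5) | 6 (7) | 3 (4) | 16 |
| Not sure | 3 (5) | 6 (5) | 4 (3) | 13 |
| Total | 165 | 209 | 126 | 500 |
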
1. State the null and alternative hypotheses for testing for independence of age and preferred shipping method for holiday gifts among Los Angeles residents. Answer

$H_0\text{:}$ The age of Los Angeles residents is independent of shipping carrier preference. $H_A\text{:}$ The age of Los Angeles residents is associated with shipping carrier preference.

2. Are the conditions for inference using a chi-square test satisfied? Answer

The conditions are not satisfied since some expected counts are below 5.
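The sample-size condition can be checked cell by cell by recomputing each expected count from the table margins. Here is a minimal standard-library sketch; the list names are our own. It flags three cells below 5: (Something else, 55+) at about 4.03, (Not sure, 18-34) at about 4.29, and (Not sure, 55+) at about 3.28. The margins give $13 \times 165 / 500 \approx 4.29$ for the (Not sure, 18-34) cell, even though the table prints (5) there.

```python
ages = ["18-34", "35-54", "55+"]
methods = ["USPS", "UPS", "FedEx", "Something else", "Not sure"]
observed = [[72, 97, 76],
            [52, 76, 34],
            [31, 24, 9],
            [7, 6, 3],
            [3, 6, 4]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)   # 500 respondents

# Collect every cell whose expected count (row total * col total / n) is below 5
small_cells = []
for method, r in zip(methods, row_totals):
    for age, c in zip(ages, col_totals):
        e = r * c / n
        if e < 5:
            small_cells.append((method, age, round(e, 2)))

for cell in small_cells:
    print(cell)
```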

###### 50How's it going?

The American National Election Studies (ANES) collects data on voter attitudes and intentions as well as demographic information. In this question we will focus on two variables from the 2012 ANES dataset: 30 The American National Election Studies (ANES). The ANES 2012 Time Series Study [dataset]. Stanford University and the University of Michigan [producers].

• region (levels: Northeast, North Central, South, and West), and

• whether the respondent feels things in this country are generally going in the right direction or things have pretty seriously gotten off on the wrong track.

To keep calculations simple we will work with a random sample of 500 respondents from the ANES dataset. The distribution of responses is as follows:

| Region | Right direction | Wrong track | Total |
|---|---|---|---|
| Northeast | 29 | 54 | 83 |
| North Central | 44 | 77 | 121 |
| South | 62 | 131 | 193 |
| West | 36 | 67 | 103 |
| Total | 171 | 329 | 500 |

1. Region: According to the 2010 Census, 18% of US residents live in the Northeast, 22% live in the North Central region, 37% live in the South, and 23% live in the West. Evaluate whether the ANES sample is representative of the population distribution of US residents. Make sure to clearly state the hypotheses, check conditions, calculate the appropriate test statistic and the p-value, and make your conclusion in context of the data. Also comment on what your conclusion says about whether or not this sample can be considered to be representative.

2. Region and direction:

   1. We would like to evaluate the relationship between region and feeling about the country's direction. What is the response variable and what is the explanatory variable?

   2. What are the hypotheses for evaluating this relationship?

   3. Complete the hypothesis test and interpret your results in context of the data and the research question.