AHSS Chapter exercises

Section 6.5 Chapter exercises

Exercises 6.5.1 Exercises

1. Active learning.

A teacher wanting to increase the active learning component of her course is concerned about student reactions to changes she is planning to make. She conducts a survey in her class, asking students whether they believe more active learning in the classroom (hands-on exercises) instead of traditional lecture will help improve their learning. She does this at the beginning and end of the semester and wants to evaluate whether students' opinions have changed over the semester. Can she use the methods we learned in this chapter for this analysis? Explain your reasoning.

Solution

No. The samples at the beginning and at the end of the semester are not independent since the survey is conducted on the same students.

2. Website experiment.

The OpenIntro website occasionally experiments with design and link placement. We conducted one experiment testing three different placements of a download link for this textbook on the book's main page to see which location, if any, led to the most downloads. The number of site visitors included in the experiment was 701 and is captured in one of the response combinations in the following table:

	Download	No Download
Position 1	13.8%	18.3%
Position 2	14.6%	18.5%
Position 3	12.1%	22.7%

(a) Calculate the actual number of site visitors in each of the six response categories.
(b) Each individual in the experiment had an equal chance of being in any of the three experiment groups. However, we see that there are slightly different totals for the groups. Is there any evidence that the groups were actually imbalanced? Make sure to clearly state hypotheses, check conditions, calculate the appropriate test statistic and the p-value, and make your conclusion in context of the data.
(c) Complete an appropriate hypothesis test to check whether there is evidence that there is a higher rate of site visitors clicking on the textbook link in any of the three groups.

3. Shipping holiday gifts.

A local news survey asked 500 randomly sampled Los Angeles residents which shipping carrier they prefer to use for shipping holiday gifts. The table below shows the distribution of responses by age group as well as the expected counts for each cell (shown in parentheses).

		Age
		18-34		35-54		55+		Total
Shipping Method	USPS	72	(81)	97	(102)	76	(62)	245
	UPS	52	(53)	76	(68)	34	(41)	162
	FedEx	31	(21)	24	(27)	9	(16)	64
	Something else	7	(5)	6	(7)	3	(4)	16
	Not sure	3	(5)	6	(5)	4	(3)	13
	Total	165		209		126		500

(a) State the null and alternative hypotheses for testing for independence of age and preferred shipping method for holiday gifts among Los Angeles residents.
(b) Are the conditions for inference using a chi-square test satisfied?

Solution

(a) $H_{0}:$ Age and preferred shipping carrier are independent among Los Angeles residents. $H_{A}:$ Age and preferred shipping carrier are associated among Los Angeles residents.

(b) The conditions are not satisfied since some expected counts are below 5.
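The condition check in part (b) can be reproduced from the table margins. The sketch below is only an illustration: it recomputes each expected count as (row total × column total) / grand total and lists the cells that fall below 5. The variable names are made up for this example, and the unrounded values it produces can differ slightly from the rounded values printed in the table.

```python
# Observed counts from the table; each list is ordered by age group
# (18-34, 35-54, 55+).
observed = {
    "USPS": [72, 97, 76],
    "UPS": [52, 76, 34],
    "FedEx": [31, 24, 9],
    "Something else": [7, 6, 3],
    "Not sure": [3, 6, 4],
}

row_totals = {carrier: sum(counts) for carrier, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    carrier: [row_totals[carrier] * col_total / n for col_total in col_totals]
    for carrier in observed
}

# Cells that violate the "expected count of at least 5" condition.
small_cells = [
    (carrier, round(value, 2))
    for carrier, row in expected.items()
    for value in row
    if value < 5
]
print(small_cells)
```

Any non-empty list here means the usual chi-square approximation is questionable for this table, which matches the conclusion in part (b).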

4. The Civil War.

A national survey conducted among a simple random sample of 1,507 adults shows that 56% of Americans think the Civil War is still relevant to American politics and political life.¹

(a) Conduct a hypothesis test to determine if these data provide strong evidence that the majority of the Americans think the Civil War is still relevant.
(b) Interpret the p-value in this context.
(c) Calculate a 90% confidence interval for the proportion of Americans who think the Civil War is still relevant. Interpret the interval in this context, and comment on whether or not the confidence interval agrees with the conclusion of the hypothesis test.

¹ Pew Research Center Publications, Civil War at 150: Still Relevant, Still Divisive, data collected between March 30 - April 3, 2011.

5. College smokers.

We are interested in estimating the proportion of students at a university who smoke. Out of a random sample of 200 students from this university, 40 students smoke.

(a) Calculate a 95% confidence interval for the proportion of students at this university who smoke, and interpret this interval in context. (Reminder: Check conditions.)
(b) If we wanted the margin of error to be no larger than 2% at a 95% confidence level for the proportion of students who smoke, how big of a sample would we need?

Solution

(a) Independence is satisfied (random sample), as is the success-failure condition (40 smokers, 160 non-smokers). The 95% CI: $(0.145, 0.255)\text{.}$ We are 95% confident that 14.5% to 25.5% of all students at this university smoke.

(b) We want $z^{*}SE$ to be no larger than 0.02 for a 95% confidence level. We use $z^{*} = 1.96$ and plug in the point estimate $\hat{p} = 0.2$ within the SE formula: $1.96 \sqrt{0.2(1- 0.2)/n} \lt 0.02\text{.}$ The sample size n should be at least 1,537.
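The arithmetic in both parts can be checked with a short script. This is a minimal sketch using only the Python standard library; the helper names `proportion_ci` and `min_sample_size` are invented for this illustration, and the script uses the exact normal quantile rather than the rounded $z^{*} = 1.96$.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for one proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


def min_sample_size(p_guess, margin, confidence=0.95):
    """Smallest n for which z* * SE is no larger than the margin."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z_star**2 * p_guess * (1 - p_guess) / margin**2)


lower, upper = proportion_ci(40, 200)
print(round(lower, 3), round(upper, 3))  # part (a)
print(min_sample_size(0.2, 0.02))        # part (b)
```

Rounding up in `min_sample_size` matters: rounding down would give a margin of error slightly above the 2% target.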

6. Acetaminophen and liver damage.

It is believed that large doses of acetaminophen (the active ingredient in over the counter pain relievers like Tylenol) may cause damage to the liver. A researcher wants to conduct a study to estimate the proportion of acetaminophen users who have liver damage. For participating in this study, he will pay each subject $20 and provide a free medical consultation if the patient has liver damage.

(a) If he wants to limit the margin of error of his 98% confidence interval to 2%, what is the minimum amount of money he needs to set aside to pay his subjects?
(b) The amount you calculated in part (a) is substantially over his budget so he decides to use fewer subjects. How will this affect the width of his confidence interval?

7. Life after college.

We are interested in estimating the proportion of graduates at a mid-sized university who found a job within one year of completing their undergraduate degree. Suppose we conduct a survey and find out that 348 of the 400 randomly sampled graduates found jobs. The graduating class under consideration included over 4500 students.

(a) Describe the population parameter of interest. What is the value of the point estimate of this parameter?
(b) Check if the conditions for constructing a confidence interval based on these data are met.
(c) Calculate a 95% confidence interval for the proportion of graduates who found a job within one year of completing their undergraduate degree at this university, and interpret it in the context of the data.
(d) What does “95% confidence” mean?
(e) Now calculate a 99% confidence interval for the same parameter and interpret it in the context of the data.
(f) Compare the widths of the 95% and 99% confidence intervals. Which one is wider? Explain.

Solution

(a) Proportion of graduates from this university who found a job within one year of graduating. $\hat{p} = 348/400 = 0.87\text{.}$

(b) This is a random sample, so the observations are independent. Success-failure condition is satisfied: 348 successes, 52 failures, both well above 10.

(c) $(0.8371, 0.9029)\text{.}$ We are 95% confident that approximately 84% to 90% of graduates from this university found a job within one year of completing their undergraduate degree.

(d) 95% of such random samples would produce a 95% confidence interval that includes the true proportion of students at this university who found a job within one year of graduating from college.

(e) $(0.8267, 0.9133)\text{.}$ Similar interpretation as before.

(f) 99% CI is wider, as we are more confident that the true proportion is within the interval and so need to cover a wider range.
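The long-run interpretation in part (d) can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the true proportion equals 0.87; in practice this value is unknown. It draws many samples of 400 graduates, builds a 95% interval from each, and reports the fraction of intervals that contain the assumed true value.

```python
import math
import random
from statistics import NormalDist

TRUE_P = 0.87   # assumed "true" proportion, chosen only for this illustration
N = 400         # sample size used in the exercise
REPS = 2000     # number of simulated samples
Z_STAR = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence

rng = random.Random(1)  # fixed seed so the run is reproducible


def interval_covers_truth():
    """Draw one sample and report whether its 95% CI contains TRUE_P."""
    successes = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    se = math.sqrt(p_hat * (1 - p_hat) / N)
    return p_hat - Z_STAR * se <= TRUE_P <= p_hat + Z_STAR * se


coverage = sum(interval_covers_truth() for _ in range(REPS)) / REPS
print(f"Fraction of intervals containing the true proportion: {coverage:.3f}")
```

The reported fraction should land close to 0.95, though not exactly on it, since both the simulation and the normal approximation introduce some error.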

8. Diabetes and unemployment.

A Gallup poll surveyed Americans about their employment status and whether or not they have diabetes. The survey results indicate that 1.5% of the 47,774 employed (full or part time) and 2.5% of the 5,855 unemployed 18-29 year olds have diabetes.²

(a) Create a two-way table presenting the results of this study.
(b) State appropriate hypotheses to test for difference in proportions of diabetes between employed and unemployed Americans.
(c) The sample difference is about 1%. If we completed the hypothesis test, we would find that the p-value is very small (about 0), meaning the difference is statistically significant. Use this result to explain the difference between statistically significant and practically significant findings.

² Gallup Wellbeing, Employed Americans in Better Health Than the Unemployed, data collected Jan. 2, 2011-May 21, 2012.

9. Rock-paper-scissors.

Rock-paper-scissors is a hand game played by two or more people where players choose to sign either rock, paper, or scissors with their hands. For your statistics class project, you want to evaluate whether players choose between these three options randomly, or if certain options are favored above others. You ask two friends to play rock-paper-scissors and count the times each option is played. The following table summarizes the data:

Rock	Paper	Scissors
43	21	35

Use these data to evaluate whether players choose between these three options randomly, or if certain options are favored above others. Make sure to clearly outline each step of your analysis, and interpret your results in context of the data and the research question.

Solution

Use a chi-squared goodness of fit test. $H_{0}:$ Each option is equally likely. $H_{A}:$ Some options are preferred over others. Total sample size: 99. Expected counts: $(1/3) \times 99 = 33$ for each option. These are all above 5, so conditions are satisfied. $df =3- 1 = 2$ and $\chi^2 = \frac{(43-33)^2}{33} + \frac{(21-33)^2}{33}+ \frac{(35-33)^2}{33} =7.52 \rightarrow \text{p-value }= 0.023\text{.}$ Since the p-value is less than 5%, we reject $H_{0}\text{.}$ The data provide convincing evidence that some options are preferred over others.
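The test statistic and p-value above can be verified with a few lines of code. This is a minimal sketch using only the standard library. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper tail area has the closed form $e^{-\chi^2/2}$; for other degrees of freedom a statistics library or a table would be needed instead.

```python
import math

# Observed counts from the exercise.
observed = {"rock": 43, "paper": 21, "scissors": 35}

total = sum(observed.values())
expected = total / len(observed)  # equal probabilities under the null

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# Upper tail area of the chi-square distribution.
# This closed form is valid only because df == 2.
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
```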

10. 2010 Healthcare Law.

On June 28, 2012 the U.S. Supreme Court upheld the much debated 2010 healthcare law, declaring it constitutional. A Gallup poll released the day after this decision indicates that 46% of 1,012 Americans agree with this decision. At a 95% confidence level, this sample has a 3% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning.³

(a) We are 95% confident that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law.
(b) We are 95% confident that between 43% and 49% of Americans support the decision of the U.S. Supreme Court on the 2010 healthcare law.
(c) If we considered many random samples of 1,012 Americans, and we calculated the sample proportions of those who support the decision of the U.S. Supreme Court, 95% of those sample proportions will be between 43% and 49%.
(d) The margin of error at a 90% confidence level would be higher than 3%.

³ Gallup, Americans Issue Split Decision on Healthcare Ruling, data collected June 28, 2012.

11. Browsing on the mobile device.

A survey of 2,254 American adults indicates that 17% of cell phone owners browse the internet exclusively on their phone rather than a computer or other device.⁴

(a) According to an online article, a report from a mobile research company indicates that 38 percent of Chinese mobile web users only access the internet through their cell phones.⁵ Conduct a hypothesis test to determine if these data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%.
(b) Interpret the p-value in this context.
(c) Calculate a 95% confidence interval for the proportion of Americans who access the internet on their cell phones, and interpret the interval in this context.

⁴ Pew Internet, Cell Internet Use 2012, data collected between March 15 - April 13, 2012.

⁵ S. Chang. “The Chinese Love to Use Feature Phone to Access the Internet”. In: M.I.C Gadget (2012).

Solution

(a) $H_{0} : p = 0.38\text{.}$ $H_{A} : p \ne 0.38\text{.}$ Independence (random sample) and the success-failure condition are satisfied. $Z = -20.5 \rightarrow \text{p-value } \approx 0\text{.}$ Since the p-value is very small, we reject $H_{0}\text{.}$ The data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%, and the data indicate that the proportion is lower in the US.

(b) If in fact 38% of Americans used their cell phones as a primary access point to the internet, the probability of obtaining a random sample of 2,254 Americans where 17% or less or 59% or more use only their cell phones to access the internet would be approximately 0.

(c) $(0.1545, 0.1855)\text{.}$ We are 95% confident that approximately 15.5% to 18.6% of all Americans primarily use their cell phones to browse the internet.
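The numbers in parts (a) and (c) can be reproduced as follows. This is a minimal sketch using only the standard library. Note the two different standard errors: the test uses the null value $p_0 = 0.38$, while the confidence interval uses the sample proportion $\hat{p} = 0.17$. Variable names are chosen for this illustration only.

```python
import math
from statistics import NormalDist

n = 2254
p_hat = 0.17   # sample proportion reported in the survey
p_null = 0.38  # null value taken from the Chinese report

# Hypothesis test: the standard error is computed under the null hypothesis.
se_null = math.sqrt(p_null * (1 - p_null) / n)
z = (p_hat - p_null) / se_null
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value

# Confidence interval: the standard error is computed from the sample proportion.
z_star = NormalDist().inv_cdf(0.975)
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)

print(f"Z = {z:.1f}, p-value = {p_value:.3g}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```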

12. Coffee and Depression.

Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption.⁶

⁶ M. Lucas et al. “Coffee, caffeine, and risk of depression among women”. In: Archives of internal medicine 171.17 (2011), p. 1571.

		Caffeinated coffee consumption
		$\le 1$ cup/week	2-6 cups/week	1 cup/day	2-3 cups/day	$\ge 4$ cups/day	Total
Clinical depression	Yes	670	373	905	564	95	2607
	No	11545	6244	16329	11726	2288	48132
	Total	12215	6617	17234	12290	2383	50739

(a) What type of test is appropriate for evaluating if there is an association between coffee intake and depression?
(b) Write the hypotheses for the test you identified in part (a).
(c) Calculate the overall proportion of women who do and do not suffer from depression.
(d) Identify the expected count for the highlighted cell, and calculate the contribution of this cell to the test statistic, i.e. $(\text{Observed} - \text{Expected})^2 / \text{Expected}\text{.}$
(e) The test statistic is $\chi^2 = 20.93\text{.}$ What is the p-value?
(f) What is the conclusion of the hypothesis test?
(g) One of the authors of this study was quoted on the NYTimes as saying it was “too early to recommend that women load up on extra coffee” based on just this study.⁷ Do you agree with this statement? Explain your reasoning.

⁷ A. O'Connor. “Coffee Drinking Linked to Less Depression in Women”. In: New York Times (2011).