AHSS Difference of two proportions

Section 6.2 Difference of two proportions

We often wish to compare two groups to each other. In this section, we will answer the following questions:

How much more effective is a blood thinner than a placebo for those who undergo CPR for a heart attack?
How different is the approval of the 2010 healthcare law under two different question phrasings?
Does the use of fish oils reduce heart attacks better than a placebo?

Subsection 6.2.1 Learning objectives

State and verify whether or not the conditions for inference on the difference of two proportions using a normal distribution are met.
Recognize that the standard error calculation is different for the test and for the interval, and explain why that is the case.
Know how to calculate the pooled proportion and when to use it.
Carry out a complete confidence interval procedure for the difference of two proportions.
Carry out a complete hypothesis test for the difference of two proportions.

Subsection 6.2.2 Sampling distribution of the difference of two proportions

In this section we want to compare two proportions to each other. We can start by taking their difference. If the difference is positive, it tells us that the first one is larger. If it is negative, it tells us that the second one is larger. If the difference is zero, it tells us that they are equal. When comparing two proportions, then, the quantity that we want to estimate is really the difference: \(p_1-p_2\text{.}\) This tells us how far apart the two proportions are.

Before we find a test statistic and perform inference for the two proportion case, we must investigate the sampling distribution of \(\hat{p}_1-\hat{p}_2\text{,}\) which will become our point estimate. We know that the sampling distribution should be centered on \(p_1-p_2\text{.}\) The standard deviation of \(\hat{p}_1-\hat{p}_2\) can be computed as:

\begin{gather*} SD_{\hat{p}_1 - \hat{p}_2} = \sqrt{(SD_{\hat{p}_1})^2 + (SD_{\hat{p}_2})^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \end{gather*}

As with \(\hat{p}\text{,}\) the difference of two sample proportions \(\hat{p}_1-\hat{p}_2\) approximately follows a normal distribution when certain conditions are met. First, the sampling distribution for each sample proportion must be nearly normal, and second, the samples must be independent. Under these two conditions, the sampling distribution of \(\hat{p}_1 - \hat{p}_2\) may be well approximated using the normal model.
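The standard deviation formula above can be checked with a small simulation. The following is a minimal sketch, not part of the original text: the true proportions and sample sizes are hypothetical values chosen only for illustration, and the function names `sd_diff_formula` and `simulate_diffs` are our own.

```python
import math
import random

def sd_diff_formula(p1, n1, p2, n2):
    """Theoretical SD of p1_hat - p2_hat for two independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def simulate_diffs(p1, n1, p2, n2, reps, seed=0):
    """Draw `reps` simulated values of p1_hat - p2_hat."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Hypothetical true proportions and sample sizes (assumptions for illustration).
p1, n1, p2, n2 = 0.35, 40, 0.22, 50
diffs = simulate_diffs(p1, n1, p2, n2, reps=5000)

mean_diff = sum(diffs) / len(diffs)
sd_sim = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))
sd_theory = sd_diff_formula(p1, n1, p2, n2)

print(f"center of simulated differences: {mean_diff:.4f} (p1 - p2 = {p1 - p2:.2f})")
print(f"simulated SD: {sd_sim:.4f}")
print(f"formula SD:   {sd_theory:.4f}")
```

If the formula is right, the simulated center should land close to \(p_1-p_2\) and the simulated SD close to the formula value, with small discrepancies due to simulation noise.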

Subsection 6.2.3 Checking conditions for inference using a normal distribution

When comparing two proportions, we carry out inference on \(p_1-p_2\text{.}\) The assumptions are that the observations are independent, both between groups and within groups, and that the sampling distribution of \(\hat{p}_1-\hat{p}_2\) is nearly normal. We check whether these assumptions are reasonable by verifying the following conditions.

Independent. Observations between groups can be considered independent when the data are collected from two independent random samples or, in the context of experiments, from two randomly assigned treatments. Randomly assigning subjects to treatments is equivalent to randomly assigning treatments to subjects. When sampling without replacement from a finite population, observations can be considered independent when sampling less than 10% of the population.

Nearly normal sampling distribution. The sampling distribution of \(\hat{p}_1-\hat{p}_2\) will be nearly normal when the success-failure condition is met for both groups. In the two-sample case, instead of two inequalities, there are four to check.

Subsection 6.2.4 Confidence interval for the difference of two proportions

We consider an experiment for patients who underwent CPR for a heart attack and were subsequently admitted to a hospital. These patients were randomly divided into a treatment group where they received a blood thinner or the control group where they did not receive a blood thinner. The outcome variable of interest was whether the patients survived for at least 24 hours. The results are shown in Table 6.2.1.


	Survived	Died	Total

Treatment	14	26	40
Control	11	39	50

Total	25	65	90

Table 6.2.1. Results for the CPR study. Patients in the treatment group were given a blood thinner, and patients in the control group were not.

Here, the parameter of interest is a difference of population proportions, specifically, the difference in the proportion of similar patients that would survive for at least 24 hours if in the treatment group versus if in the control group. Let:

\begin{align*} p_1:\amp \text{ proportion that would survive in treatment group, and }\\ p_2:\amp \text{ proportion that would survive in control group } \end{align*}

Then the parameter of interest is \(p_1 - p_2\text{.}\) In order to use a Z-interval to estimate this difference, we must see if the point estimate, \(\hat{p}_{1} - \hat{p}_{2}\text{,}\) follows a normal distribution. Because the patients were randomly assigned to one of the two groups and one heart attack patient is unlikely to influence the next that was in the study, the observations are considered independent, both within the samples and between the samples. Next, the success-failure condition should be verified for each group. We use the sample proportions along with the sample sizes to check the condition.

\begin{align*} n_1\hat{p}_1\amp \ge 10 \amp n_1(1-\hat{p}_1)\amp \ge 10 \amp n_2\hat{p}_2\amp \ge 10 \amp n_2(1-\hat{p}_2)\amp \ge 10\\ 40 \times \frac{14}{40} \amp \ge 10 \amp 40 \times (1-\frac{14}{40}) \amp \ge 10 \amp 50 \times \frac{11}{50} \amp \ge 10 \amp 50 \times (1-\frac{11}{50}) \amp \ge 10 \end{align*}

Because all conditions are met, the normal model can be used for the point estimate of the difference in survival rate.
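The success-failure check above can be sketched in a few lines of Python. This is only an illustration; the helper names `success_failure_counts` and `meets_success_failure` are our own. Note that when sample proportions are used, \(n\hat{p}\) and \(n(1-\hat{p})\) are simply the observed numbers of successes and failures.

```python
def success_failure_counts(x, n):
    """Return (n * p_hat, n * (1 - p_hat)), i.e. the success and failure counts."""
    return x, n - x

def meets_success_failure(x, n, minimum=10):
    """True when both the success count and the failure count are at least `minimum`."""
    return all(count >= minimum for count in success_failure_counts(x, n))

# Counts from Table 6.2.1: (number survived, group size).
groups = {"treatment": (14, 40), "control": (11, 50)}

for name, (x, n) in groups.items():
    successes, failures = success_failure_counts(x, n)
    ok = meets_success_failure(x, n)
    print(f"{name}: successes={successes}, failures={failures}, condition met={ok}")
```

The independence condition cannot be verified by a calculation like this one; it has to be judged from how the data were collected.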

The point estimate is:

\begin{gather*} \hat{p}_{1} - \hat{p}_{2} = \frac{14}{40} - \frac{11}{50} = 0.35 - 0.22 = 0.13 \end{gather*}

We compute the standard error for the difference of sample proportions in the same way that we compute the standard deviation for the difference of sample proportions — the only difference is that we use the sample proportions in place of the population proportions:

\begin{align*} SE \amp = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}= \sqrt{\frac{0.35 (1 - 0.35)}{40} + \frac{0.22 (1 - 0.22)}{50}} = 0.095 \end{align*}

Let us estimate the true difference in survival rate with 90% confidence. For a 90% confidence level, we use \(z^{\star} = 1.645\text{,}\) which we round to 1.65 in the calculation. The 90% confidence interval is calculated as:

\begin{align*} \text{ point estimate } \ \pm\amp \ z^{\star} \times SE\ \text{ of estimate }\\ 0.13 \ \pm\amp \ 1.65\times 0.095\\ (-0.027,\ \amp 0.287) \end{align*}

We are 90% confident that the true difference in the survival rate (treatment \(-\) control) lies between -0.027 and 0.287. That is, we are 90% confident that the treatment of blood thinners changes survival rate for patients like those in the study by -2.7 to +28.7 percentage points. Because this interval contains both negative and positive values, we do not have enough information to say with confidence whether blood thinners harm or help heart attack patients who have been admitted after they have undergone CPR.
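As a check on the arithmetic, the interval for the CPR study can be reproduced in Python. This is a minimal sketch that assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library; the variable names are our own.

```python
import math
from statistics import NormalDist

x1, n1 = 14, 40   # treatment group: survived, total
x2, n2 = 11, 50   # control group: survived, total
confidence = 0.90

p1_hat, p2_hat = x1 / n1, x2 / n2
point_estimate = p1_hat - p2_hat

# Unpooled standard error, using each group's own sample proportion.
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = point_estimate - z_star * se
upper = point_estimate + z_star * se

print(f"point estimate = {point_estimate:.3f}, SE = {se:.3f}, z* = {z_star:.3f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

Because the code carries full precision through every step, its endpoints can differ in the last digit from a hand calculation that rounds \(z^{\star}\) and \(SE\) along the way.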

Constructing a confidence interval for the difference of two proportions.

To carry out a complete confidence interval procedure to estimate the difference of two proportions \(p_1-p_2\text{,}\)

Identify: Identify the parameter and the confidence level, C%.

The parameter will be a difference of proportions, e.g. the true difference in the proportion of 17 and 18 year olds with a summer job (proportion of 18 year olds \(-\) proportion of 17 year olds).

Choose: Identify the correct interval procedure and identify it by name.

Here we choose the 2-proportion Z-interval.

Check: Check conditions for the sampling distribution of \(\hat{p}_1-\hat{p}_2\) to be nearly normal.

Data come from 2 independent random samples or 2 randomly assigned treatments.
\(n_1\hat{p}_1\geq10\text{,}\) \(n_1(1-\hat{p}_1)\geq10\text{,}\) \(n_2\hat{p}_2\geq10\text{,}\) and \(n_2(1-\hat{p}_2)\geq10\)

Calculate: Calculate the confidence interval and record it in interval form.

\(\text{ point estimate } \ \pm\ z^{\star} \times SE\ \text{ of estimate }\)
- point estimate: the difference of sample proportions \(\hat{p}_1 - \hat{p}_2\)
- \(SE\) of estimate: \(\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)
- \(z^{\star}\text{:}\) use a \(t\)-table at row \(\infty\) and confidence level C
(___, ___)

Conclude: Interpret the interval and, if applicable, draw a conclusion in context.

We are C% confident that the true difference in the proportion of [...] is between ___ and ___. If applicable, draw a conclusion based on whether the interval is entirely above, is entirely below, or contains the value 0.

Example 6.2.2.

A remote control car company is considering a new manufacturer for wheel gears. The new manufacturer would be more expensive but their higher quality gears are more reliable, resulting in happier customers and fewer warranty claims. However, management must be convinced that the more expensive gears are worth the conversion before they approve the switch. The quality control engineer collects a sample of gears, examining 1000 gears from each company and finds that 879 gears pass inspection from the current supplier and 958 pass inspection from the prospective supplier. Using these data, construct a 95% confidence interval for the difference in the proportion from each supplier that would pass inspection. Use the five step framework described above to organize your work.

Solution

Identify: First we identify the parameter of interest. Here the parameter we wish to estimate is the true difference in the proportion of gears from each supplier that would pass inspection, \(p_1-p_2\text{.}\) We will take the difference as: current \(-\) prospective, so \(p_1\) is the true proportion that would pass from the current supplier and \(p_2\) is the true proportion that would pass from the prospective supplier. We will estimate the difference using a 95% confidence level.

Choose: Because the parameter to be estimated is a difference of proportions, we will use a 2-proportion Z-interval.

Check: The samples are independent, but not necessarily random, so to proceed we must assume the gears are all independent. For this sample we will suppose this assumption is reasonable, but the engineer would be more knowledgeable as to whether this assumption is appropriate. We also must verify the minimum sample size conditions:

\begin{align*} 1000 \times \frac{879}{1000} \amp \ge 10 \amp 1000 \times \frac{121}{1000} \amp \ge 10 \amp 1000 \times \frac{958}{1000} \amp \ge 10 \amp 1000 \times \frac{42}{1000} \amp \ge 10 \end{align*}

The success-failure condition is met for both samples.

Calculate: We will calculate the interval:

\begin{gather*} \text{ point estimate } \ \pm\ z^{\star} \times SE\ \text{ of estimate } \end{gather*}

The point estimate is the difference of sample proportions: \(\hat{p}_1-\hat{p}_2 = 0.879 - 0.958 = -0.079\text{.}\)

The \(SE\) of the difference of sample proportions is:

\(\sqrt{\frac{\ \hat{p}_1(1-\hat{p}_1)\ }{n_1}+ \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}\ } = \sqrt{\frac{0.879(1-0.879)}{1000} +\frac{0.958(1-0.958)}{1000}}= 0.0121\)

So the 95% confidence interval is given by:

\begin{align*} 0.879 - 0.958 \ \pm \ \amp 1.96 \times \sqrt{\frac{0.879(1-0.879)}{1000} +\frac{0.958(1-0.958)}{1000}}\\ -0.079\ \pm\ \amp 1.96 \times 0.0121\\ (-0.1\amp 03,\ -0.055) \end{align*}

Conclude: We are 95% confident that the true difference (current \(-\) prospective) in the proportion that would pass inspection is between -0.103 and -0.055, meaning that we are 95% confident that the prospective supplier's pass rate would be between 5.5 and 10.3 percentage points higher than the current supplier's. Because the entire interval is below zero, the data provide sufficient evidence that the prospective gears pass inspection more often than the current gears. The remote control car company should go with the new manufacturer.
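The Check and Calculate steps of this example can be collected into one reusable function. The sketch below is one possible arrangement, not a standard library routine: the function name `two_prop_z_interval` and the keys of the dictionary it returns are our own choices, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Sketch of a 2-proportion Z-interval for p1 - p2 from success counts."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Check: with sample proportions, the four success-failure quantities
    # are the observed success and failure counts in each group.
    conditions_ok = min(x1, n1 - x1, x2, n2 - x2) >= 10
    # Calculate: unpooled SE, since an interval does not assume p1 = p2.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    estimate = p1_hat - p2_hat
    return {
        "conditions_ok": conditions_ok,
        "estimate": estimate,
        "se": se,
        "interval": (estimate - z_star * se, estimate + z_star * se),
    }

# Example 6.2.2: current supplier (sample 1) minus prospective supplier (sample 2).
result = two_prop_z_interval(879, 1000, 958, 1000, confidence=0.95)
lower, upper = result["interval"]

print(f"success-failure condition met: {result['conditions_ok']}")
print(f"point estimate = {result['estimate']:.3f}, SE = {result['se']:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The function only checks the success-failure condition. Whether the gears can be treated as independent observations remains a judgment about how the sample was collected.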

Subsection 6.2.5 Calculator: the 2-proportion Z-interval

As with the 1-proportion Z-interval, a calculator can be helpful for evaluating the final interval.

TI-83/84: 2-proportion Z-interval.

Use STAT, TESTS, 2-PropZInt.

Choose STAT.
Right arrow to TESTS.
Down arrow and choose B:2-PropZInt.
Let x1 be the number of yeses (must be an integer) in sample 1 and let n1 be the size of sample 1.
Let x2 be the number of yeses (must be an integer) in sample 2 and let n2 be the size of sample 2.
Let C-Level be the desired confidence level.
Choose Calculate and hit ENTER, which returns:

(___, ___) the confidence interval

\(\hat{p}_1\) sample 1 proportion \(n_1\) size of sample 1

\(\hat{p}_2\) sample 2 proportion \(n_2\) size of sample 2

Casio fx-9750GII: 2-proportion Z-interval.

Navigate to STAT (MENU button, then hit the 2 button or select STAT).
Choose the INTR option (F4 button).
Choose the Z option (F1 button).
Choose the 2-P option (F4 button).
Specify the interval details:
- Confidence level of interest for C-Level.
- Enter the number of successes for each group, x1 and x2.
- Enter the sample size for each group, n1 and n2.
Hit the EXE button, which returns

Left, Right the ends of the confidence interval

\(\hat{p}1\text{,}\) \(\hat{p}2\) the sample proportions

n1, n2 sample sizes

Checkpoint 6.2.3.

From Example 6.2.2, we have that a quality control engineer collects a sample of gears, examining 1000 gears from each company and finds that 879 gears pass inspection from the current supplier and 958 pass inspection from the prospective supplier. Use a calculator to find a 95% confidence interval for the difference (current \(-\) prospective) in the proportion that would pass inspection.¹

Navigate to the 2-proportion Z-interval on the calculator. Let x1 \(= 879\text{,}\) n1 \(= 1000\text{,}\) x2 \(= 958\text{,}\) and n2\(= 1000\text{.}\) C-Level is .95. This should lead to an interval of \((-0.1027, -0.0553)\text{,}\) which matches what we found previously.

Subsection 6.2.6 Hypothesis testing when \(H_0\text{:}\) \(p_1 = p_2\)

Here we use a new example to examine a special estimate of the standard error when the null hypothesis is that two population proportions equal each other, i.e. \(H_0\text{:}\) \(p_1 = p_2\text{.}\) We investigate whether the way a question is phrased can influence a person's response. Pew Research Center conducted a survey with the following question:²

https://www.people-press.org/2012/03/26/public-remains-split-on-health-care-bill-opposed-to-mandate/. Sample sizes for each polling group are approximate.

As you may know, by 2014 nearly all Americans will be required to have health insurance. [People who do not buy insurance will pay a penalty] while [People who cannot afford it will receive financial help from the government]. Do you approve or disapprove of this policy?

For each randomly sampled respondent, the statements in brackets were randomized: either they were kept in the original order given above, or they were reversed. Results are presented in Table 6.2.4.

	sample size	Approve law (%)	Disapprove law (%)	Other
“People who do not buy insurance will pay a penalty” is given first (original order)	771	47	49	4
“People who cannot afford it will receive financial help from the government” is given first (reversed order)	732	34	63	3

Table 6.2.4. Results for a Pew Research Center poll where the ordering of two statements in a question regarding healthcare were randomized.

Checkpoint 6.2.5.

Is this study an experiment or an observational study? ³

There is a random sample involved, but there are also two treatments. Half of the respondents are given the original statement order and the other half, randomly, are given the reversed statement order. This is an experiment because there are randomly assigned treatments.

The approval percents of 47% and 34% seem far apart. However, could this difference be due to random chance? We will answer this question using a hypothesis test. To simplify things, let

\begin{align*} p_1\amp \text{ : the proportion of respondents that would approve of policy with the original statement ordering, and }\\ p_2\amp \text{ : the proportion of respondents that would approve of policy with the reversed statement ordering. } \end{align*}

Example 6.2.6.

Set up hypotheses to test whether the two statement orders produce the same response.

Solution

The null claim is that the question order does not matter, that is, that the two proportions should be equal. The alternate claim, the one that bears the burden of proof, is that the question ordering does matter.

\(H_0\text{:}\) \(p_1 = p_2\)

\(H_A\text{:}\) \(p_1 \ne p_2\)

Now, we can note that:

\begin{align*} p_1=p_2 \amp \text{ is equivalent to } p_1-p_2=0\text{ , and }\\ p_1\ne p_2 \amp \text{ is equivalent to } p_1-p_2\ne 0\text{.} \end{align*}

We can now see that the hypotheses are really about a difference of proportions: \(p_1-p_2\text{.}\) In the last section, we used a 2-proportion Z-interval to estimate the parameter \(p_1-p_2\text{;}\) here, we will use a 2-proportion Z-test to test the null hypothesis that \(p_1-p_2=0\text{,}\) i.e. that \(p_1=p_2\text{.}\)

Recall that the test statistic Z has the form:

\begin{gather*} Z = \frac{\text{ point estimate } - \text{ null value } }{SE\ \text{ of estimate } } \end{gather*}

The parameter of interest is \(p_1-p_2\text{,}\) so the point estimate will be the observed difference of sample proportions: \(\hat{p}_{1} - \hat{p}_{2} = 0.47 - 0.34 = 0.13\text{.}\)

The null value depends on the null hypothesis. The null hypothesis is that the approval rate would be the same for both statement orderings, i.e. that the difference is 0, therefore, the null value is 0. In this section we consider only the case where \(H_0\text{:}\) \(p_1=p_2\text{,}\) so the null value for the difference will always be 0.

The \(SD\) of a difference of sample proportions has the form:

\begin{gather*} SD = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \end{gather*}

However, in a hypothesis test, the distribution of the point estimate is always examined assuming the null hypothesis is true, i.e. in this case, \(p_1 = p_2\text{.}\) Both the success-failure check and the standard error formula should reflect this equality in the null hypothesis. We will use \(p_c\) to represent the common proportion that supports the healthcare law regardless of statement order:

\begin{align*} SD \amp = \sqrt{\frac{p_c(1-p_c)}{n_1} + \frac{p_c(1-p_c)}{n_2}}\\ \amp = \sqrt{p_c(1-p_c)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \end{align*}

We don't know the true proportion \(p_c\text{,}\) but we can obtain a good estimate of it, \(\hat{p}_c\text{,}\) by pooling the results of both samples. We find the total number of “yeses” or “successes” and divide that by the total number of cases. This is equivalent to taking a weighted average of \(\hat{p}_1\) and \(\hat{p}_2\text{.}\) We call \(\hat{p}_c\) the pooled sample proportion, and we use it to check the success-failure condition and to compute the standard error when the null hypothesis is that \(p_1 = p_2\text{.}\) Here:

\begin{equation*} \hat{p}_c = \frac{771(0.47) + 732(0.34)}{771+732}= 0.407 \end{equation*}

Pooled sample proportion.

When the null hypothesis is \(p_1 = p_2\text{,}\) it is useful to find the pooled sample proportion:

\begin{gather*} \hat{p}_c = \frac{\text{ number of “successes” } }{\text{ number of cases } } = \frac{\text{x} _1+\text{x} _2}{n_1+n_2}=\frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} \end{gather*}

Here \(\text{x} _1\) represents the number of successes in sample 1. If \(\text{x} _1\) is not given, it can be computed as \(n_1\times \hat{p}_1\text{.}\) Similarly, \(\text{x} _2\) represents the number of successes in sample 2 and can be computed as \(n_2\times \hat{p}_2\text{.}\)
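The equivalence between pooling the counts and taking a weighted average of the sample proportions can be seen numerically. The sketch below uses the Pew poll values from Table 6.2.4; the variable names are our own, and the rounding of the counts to integers is only needed for tools that require integer inputs.

```python
n1, p1_hat = 771, 0.47   # original statement order
n2, p2_hat = 732, 0.34   # reversed statement order

# Pooled proportion: total successes over total cases.
pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

# The same quantity written as a weighted average of the two sample proportions,
# with weights proportional to the sample sizes.
w1 = n1 / (n1 + n2)
w2 = n2 / (n1 + n2)
weighted_average = w1 * p1_hat + w2 * p2_hat

# When only proportions are reported, success counts can be recovered
# (approximately) by multiplying and rounding to an integer.
x1 = round(n1 * p1_hat)
x2 = round(n2 * p2_hat)
pooled_from_counts = (x1 + x2) / (n1 + n2)

print(f"pooled proportion:       {pooled:.4f}")
print(f"weighted average:        {weighted_average:.4f}")
print(f"from rounded counts:     {pooled_from_counts:.4f} (x1={x1}, x2={x2})")
```

Rounding the counts changes the pooled proportion only slightly, which is why the two approaches agree to about three decimal places here.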

Use the pooled sample proportion when \(H_0\text{:}\) \({p}_1 = {p}_2\).

When the null hypothesis states that the proportions are equal, we use the pooled sample proportion (\(\hat{p}_c\)) to check the success-failure condition and to estimate the standard error:

\begin{gather} SE =\sqrt{\hat{p}_c(1-\hat{p}_c)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\label{seOfDiffInPropUsingPooledEstimate}\tag{6.2.1} \end{gather}

Example 6.2.7.

Verify that conditions for using the normal are met and find the \(SE\) of estimate for this hypothesis test. Recall that the pooled proportion \(\hat{p}_c=0.407\text{,}\) \(n_1 = 771\text{,}\) and \(n_2=732\text{.}\)

Solution

The data do come from two randomly assigned treatments, where the treatments are the two different orderings of the question regarding healthcare. Also, the success-failure condition (minimums of 10) easily holds for each group.

\begin{align*} 771 \times 0.407 \amp \ge 10 \amp 771 \times (1-0.407) \amp \ge 10 \amp 732 \times 0.407 \amp \ge 10 \amp 732 \times (1-0.407) \amp \ge 10 \end{align*}

Here, we compute the \(SE\) for the difference of sample proportions as:

\begin{equation*} SE =\sqrt{\hat{p}_c(1-\hat{p}_c)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}=\sqrt{0.407(1-0.407)}\sqrt{\frac{1}{771} + \frac{1}{732}}=0.025 \end{equation*}

Example 6.2.8.

Complete the hypothesis test using a significance level of 0.01.

Solution

We have already set up the hypotheses and verified that the difference of proportions can be modeled using a normal distribution. We can now calculate the test statistic and p-value.

\begin{gather*} Z = \frac{\text{ point estimate } - \text{ null value } }{SE\ \text{ of estimate } }= \frac{(0.47-0.34) - 0}{0.025} = 5.2 \end{gather*}

This is a two-tailed test as \(H_A\) is that \(p_1\ne p_2\text{.}\) We can find the area in one tail and double it. Here, the p-value \(\approx\) 0. Because the p-value is smaller than \(\alpha = 0.01\text{,}\) we reject the null hypothesis and conclude that the order of the statements affects how likely a respondent is to support the 2010 healthcare law.
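The test statistic and two-sided p-value for the Pew poll can be reproduced as follows. This is a minimal sketch assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are our own, and the inputs are the reported percentages and approximate sample sizes from Table 6.2.4.

```python
import math
from statistics import NormalDist

n1, p1_hat = 771, 0.47   # original statement order
n2, p2_hat = 732, 0.34   # reversed statement order
alpha = 0.01
null_value = 0

# Pooled proportion and pooled standard error, reflecting H0: p1 = p2.
pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled)) * math.sqrt(1 / n1 + 1 / n2)

z = ((p1_hat - p2_hat) - null_value) / se

# Two-sided p-value: the area in one tail beyond |z|, doubled.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"pooled proportion = {pooled:.3f}, SE = {se:.4f}")
print(f"Z = {z:.2f}, p-value = {p_value:.2e}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

With the unrounded standard error, Z comes out near 5.1 rather than 5.2. The difference comes only from rounding \(SE\) to 0.025 in the hand calculation, and the conclusion is unchanged.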

Hypothesis testing for the difference of two proportions.

To carry out a complete hypothesis test to test the claim that two proportions \(p_1\) and \(p_2\) are equal to each other,

Identify: Identify the hypotheses and the significance level, \(\alpha\text{.}\)

\(H_0\text{:}\) \(p_1=p_2\)
\(H_A\text{:}\) \(p_1\ne p_2\text{;}\) \(H_A\text{:}\) \(p_1>p_2\text{;}\) or \(H_A\text{:}\) \(p_1\lt p_2\)

Choose: Identify the correct test procedure and identify it by name.

Here we choose the 2-proportion Z-test.

Check: Check conditions for the sampling distribution of \(\hat{p}_1-\hat{p}_2\) to be nearly normal.

1. Data come from 2 independent random samples or from 2 randomly assigned treatments.
2. \(n_1\hat{p}_c\geq 10\text{,}\) \(n_1(1-\hat{p}_c)\geq 10\text{,}\) \(n_2\hat{p}_c\geq 10\text{,}\) and \(n_2(1-\hat{p}_c)\geq 10\)

Calculate: Calculate the Z-statistic and p-value.

\(Z = \frac{\text{ point estimate } - \text{ null value } }{SE \text{ of estimate } }\)
- point estimate: the difference of sample proportions \(\hat{p}_1 - \hat{p}_2\)
- \(SE\) of estimate: \(\sqrt{\hat{p}_c(1-\hat{p}_c)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\text{,}\) where \(\hat{p}_c\) is the pooled proportion
- null value: 0
p-value = (based on the Z-statistic and the direction of \(H_A\))

Conclude: Compare the p-value to \(\alpha\text{,}\) and draw a conclusion in context.

If the p-value is \(\lt \alpha\text{,}\) reject \(H_0\text{;}\) there is sufficient evidence that [\(H_A\) in context].
If the p-value is \(> \alpha\text{,}\) do not reject \(H_0\text{;}\) there is not sufficient evidence that [\(H_A\) in context].
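The Check and Calculate steps of the test can be sketched as a single function that handles all three forms of \(H_A\). The function name `two_prop_z_test`, its `alternative` argument, and the returned dictionary keys are our own conventions, and the counts in the usage example are hypothetical. The sketch assumes Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Sketch of a 2-proportion Z-test of H0: p1 = p2 from success counts.

    alternative: "two-sided" (p1 != p2), "less" (p1 < p2), or "greater" (p1 > p2).
    """
    if alternative not in ("two-sided", "less", "greater"):
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)

    # Check: success-failure condition using the pooled proportion.
    expected = (n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled))
    conditions_ok = all(count >= 10 for count in expected)

    # Calculate: pooled SE, Z-statistic (null value 0), and p-value.
    se = math.sqrt(pooled * (1 - pooled)) * math.sqrt(1 / n1 + 1 / n2)
    z = (p1_hat - p2_hat - 0) / se
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)              # area to the left of z
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)          # area to the right of z
    else:
        p_value = 2 * std_normal.cdf(-abs(z))    # both tails
    return {"conditions_ok": conditions_ok, "pooled": pooled,
            "se": se, "z": z, "p_value": p_value}

# Hypothetical counts, for illustration only: 45 of 100 versus 30 of 100.
result = two_prop_z_test(45, 100, 30, 100, alternative="two-sided")
print(f"conditions ok: {result['conditions_ok']}")
print(f"Z = {result['z']:.2f}, p-value = {result['p_value']:.4f}")
```

The Identify and Conclude steps still require context: the direction of \(H_A\) and the significance level must be chosen before looking at the data, and the conclusion must be stated in terms of the original question.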

Example 6.2.9.

A 5-year experiment was conducted to evaluate the effectiveness of fish oils on reducing heart attacks, where each subject was randomized into one of two treatment groups. We'll consider heart attack outcomes in these patients:


	heart_attack	no_event	Total

fish_oil	145	12788	12933
placebo	200	12738	12938

Carry out a complete hypothesis test at the 10% significance level to test whether the use of fish oils is effective in reducing heart attacks.

Solution

Identify: Define \(p_1\) and \(p_2\) as follows:

\(p_1\text{:}\) the true proportion that would suffer a heart attack if given fish oil

\(p_2\text{:}\) the true proportion that would suffer a heart attack if given placebo

We will test the following hypotheses at the \(\alpha=0.10\) significance level.

\(H_0\text{:}\) \(p_1=p_2\) Fish oil and placebo are equally effective.

\(H_A\text{:}\) \(p_1 \lt p_2\) Fish oil is effective in reducing heart attacks.

Choose: Because we are testing whether two proportions equal each other, we choose the 2-proportion Z-test.

Check: We must verify that the difference of sample proportions can be modeled using a normal distribution. First we note that there are two randomly assigned treatments. Second, we calculate the pooled proportion as follows:

\begin{equation*} \hat{p}_c = \frac{x_1+x _2}{n_1+n_2}=\frac{145 + 200}{12933 + 12938}=0.0133 \end{equation*}

We can now verify: \(12933(0.0133)\geq10\text{,}\) \(12933(1-0.0133)\geq10\text{,}\) \(12938(0.0133)\geq10\text{,}\) and \(12938(1-0.0133)\geq10\text{,}\) so both conditions are met.

Calculate: We will calculate the Z-statistic and the p-value.

\begin{gather*} Z = \frac{\text{ point estimate } - \text{ null value } }{SE \text{ of estimate } } \end{gather*}

The point estimate is the difference of sample proportions: \(\hat{p}_1-\hat{p}_2 = 0.0112 - 0.0155 = -0.0043\text{.}\)

The value hypothesized for the parameter in \(H_0\) is the null value: null value = 0.

The pooled proportion, calculated above, is: \(\hat{p}_c = 0.0133\text{.}\)

The \(SE\) of the difference of sample proportions, assuming \(H_0\) is true, is:

\(\sqrt{\hat{p}_c(1-\hat{p}_c)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{0.0133(1-0.0133)}\sqrt{\frac{1}{12933} + \frac{1}{12938}}=0.00142\text{.}\)

\begin{gather*} Z = \frac{-0.0043 - 0}{0.00142} = -3.0 \end{gather*}

Because \(H_A\) uses a less-than sign, meaning that it is a lower-tail test, the p-value is the area to the left of \(Z=-3.0\) under the standard normal curve. This area can be found using a normal table or a calculator. The area, and so the p-value, is \(0.0013\text{.}\)

Conclude: The p-value of 0.0013 is \(\lt 0.10\text{,}\) so we reject \(H_0\text{;}\) there is sufficient evidence that fish oil is effective in reducing heart attacks.
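Hand calculations round at each step, so software working from the raw counts will give slightly different values. The sketch below, with variable names of our own and assuming Python 3.8 or later for `statistics.NormalDist`, computes the fish oil test at full precision and then repeats the Z calculation with intermediates rounded as in the solution above.

```python
import math
from statistics import NormalDist

x1, n1 = 145, 12933   # fish oil: heart attacks, total
x2, n2 = 200, 12938   # placebo: heart attacks, total
alpha = 0.10

p1_hat, p2_hat = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
sample_size_factor = math.sqrt(1 / n1 + 1 / n2)

# Full-precision calculation.
se_full = math.sqrt(pooled * (1 - pooled)) * sample_size_factor
z_full = (p1_hat - p2_hat) / se_full
p_full = NormalDist().cdf(z_full)   # lower-tail test, H_A: p1 < p2

# Same calculation with intermediates rounded as in the worked solution.
pooled_r = round(pooled, 4)
se_r = round(math.sqrt(pooled_r * (1 - pooled_r)) * sample_size_factor, 5)
z_rounded = (round(p1_hat, 4) - round(p2_hat, 4)) / se_r

print(f"full precision: Z = {z_full:.3f}, p-value = {p_full:.5f}")
print(f"rounded steps:  Z = {z_rounded:.2f}")
print("reject H0" if p_full < alpha else "do not reject H0")
```

Both versions give a Z-statistic close to \(-3.0\) and lead to the same conclusion at the 10% significance level.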

Subsection 6.2.7 Calculator: the 2-proportion Z-test

TI-83/84: 2-proportion Z-test.

Use STAT, TESTS, 2-PropZTest.

Choose STAT.
Right arrow to TESTS.
Down arrow and choose 6:2-PropZTest.
Let x1 be the number of yeses (must be an integer) in sample 1 and let n1 be the size of sample 1.
Let x2 be the number of yeses (must be an integer) in sample 2 and let n2 be the size of sample 2.
Choose \(\ne\text{,}\) \(\lt\text{,}\) or > to correspond to \(H_A\text{.}\)
Choose Calculate and hit ENTER, which returns:

z Z-statistic p p-value

\(\hat{p}_1\) sample 1 proportion \(\hat{p}\) pooled sample proportion

\(\hat{p}_2\) sample 2 proportion

Casio fx-9750GII: 2-proportion Z-test.

Navigate to STAT (MENU button, then hit the 2 button or select STAT).
Choose the TEST option (F3 button).
Choose the Z option (F1 button).
Choose the 2-P option (F4 button).
Specify the test details:
- Specify the sidedness of the test using the F1, F2, and F3 keys.
- Enter the number of successes for each group, x1 and x2.
- Enter the sample size for each group, n1 and n2.
Hit the EXE button, which returns

z Z-statistic \(\hat{p}1\text{,}\) \(\hat{p}2\) sample proportions

p p-value \(\hat{p}\) pooled proportion

n1, n2 sample sizes

Checkpoint 6.2.10.

Use a calculator to find the test statistic, p-value, and pooled proportion for a test with: \(H_A\text{:}\) \(p\) for fish oil \(\lt p\) for placebo.⁴

Correctly going through the calculator steps should lead to a solution with the test statistic z \(= -2.977\) and the p-value p \(= 0.00145\text{.}\) These two values match our calculated values from the previous example to within rounding error. The pooled proportion is given as \(\hat{p}\) \(= 0.0133\text{.}\) Note: values for x1 and x2 were given in the table. If, instead, proportions are given, find x1 and x2 by multiplying the proportions by the sample sizes and rounding the result to an integer.


	heart_attack	no_event	Total

fish_oil	145	12788	12933
placebo	200	12738	12938

Subsection 6.2.8 Section summary

In the previous section, we looked at inference for a single proportion. In this section, we compared two groups to each other with respect to a proportion or a percent.

We are interested in whether the true proportion of yeses is the same or different between two distinct groups. Call these proportions \(p_1\) and \(p_2\text{.}\) The difference, \(p_1-p_2\text{,}\) tells us whether \(p_1\) is greater than, less than, or equal to \(p_2\text{.}\)
When comparing two proportions to each other, the parameter of interest is the difference of proportions, \(p_1-p_2\text{,}\) and we use the difference of sample proportions, \(\hat{p}_1-\hat{p}_2\text{,}\) as the point estimate.
The sampling distribution of \(\hat{p}_1-\hat{p}_2\) is nearly normal when the success-failure condition is met for both groups and when the data are collected using 2 independent random samples or 2 randomly assigned treatments. When the sampling distribution of \(\hat{p}_1-\hat{p}_2\) is nearly normal, the standardized test statistic also follows a normal distribution.
When the null hypothesis is that the two population proportions are equal to each other, use the pooled sample proportion \(\hat{p}_c=\frac{x_1+x_2}{n_1+n_2}\text{,}\) i.e. the combined number of yeses over the combined sample sizes, when verifying the success-failure condition and when finding the \(SE\text{.}\) For the confidence interval, do not use the pooled sample proportion; use the separate values of \(\hat{p}_1\) and \(\hat{p}_2\text{.}\)
When there are two samples or treatments and the parameter of interest is a difference of proportions, e.g. the true difference in proportion of 17 and 18 year olds with a summer job (proportion of 18 year olds \(-\) proportion of 17 year olds):
- Estimate \(p_1-p_2\) at the C% confidence level using a 2-proportion Z-interval.
- Test \(H_0\text{:}\) \(p_1-p_2=0\) (i.e. \(p_1=p_2\)) at the \(\alpha\) significance level using a 2-proportion Z-test.
Verify the conditions for using a normal model:
1. Data come from 2 independent random samples or 2 randomly assigned treatments.
2. CI: \(n_1\hat{p}_1\ge 10\text{,}\) \(n_1(1-\hat{p}_1)\ge 10\text{,}\) \(n_2\hat{p}_2\ge 10\text{,}\) and \(n_2(1-\hat{p}_2)\ge 10\)
  
  Test: \(n_1\hat{p}_c\ge 10\text{,}\) \(n_1(1-\hat{p}_c)\ge 10\text{,}\) \(n_2\hat{p}_c\ge 10\text{,}\) and \(n_2(1-\hat{p}_c)\ge 10\)
When the conditions are met, we calculate the confidence interval and the test statistic using the same structure as in the previous section.
- Confidence interval: \(\text{ point estimate } \ \pm\ z^{\star} \times SE\ \text{ of estimate }\)
- Test statistic: \(Z = \frac{\text{ point estimate } - \text{ null value } }{SE \text{ of estimate } }\)
Here the point estimate is the difference of sample proportions \(\hat{p}_1 - \hat{p}_2\text{.}\)

The \(SE\) of estimate is the \(SE\) of a difference of sample proportions.
For a CI, use: \(SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\text{.}\)
For a Test, use: \(SE = \sqrt{\hat{p}_c(1-\hat{p}_c)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\text{.}\)
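To see how much the two standard error formulas differ in practice, they can be placed side by side. The sketch below applies both to two data sets from this section; the function names `se_unpooled` and `se_pooled` are our own.

```python
import math

def se_unpooled(x1, n1, x2, n2):
    """SE for a confidence interval: each group keeps its own sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

def se_pooled(x1, n1, x2, n2):
    """SE for a test of H0: p1 = p2, using the pooled sample proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    return math.sqrt(pooled * (1 - pooled)) * math.sqrt(1 / n1 + 1 / n2)

# (x1, n1, x2, n2) taken from tables earlier in this section.
datasets = {
    "CPR study (Table 6.2.1)": (14, 40, 11, 50),
    "fish oil (Example 6.2.9)": (145, 12933, 200, 12938),
}

for name, counts in datasets.items():
    print(f"{name}: unpooled SE = {se_unpooled(*counts):.5f}, "
          f"pooled SE = {se_pooled(*counts):.5f}")
```

The two values are usually close, but they answer different questions: the pooled version describes the variability of \(\hat{p}_1-\hat{p}_2\) if the null hypothesis \(p_1=p_2\) were true, while the unpooled version makes no such assumption.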

Exercises 6.2.9 Exercises

1. Social experiment, Part I.

A “social experiment” conducted by a TV program questioned what people do when they see a very obviously bruised woman getting picked on by her boyfriend. On two different occasions at the same restaurant, the same couple was depicted. In one scenario the woman was dressed “provocatively” and in the other scenario the woman was dressed “conservatively”. The table below shows how many restaurant diners were present under each scenario, and whether or not they intervened.

		Scenario
		Provocative	Conservative	Total
Intervene	Yes	5	15	20
	No	15	10	25
	Total	20	25	45

Explain why the sampling distribution of the difference between the proportions of interventions under provocative and conservative scenarios does not follow an approximately normal distribution.

Solution

This is not a randomized experiment, and it is unclear whether people would be affected by the behavior of their peers. That is, independence may not hold. Additionally, there are only 5 interventions under the provocative scenario, so the success-failure condition does not hold. Even if we consider a hypothesis test where we pool the proportions, the success-failure condition will not be satisfied. Since one condition is questionable and the other is not satisfied, the difference in sample proportions will not follow a nearly normal distribution.
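
The success-failure check described in this solution can be sketched as follows. The helper name `success_failure_ok` is hypothetical; the counts come from the table in the exercise (5 of 20 diners intervened in the provocative scenario, 15 of 25 in the conservative scenario).

```python
def success_failure_ok(x1, n1, x2, n2, pooled=False, threshold=10):
    """Return (passes, expected_counts) for the success-failure condition.

    With pooled=False the sample proportions are used (interval version);
    with pooled=True the pooled proportion is used (test version).
    """
    if pooled:
        p1 = p2 = (x1 + x2) / (n1 + n2)
    else:
        p1, p2 = x1 / n1, x2 / n2
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(c >= threshold for c in counts), counts


# Provocative: 5 of 20 intervened. Conservative: 15 of 25 intervened.
ci_ok, ci_counts = success_failure_ok(5, 20, 15, 25)
test_ok, test_counts = success_failure_ok(5, 20, 15, 25, pooled=True)
print(ci_ok, ci_counts)
print(test_ok, test_counts)
```

Both versions fail: the unpooled check has only 5 expected successes in the provocative group, and the pooled check gives fewer than 10 expected successes in that group as well.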

2. Heart transplant success.

The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was officially designated a heart transplant candidate, meaning that he was gravely ill and might benefit from a new heart. Patients were randomly assigned into treatment and control groups. Patients in the treatment group received a transplant, and those in the control group did not. The table below displays how many patients survived and died in each group.⁵

B. Turnbull et al. “Survivorship of Heart Transplant Data”. In: Journal of the American Statistical Association 69 (1974), pp. 74-80.


	control	treatment
alive	4	24
dead	30	45

Suppose we are interested in estimating the difference in survival rate between the control and treatment groups using a confidence interval. Explain why we cannot construct such an interval using the normal approximation. What might go wrong if we constructed the confidence interval despite this problem?

3. Gender and color preference.

A study asked 1,924 male and 3,666 female undergraduate college students their favorite color. A 95% confidence interval for the difference between the proportions of males and females whose favorite color is black \((p_{male} - p_{female})\) was calculated to be (0.02, 0.06). Based on this information, determine if the following statements are true or false, and explain your reasoning for each statement you identify as false.⁶

We are 95% confident that the true proportion of males whose favorite color is black is 2% lower to 6% higher than the true proportion of females whose favorite color is black.
We are 95% confident that the true proportion of males whose favorite color is black is 2% to 6% higher than the true proportion of females whose favorite color is black.
95% of random samples will produce 95% confidence intervals that include the true difference between the population proportions of males and females whose favorite color is black.
We can conclude that there is a significant difference between the proportions of males and females whose favorite color is black and that the difference between the two sample proportions is too large to plausibly be due to chance.
The 95% confidence interval for \((p_{female} - p_{male})\) cannot be calculated with only the information given in this exercise.

L Ellis and C Ficek. “Color preferences according to gender and sexual orientation”. In: Personality and Individual Differences 31.8 (2001), pp. 1375-1379.

Solution

(a) False. The entire confidence interval is above 0.

(b) True.

(c) True.

(d) True.

(e) False. It is simply the negated and reordered values: \((-0.06,-0.02)\text{.}\)

4. The Daily Show.

A Pew Research foundation poll indicates that among 1,099 college graduates, 33% watch The Daily Show. Meanwhile, 22% of the 1,110 people with a high school degree but no college degree in the poll watch The Daily Show. A 95% confidence interval for \((p_\text{ college grad } - p_\text{ HS or less } )\text{,}\) where \(p\) is the proportion of those who watch The Daily Show, is (0.07, 0.15). Based on this information, determine if the following statements are true or false, and explain your reasoning if you identify the statement as false. ⁷

At the 5% significance level, the data provide convincing evidence of a difference between the proportions of college graduates and those with a high school degree or less who watch The Daily Show.
We are 95% confident that 7% less to 15% more college graduates watch The Daily Show than those with a high school degree or less.
95% of random samples of 1,099 college graduates and 1,110 people with a high school degree or less will yield differences in sample proportions between 7% and 15%.
A 90% confidence interval for \((p_\text{ college grad } - p_\text{ HS or less } )\) would be wider.
A 95% confidence interval for \((p_\text{ HS or less } - p_\text{ college grad } )\) is (-0.15,-0.07).

The Pew Research Center, Americans Spending More Time Following the News, data collected June 8-28, 2010.

5. National Health Plan, Part III.

Exercise 6.1.10.11 presents the results of a poll evaluating support for a generically branded “National Health Plan” in the United States. 79% of 347 Democrats and 55% of 617 Independents support a National Health Plan.

Calculate a 95% confidence interval for the difference between the proportion of Democrats and Independents who support a National Health Plan \((p_{D} - p_{I})\text{,}\) and interpret it in this context. We have already checked conditions for you.
True or false: If we had picked a random Democrat and a random Independent at the time of this poll, it is more likely that the Democrat would support the National Health Plan than the Independent.

Solution

(a) Standard error:

\begin{gather*} SE=\sqrt{ \frac{0.79(1-0.79)}{347} + \frac{0.55(1-0.55)}{617} }=0.03 \end{gather*}

Using \(z^{*}=1.96\text{,}\) we get:

\begin{gather*} 0.79-0.55 \pm 1.96 \times 0.03 \rightarrow (0.181, 0.299) \end{gather*}

We are 95% confident that the proportion of Democrats who support the plan is 18.1% to 29.9% higher than the proportion of Independents who support the plan.

(b) True.
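
As a check on part (a), the interval can be recomputed without rounding the standard error. This is a sketch using only the proportions and sample sizes given in the exercise; the variable names are illustrative.

```python
from math import sqrt

# Reported sample proportions and sample sizes from the exercise.
p_d, n_d = 0.79, 347  # Democrats
p_i, n_i = 0.55, 617  # Independents

# Unpooled standard error, as used for a confidence interval.
se = sqrt(p_d * (1 - p_d) / n_d + p_i * (1 - p_i) / n_i)

z_star = 1.96  # critical value for 95% confidence
diff = p_d - p_i
lower, upper = diff - z_star * se, diff + z_star * se

print(round(se, 4), round(lower, 3), round(upper, 3))
```

Carrying unrounded values through gives a standard error of roughly 0.0297 and an interval of about (0.182, 0.298), which agrees with the interval above up to the rounding of the SE to 0.03.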

6. Sleep deprivation, CA vs. OR, Part I.

According to a report on sleep deprivation by the Centers for Disease Control and Prevention, the proportion of California residents who reported insufficient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents. These data are based on simple random samples of 11,545 California and 4,691 Oregon residents. Calculate a 95% confidence interval for the difference between the proportions of Californians and Oregonians who are sleep deprived and interpret it in context of the data.⁸

CDC, Perceived Insufficient Rest or Sleep Among Adults - United States, 2008.

7. Offshore drilling, Part I.

A survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Below is the distribution of responses, separated based on whether or not the respondent graduated from college.⁹

Survey USA, Election Poll #16804, data collected July 8-11, 2010.

	College Grad
	Yes	No
Support	154	132
Oppose	180	126
Do not know	104	131
Total	438	389

What percent of college graduates and what percent of the non-college graduates in this sample do not know enough to have an opinion on drilling for oil and natural gas off the Coast of California?
Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates.

Solution

(a) College grads: 23.7%. Non-college grads: 33.7%.

(b) Let \(p_{CG}\) and \(p_{NCG}\) represent the proportion of college graduates and non-college graduates who responded “do not know”. \(H_{0} : p_{CG} = p_{NCG}\text{.}\) \(H_{A} : p_{CG} \ne p_{NCG}\text{.}\) Independence is satisfied (random sample), and the success-failure condition, which we would check using the pooled proportion \((\hat{p}_{pool} = 235/827 = 0.284)\text{,}\) is also satisfied. \(Z = -3.18 \rightarrow \text{p-value } = 0.0014\text{.}\) Since the p-value is very small, we reject \(H_{0}\text{.}\) The data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. The data also indicate that fewer college grads say they “do not know” than non-college grads (i.e. the data indicate the direction after we reject \(H_{0}\)).
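
The test statistic in part (b) can be reproduced from the counts in the table. This is a sketch with illustrative variable names; because it works from the raw counts rather than rounded percentages, its Z and p-value may differ slightly in the second decimal place from the values reported in the solution.

```python
from math import sqrt
from statistics import NormalDist

# "Do not know" counts and group totals from the table.
x_cg, n_cg = 104, 438    # college graduates
x_ncg, n_ncg = 131, 389  # non-college graduates

p_cg, p_ncg = x_cg / n_cg, x_ncg / n_ncg
p_pool = (x_cg + x_ncg) / (n_cg + n_ncg)  # 235 / 827

# Pooled standard error, as used for a hypothesis test.
se = sqrt(p_pool * (1 - p_pool)) * sqrt(1 / n_cg + 1 / n_ncg)

z = (p_cg - p_ncg) / se
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided

print(round(p_pool, 3), round(z, 2), round(p_value, 4))
```

Either way the p-value is far below 0.05, so the conclusion is unchanged.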

8. Sleep deprivation, CA vs. OR, Part II.

Exercise 6.2.9.6 provides data on sleep deprivation rates of Californians and Oregonians. The proportion of California residents who reported insufficient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents. These data are based on simple random samples of 11,545 California and 4,691 Oregon residents.

Conduct a hypothesis test to determine if these data provide strong evidence the rate of sleep deprivation is different for the two states. (Reminder: Check conditions)
It is possible the conclusion of the test in part (a) is incorrect. If this is the case, what type of error was made?

9. Offshore drilling, Part II.

Results of a poll evaluating support for drilling for oil and natural gas off the coast of California were introduced in Exercise 6.2.9.7.

	College Grad
	Yes	No
Support	154	132
Oppose	180	126
Do not know	104	131
Total	438	389

What percent of college graduates and what percent of the non-college graduates in this sample support drilling for oil and natural gas off the Coast of California?
Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who support off-shore drilling in California is different than that of non-college graduates.

Solution

(a) College grads: 35.2%. Non-college grads: 33.9%.

(b) Let \(p_{CG}\) and \(p_{NCG}\) represent the proportion of college graduates and non-college grads who support offshore drilling. \(H_{0} : p_{CG} = p_{NCG}\text{.}\) \(H_{A} : p_{CG} \ne p_{NCG}\text{.}\) Independence is satisfied (random sample), and the success-failure condition, which we would check using the pooled proportion \((\hat{p}_{pool} = 286/827 = 0.346)\text{,}\) is also satisfied. \(Z = 0.39 \rightarrow \text{p-value } = 0.6966\text{.}\) Since the p-value is greater than \(\alpha = 0.05\text{,}\) we fail to reject \(H_{0}\text{.}\) The data do not provide strong evidence of a difference between the proportions of college graduates and non-college graduates who support off-shore drilling in California.

10. Full body scan, Part I.

A news article reports that “Americans have differing views on two potentially inconvenient and invasive practices that airports could implement to uncover potential terrorist attacks.” This news piece was based on a survey conducted among a random sample of 1,137 adults nationwide, where one of the questions on the survey was “Some airports are now using ‘full-body’ digital x-ray machines to electronically screen passengers in airport security lines. Do you think these new x-ray machines should or should not be used at airports?” Below is a summary of responses based on party affiliation. ¹⁰

S. Condon. “Poll: 4 in 5 Support Full-Body Airport Scanners”. In: CBS News (2010).

		Party Affiliation
		Republican	Democrat	Independent
Answer	Should	264	299	351
	Should not	38	55	77
	Don't know/No answer	16	15	22
	Total	318	369	450

Conduct an appropriate hypothesis test evaluating whether there is a difference in the proportion of Republicans and Democrats who think the full-body scans should be applied in airports. Assume that all relevant conditions are met.
The conclusion of the test in part (a) may be incorrect, meaning a testing error was made. If an error was made, was it a Type 1 or a Type 2 Error? Explain.

11. Sleep deprived transportation workers.

The National Sleep Foundation conducted a survey on the sleep habits of randomly sampled transportation workers and a control sample of non-transportation workers. The results of the survey are shown below. ¹¹

National Sleep Foundation, 2012 Sleep in America Poll: Transportation Workers' Sleep, 2012.

		Transportation Professionals
	Control	Pilots	Truck Drivers	Train Operators	Bus/Taxi/Limo Drivers
Less than 6 hours of sleep	35	19	35	29	21
6 to 8 hours of sleep	193	132	117	119	131
More than 8 hours	64	51	51	32	58
Total	292	202	203	180	210

Conduct a hypothesis test to evaluate if these data provide evidence of a difference between the proportions of truck drivers and non-transportation workers (the control group) who get less than 6 hours of sleep per day, i.e. are considered sleep deprived.

Solution

Subscript \(_C\) means control group. Subscript \(_T\) means truck drivers. \(H_{0} : p_{C} = p_{T}\text{.}\) \(H_{A} : p_{C} \ne p_{T}\text{.}\) Independence is satisfied (random samples), as is the success-failure condition, which we would check using the pooled proportion \((\hat{p}_{pool} = 70/495 = 0.141)\text{.}\) \(Z = -1.65 \rightarrow \text{p-value } = 0.0989\text{.}\) Since the p-value is greater than the default significance level \(\alpha = 0.05\text{,}\) we fail to reject \(H_{0}\text{.}\) The data do not provide strong evidence that the rates of sleep deprivation are different for non-transportation workers and truck drivers.
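
For reference, the test statistic can be worked out from the table as follows, using \(\hat{p}_C = 35/292 \approx 0.1199\) and \(\hat{p}_T = 35/203 \approx 0.1724\text{.}\) Intermediate values are rounded, so the last digit may vary slightly.

\begin{gather*} SE = \sqrt{0.141(1-0.141)}\sqrt{\frac{1}{292} + \frac{1}{203}} \approx 0.0318 \qquad Z = \frac{(0.1199 - 0.1724) - 0}{0.0318} \approx -1.65 \end{gather*}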

12. Prenatal vitamins and Autism.

Researchers studying the link between prenatal vitamin use and autism surveyed the mothers of a random sample of children aged 24 - 60 months with autism and conducted another separate random sample for children with typical development. The table below shows the number of mothers in each group who did and did not use prenatal vitamins during the three months before pregnancy (periconceptional period). ¹²

R.J. Schmidt et al. “Prenatal vitamins, one-carbon metabolism gene variants, and risk for autism”. In: Epidemiology 22.4 (2011), p. 476.

		Autism
		Autism	Typical development	Total
Periconceptional prenatal vitamin	No vitamin	111	70	181
	Vitamin	143	159	302
	Total	254	229	483

State appropriate hypotheses to test for independence of use of prenatal vitamins during the three months before pregnancy and autism.
Complete the hypothesis test and state an appropriate conclusion. (Reminder: Verify any necessary conditions for the test.)
A New York Times article reporting on this study was titled “Prenatal Vitamins May Ward Off Autism”. Do you find the title of this article to be appropriate? Explain your answer. Additionally, propose an alternative title. ¹³

R.C. Rabin. “Patterns: Prenatal Vitamins May Ward Off Autism”. In: New York Times (2011).

13. HIV in sub-Saharan Africa.

In July 2008 the US National Institutes of Health announced that it was stopping a clinical study early because of unexpected results. The study population consisted of HIV-infected women in sub-Saharan Africa who had been given single-dose Nevirapine (a treatment for HIV) while giving birth, to prevent transmission of HIV to the infant. The study was a randomized comparison of continued treatment of a woman (after successful childbirth) with Nevirapine vs. Lopinavir, a second drug used to treat HIV. 240 women participated in the study; 120 were randomized to each of the two treatments. Twenty-four weeks after starting the study treatment, each woman was tested to determine if the HIV infection was becoming worse (an outcome called virologic failure). Twenty-six of the 120 women treated with Nevirapine experienced virologic failure, while 10 of the 120 women treated with the other drug experienced virologic failure. ¹⁴

Create a two-way table presenting the results of this study.
State appropriate hypotheses to test for difference in virologic failure rates between treatment groups.
Complete the hypothesis test and state an appropriate conclusion. (Reminder: Verify any necessary conditions for the test.)

S. Lockman et al. “Response to antiretroviral therapy after a single, peripartum dose of nevirapine”. In: Obstetrical & gynecological survey 62.6 (2007), p. 361.

Solution

(a) Summary of the study:

		Virol. failure
		Yes	No	Total
Treatment	Nevirapine	26	94	120
	Lopinavir	10	110	120
	Total	36	204	240

(b) \(H_{0} : p_{N} = p_{L}\text{.}\) There is no difference in virologic failure rates between the Nevirapine and Lopinavir groups. \(H_{A} : p_{N} \ne p_{L}\text{.}\) There is some difference in virologic failure rates between the Nevirapine and Lopinavir groups.

(c) Random assignment was used, so the observations in each group are independent. If the patients in the study are representative of those in the general population (something impossible to check with the given information), then we can also confidently generalize the findings to the population. The success-failure condition, which we would check using the pooled proportion \((\hat{p}_{pool} = 36/240 = 0.15)\text{,}\) is satisfied. \(Z = 2.89 \rightarrow \text{p-value } = 0.0039\text{.}\) Since the p-value is low, we reject \(H_{0}\text{.}\) There is strong evidence of a difference in virologic failure rates between the Nevirapine and Lopinavir groups. Treatment and virologic failure do not appear to be independent.
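
For reference, the test statistic in part (c) can be worked out from the two-way table, with \(\hat{p}_N = 26/120 \approx 0.2167\) and \(\hat{p}_L = 10/120 \approx 0.0833\text{.}\) Intermediate values are rounded.

\begin{gather*} SE = \sqrt{0.15(1-0.15)}\sqrt{\frac{1}{120} + \frac{1}{120}} \approx 0.0461 \qquad Z = \frac{(0.2167 - 0.0833) - 0}{0.0461} \approx 2.89 \end{gather*}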

14. An apple a day keeps the doctor away.

A physical education teacher at a high school wanting to increase awareness on issues of nutrition and health asked her students at the beginning of the semester whether they believed the expression “an apple a day keeps the doctor away”, and 40% of the students responded yes. Throughout the semester she started each class with a brief discussion of a study highlighting positive effects of eating more fruits and vegetables. She conducted the same apple-a-day survey at the end of the semester, and this time 60% of the students responded yes. Can she use a two-proportion method from this section for this analysis? Explain your reasoning.