Section 6.1 Inference for a single proportion

Figure 6.1.1 Sampling Distribution of Proportions

The distribution of a sample proportion, such as an estimate of the fraction of people who share a particular opinion in a poll, was introduced in Section 4.5.

Subsection 6.1.1 Confidence intervals for a proportion

Suppose we want to construct a confidence interval for the proportion of Americans who approve of the job the Supreme Court is doing. In a simple random sample of \(n = 976\text{,}\) 44% of respondents approved. [1] In the examples below, we will construct a 1-proportion z-interval.

[1] https://www.nytimes.com/2012/06/08/us/politics/44-percent-of-americans-approve-of-supreme-court-in-new-poll.html

Before constructing the confidence interval, we should determine whether we can model the sample proportion, \(\hat{p} = 0.44\text{,}\) using a normal model, which requires two conditions to be satisfied.

Conditions for the sampling distribution of \(\hat{p}\) being nearly normal

The sampling distribution for \(\hat{p}\text{,}\) taken from a sample of size \(n\) from a population with a true proportion \(p\text{,}\) is nearly normal when

  1. the sample observations are independent and

  2. we expect to see at least 10 successes and 10 failures in our sample, i.e. \(np\geq10\) and \(n(1-p)\geq10\text{.}\) This is called the success-failure condition.

If these conditions are met, then the sampling distribution of \(\hat{p}\) is nearly normal with mean \(\mu_{\hat{p}}=p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{\frac{\ p(1-p)\ }{n}}\text{.}\)
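The claim above can be checked informally by simulation. The following Python sketch is an added illustration, not part of the original text: it assumes a true proportion of \(p = 0.44\) and \(n = 976\) (values borrowed from the Supreme Court poll), and the function name `simulate_sample_proportions`, the seed, and the number of repetitions are arbitrary choices.

```python
import math
import random


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` independent samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.44, 976  # assumed true proportion and sample size (illustrative)
p_hats = simulate_sample_proportions(p, n, reps=1000)

# Compare the simulated mean and standard deviation with the formulas above.
sim_mean = sum(p_hats) / len(p_hats)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in p_hats) / (len(p_hats) - 1))
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean = {sim_mean:.4f}, theoretical mean = {p:.4f}")
print(f"simulated sd   = {sim_sd:.4f}, theoretical sd   = {theory_sd:.4f}")
```

With both conditions met, the simulated mean and standard deviation should land close to \(p\) and \(\sqrt{p(1-p)/n}\text{,}\) though the exact digits depend on the seed.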

Verify that we can use a normal distribution to model \(\hat{p}=0.44\) for the Supreme Court poll of \(n = 976\) US adults.

Solution

The data are from a simple random sample, so the independence condition is satisfied. To check the success-failure condition we want to check that \(np\) and \(n(1-p)\) are at least 10. However, \(p\) is unknown. Therefore, we will use the sample proportion \(\hat{p}\) to check this condition.

\begin{gather*} n\hat{p} = 976 \times 0.44 \approx 429 \geq 10\\ n(1-\hat{p}) = 976 \times (1 - 0.44) \approx 547 \geq 10 \end{gather*}

The second condition is satisfied since 429 and 547 are both larger than 10. With the two conditions satisfied, we can model the sample proportion \(\hat{p} = 0.44\) using a normal model.
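The success-failure check in this solution is simple arithmetic, so it is easy to script. The sketch below is an added illustration; the function names `success_failure_counts` and `passes_success_failure` and the `threshold` parameter are hypothetical, chosen only for this example.

```python
def success_failure_counts(p, n):
    """Return the expected numbers of successes and failures: n*p and n*(1-p)."""
    return n * p, n * (1 - p)


def passes_success_failure(p, n, threshold=10):
    """True when both expected counts are at least `threshold`."""
    successes, failures = success_failure_counts(p, n)
    return successes >= threshold and failures >= threshold


# Supreme Court poll: p is unknown, so the sample proportion 0.44 stands in for it.
successes, failures = success_failure_counts(0.44, 976)
print(round(successes), round(failures))  # 429 547
print(passes_success_failure(0.44, 976))  # True
```

The same helper can be reused for a hypothesis test by passing the null value \(p_0\) in place of \(\hat{p}\text{.}\)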

TIP: Reminder on checking independence of observations

If data come from a simple random sample, then the independence assumption is generally reasonable. Alternatively, if the data come from a random process, we must evaluate the independence condition more carefully.

The general form of a confidence interval is:

\begin{gather*} \text{ point estimate } \ \pm\ \text{ critical value } \times SE \end{gather*}

What should we use as the point estimate for the confidence interval?

Solution

The best estimate for the unknown parameter \(p\) (the proportion of Americans who approve of the job the Supreme Court is doing) is the sample proportion. When constructing a confidence interval for a single proportion, we use \(\hat{p} = 0.44\) as the point estimate for \(p\text{.}\)

Calculate the standard error for the confidence interval.

Solution

In Section 4.5, we learned that the formula for the standard deviation of \(\hat{p}\) is

\begin{gather*} \sigma_{\hat{p}} = \sqrt{\frac{\ p(1-p)\ }{n}} \end{gather*}

The proportion \(p\) is unknown, but we can use sample proportion \(\hat{p}\) instead when finding the SE in a confidence interval:

\begin{gather*} SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}= \sqrt{\frac{0.44(1-0.44)}{976}}= 0.016 \end{gather*}

When the conditions for a normal model are met, we use \(z^\star\) for the critical value. An appropriate value for \(z^\star\) can most easily be found in the \(t\)-table in Appendix C in the last row (\(\infty\)), where the column corresponds to the desired confidence level.

Construct a 90% confidence interval for \(p\text{,}\) the proportion of Americans who approve of the job the Supreme Court is doing.

Solution

Using the point estimate \(\hat{p} = 0.44\) and standard error \(SE = 0.016\) computed earlier, we can construct the confidence interval:

\begin{align*} \text{ point estimate } \ \pm \amp \ \text{ critical value } \times SE\\ 0.44\ \pm \amp \ 1.65 \times 0.016\\ (0.414 \amp ,\ 0.466) \end{align*}

The critical value \(z^{\star}\) was found by looking in the 90% column in the \(t\)-table in Appendix C.

We are 90% confident that the true proportion of Americans who approve of the job the Supreme Court is doing is between 41.4% and 46.6%. Because the entire interval is below 0.5, we have evidence that the true percent that approve is less than 50%.
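The interval in this solution can be reproduced with a few lines of Python. This is a minimal sketch added for illustration: the function name `one_proportion_z_interval` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` rather than the \(t\)-table, so it uses \(z^{\star} \approx 1.645\) instead of the rounded 1.65.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(p_hat, n, confidence):
    """Return (lower, upper) for a 1-proportion z-interval."""
    # Critical value z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error uses the sample proportion, since p is unknown.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


lower, upper = one_proportion_z_interval(0.44, 976, 0.90)
print(f"({lower:.3f}, {upper:.3f})")  # (0.414, 0.466)
```

The small difference between 1.645 and 1.65 does not change the interval at three decimal places.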

Constructing a confidence interval for a proportion

A complete solution to a confidence interval question for a single proportion includes the following steps:

  1. State the name of the confidence interval being used.

    • 1-proportion z-interval

  2. Verify conditions.

    • A simple random sample.

    • \(n\hat{p} \geq10\) and \(n(1-\hat{p})\geq10\text{.}\)

  3. Plug in the numbers and write the interval in the form

    \begin{gather*} \text{ point estimate } \pm z^\star \times \text{ SE of estimate } \end{gather*}
    • The point estimate is \(\hat{p}\text{.}\)

    • Critical value \(z^\star=1.96\) for a 95% CI; otherwise, find \(z^\star\) using the \(t\)-table at row \(\infty\text{.}\)

    • Use \(SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\text{.}\)

  4. Evaluate the CI and write in the form (     ,      ).

  5. Interpret the interval: “We are [XX]% confident that the true proportion of [...] is between [...] and [...].”

  6. State the conclusion to the original question.

Identify each of the six steps for constructing a confidence interval in the Supreme Court description and examples. [2]

[2] The following are the required components for constructing a confidence interval for a single proportion. 1. The last sentence in the first paragraph of Subsection 6.1.1. 2. In the solution to Example 6.1.2, the first sentence covers independence, and the two calculations verify that \(n\hat{p}\) and \(n(1-\hat{p})\) are at least 10. 3. The last formula in the solution to Example 6.1.4, and the calculations and identification of \(z^{\star}\) in Example 6.1.5. 4. At the end of the calculations in the solution to Example 6.1.5. Items 5 and 6 are contained in the last paragraph of the solution to Example 6.1.5.

A poll randomly selected 1,042 adults residing in the state of New York and asked the question, “Regardless of whether the person shows symptoms or not, do you support or oppose a 21 day quarantine for anyone who has come in contact with someone with the Ebola virus?” [3] Among the sample, 82% said “support”, 15% said “oppose”, and 3% said “unsure”. Carry out the appropriate 95% confidence interval procedure to estimate the true proportion of adults in New York who supported a quarantine. Is there evidence that the true percent is greater than 75%?

[3] NBC 4 New York / The Wall Street Journal / Marist Poll. October 31, 2014.

Solution

We will construct a 1-proportion z-interval. The poll can be considered a simple random sample of adults from New York. Also, \(1042\times 0.82\ge10\) and \(1042\times (1-0.82)\ge10\text{,}\) so the conditions for the confidence interval are satisfied. The standard error and confidence interval can be calculated as

\begin{align*} SE=\amp \sqrt{\frac{0.82(1-0.82)}{1042}}=0.0119\\ 0.82\ \pm\amp \ 1.96 \times 0.0119\\ (0.797\amp ,\ 0.843) \end{align*}

We are 95% confident that the true proportion of adults in New York who supported a 21 day quarantine for anyone who has come in contact with someone with the Ebola virus lies between 0.797 and 0.843. Because the entire interval is above 0.75, the interval provides evidence that the true percent is greater than 75%.

Subsection 6.1.2 Hypothesis testing for a proportion

While a confidence interval provides a reasonable range of values for an unknown parameter, a hypothesis test evaluates a specific claim. In a hypothesis test, we declare what test we will use, check that the test is reasonable for the context, and construct appropriate null and alternative hypotheses. We then construct a p-value for the test and use it to assess the hypotheses, which allows us to form a conclusion based on the data.

Deborah Toohey is running for Congress, and her campaign manager claims she has more than 50% support from the district's electorate. A newspaper collects a simple random sample of 500 likely voters in the district and estimates Toohey's support to be 52%.

  1. What is the name of the test that is appropriate for this context?

  2. State the alternative hypothesis. What value should we use as the null value, \(p_{0}\text{?}\)

  3. Can we model \(\hat{p} = 0.52\) using a normal model? Check the conditions.

Solution

(a) The name of the test we will use is the 1-proportion z-test.

(b) The alternative hypothesis, the one that bears the burden of proof, argues that Toohey has more than 50% support. Therefore, \(H_A\) will be one-sided and the null value will be \(p_0 = 50\% = 0.5\text{.}\) \(H_A\text{:}\) \(p \gt 0.5\text{.}\)

(c) The calculations in a hypothesis test for a proportion assume the value \(p_0\) for the unknown \(p\text{,}\) so we use \(p_{0}\) rather than \(\hat{p}\) when verifying the hypothesis test conditions:

\begin{align*} np_0 \amp \geq 10 \rightarrow 500\times 0.5 \geq 10\\ n(1-p_0) \amp \geq 10 \rightarrow 500 \times (1-0.5) \geq 10 \end{align*}

The conditions for a normal model are met.

In Chapter 5, we saw that the general form of the test statistic for a hypothesis test took the following form:

\begin{gather*} \text{ test statistic } = \frac{\text{ point estimate } - \text{ null value } }{\text{ SE of estimate } } \end{gather*}

When the conditions for a normal model are met,

  • we use Z as the test statistic,

  • the point estimate is \(\hat{p}\) (just like for a confidence interval), and

  • since we compute the test statistic under the null hypothesized value of \(p = p_0\text{,}\) we compute the standard error as

    \begin{gather*} SE = \sqrt{\frac{p_0(1-p_0)}{n}} \end{gather*}

Deborah Toohey is running for Congress, and her campaign manager claimed she has more than 50% support from the district's electorate. A newspaper poll finds that 52% of 500 likely voters who were sampled support Toohey. Does this provide convincing evidence for the claim by Toohey's manager at the 5% significance level?

Solution

We will use a one-sided test with the following hypotheses:

\(H_0\text{:}\) \(p = 0.5\text{.}\) Toohey's support is 50%.

\(H_A\text{:}\) \(p \gt 0.5\text{.}\) Toohey's manager is correct, and her support is higher than 50%.

We will use a significance level of \(\alpha = 0.05\) for the test. We can compute the standard error as

\begin{gather*} SE = \sqrt{\frac{p_0 (1 - p_0)}{n}} = \sqrt{\frac{0.5 (1 - 0.5)}{500}} = 0.022 \end{gather*}

The test statistic can be computed as:

\begin{gather*} Z = \frac{\hat{p} - p_0}{SE} = \frac{0.52 - 0.50}{0.022} = 0.89 \end{gather*}

A picture featuring the p-value is shown in Figure 6.1.10 as the shaded region. Using a table or a calculator, we can get the p-value as about 0.19, which is larger than \(\alpha = 0.05\text{,}\) so we do not reject \(H_0\text{.}\) That is, we do not have strong evidence to support Toohey's campaign manager's claims that she has more than 50% support within the district.
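The test statistic and p-value in this solution can also be computed in Python. The sketch below is an added illustration under stated assumptions: the function name `one_proportion_z_test` and the string values accepted by its `alternative` argument are hypothetical, and the normal tail area comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n, alternative="greater"):
    """Return (z, p_value) for a 1-proportion z-test.

    `alternative` is "greater", "less", or "two-sided" (hypothetical names).
    """
    se = math.sqrt(p0 * (1 - p0) / n)  # SE uses the null value p0, not p_hat
    z = (p_hat - p0) / se
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail
    elif alternative == "two-sided":
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


z, p_value = one_proportion_z_test(0.52, 0.50, 500, alternative="greater")
print(f"Z = {z:.2f}, p-value = {p_value:.2f}")  # Z = 0.89, p-value = 0.19
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Computing the standard error from \(p_0\) rather than \(\hat{p}\) is the key difference from the confidence interval calculation.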

Figure 6.1.10 Sampling distribution of the sample proportion if the null hypothesis is true for Example 6.1.9. The p-value for the test is shaded.

Hypothesis test for a proportion

A complete solution to a test of hypothesis problem for a single proportion should include the following steps:

  1. State the name of the test being used.

    • 1-proportion z-test

  2. Verify conditions to ensure the standard error estimate is reasonable and the point estimate is nearly normal and unbiased.

    • A simple random sample.

    • \(np_0\geq10\) and \(n(1-p_0)\geq10\) (use hypothesized \(p\text{,}\) not sample \(\hat{p}\)).

  3. Write the hypotheses in plain language and mathematical notation.

    • H\(_0: p = p_0\text{,}\) where \(p_0\) is the hypothesized value of \(p\)

    • H\(_A: p \ne \text{ or } \lt \text{ or } > p_0\)

  4. Identify the significance level \(\alpha\text{.}\)

  5. Calculate the test statistic: \(\text{Z} = \frac{\text{ point estimate } - \text{ null value } }{\text{ SE of estimate } }\)

    • The point estimate is \(\hat{p}\text{.}\)

    • Use \(SE = \sqrt{\frac{p_0(1-p_0)}{n}}\) (plug in hypothesized \(p\text{,}\) not sample \(\hat{p}\)).

  6. Find the p-value and compare it to \(\alpha\) to determine whether to reject or not reject \(H_0\text{.}\)

  7. Write the conclusion in the context of the question.

Identify each of the seven steps for conducting a hypothesis test in the example for Toohey's support. [4]

[4] The following are the required components for running a hypothesis test for a single proportion. Items 1 and 2 are contained in Example 6.1.8. Items 3-7 are covered in Example 6.1.9.

In Example 6.1.9, the data did not show strong evidence that Toohey's campaign manager was correct. Does this mean the manager was wrong? [5]

[5] Not necessarily. While we did not reject the null hypothesis, that does not mean it is true. It is possible that Toohey does have support above 50%, but that the sample did not provide enough evidence to convincingly show this.

A Gallup poll conducted in March of 2015 found that 51% of respondents support nuclear energy. [6] The survey was based on telephone interviews from a random sample of 1,025 adults in the United States. Before the poll was conducted, a nuclear energy advocacy group claimed a majority of US adults support nuclear energy. Does the poll provide strong evidence that supports their claim? Carry out an appropriate test at the 0.10 significance level. [7]

[6] www.gallup.com/poll/182180/support-nuclear-energy.aspx

[7] We will perform a 1-proportion z-test. We will assume that the sample can be treated as a simple random sample of adults from the United States. Our null value will be \(p_0 = 0.5\text{,}\) and \(1025\times 0.5=1025\times(1-0.5)\ge10\) so the conditions for the test are satisfied. We will use a one-sided test with the following hypotheses:

\(H_0\text{:}\) \(p = 0.5\text{.}\) Support for nuclear energy is 50%.

\(H_A\text{:}\) \(p \gt 0.5\text{.}\) Support for nuclear energy is higher than 50%.

We will use a significance level of \(\alpha = 0.10\) for the test. We can compute the standard error as \(SE = \sqrt{\frac{p_0 (1 - p_0)}{n}} = \sqrt{\frac{0.5 (1 - 0.5)}{1025}} = 0.0156\text{.}\) The test statistic can be computed as: \(Z = \frac{\hat{p} - p_0}{SE} = \frac{0.51 - 0.50}{0.0156} = 0.64\) and the p-value for this one-sided test is about 0.26. \(0.26 \gt 0.10\text{,}\) so we do not reject \(H_0\text{.}\) We do not have strong evidence that the true percent of adults in the United States that support nuclear energy is greater than 50%. That is, the poll does not provide evidence supporting the nuclear energy advocacy group's claim.

Subsection 6.1.3 Calculator: the 1-proportion z-test and z-interval

We can use a calculator to compute a confidence interval or to evaluate the test statistic and the p-value. Remember to show work and first substitute in all numbers before using the calculator.

TI-83/84: 1-proportion z-interval

Use STAT, TESTS, 1-PropZInt.

  1. Choose STAT.

  2. Right arrow to TESTS.

  3. Down arrow and choose A:1-PropZInt.

  4. Let x be the number of yes's (must be an integer).

  5. Let n be the sample size.

  6. Let C-Level be the desired confidence level.

  7. Choose Calculate and hit ENTER, which returns

    (,) the confidence interval
    \(\hat{p}\) the sample proportion
    n the sample size

Casio fx-9750GII: 1-proportion z-interval

  1. Navigate to STAT (MENU button, then hit the 2 button or select STAT).

  2. Choose the INTR option (F4 button).

  3. Choose the Z option (F1 button).

  4. Choose the 1-P option (F3 button).

  5. Specify the interval details:

    • Confidence level of interest for C-Level.

    • Enter the number of successes, x.

    • Enter the sample size, n.

  6. Hit the EXE button, which returns

    Left, Right ends of the confidence interval
    \(\hat{p}\) sample proportion
    n sample size

Using a calculator, confirm the earlier result from Example 6.1.5: a 90% confidence interval for the percent of Americans who approve of the job the Supreme Court is doing is between 41.4% and 46.6%. The sample percent was 44% and \(n = 976\text{.}\)

TI-83/84: 1-proportion z-test

Use STAT, TESTS, 1-PropZTest.

  1. Choose STAT.

  2. Right arrow to TESTS.

  3. Down arrow and choose 5:1-PropZTest.

  4. Let \(p_0\) be the null or hypothesized value of p.

  5. Let x be the number of yes's (must be an integer).

  6. Let n be the sample size.

  7. Choose \(\ne\text{,}\) \(\lt\text{,}\) or \(\gt\) to correspond to H\(_A\text{.}\)

  8. Choose Calculate and hit ENTER, which returns

    z Z-statistic
    p p-value
    \(\hat{p}\) the sample proportion
    n the sample size

Casio fx-9750GII: 1-proportion z-test

The steps closely match those of the 1-proportion confidence interval.

  1. Navigate to STAT (MENU button, then hit the 2 button or select STAT).

  2. Choose the TEST option (F3 button).

  3. Choose the Z option (F1 button).

  4. Choose the 1-P option (F3 button).

  5. Specify the test details:

    • Specify the sidedness of the test using the F1, F2, and F3 keys.

    • Enter the null value, p0.

    • Enter the number of successes, x.

    • Enter the sample size, n.

  6. Hit the EXE button, which returns

    z Z-statistic
    p p-value
    \(\hat{p}\) the sample proportion
    n the sample size

Using a calculator, confirm the earlier result from Example 6.1.9, that we do not have strong evidence that Toohey's voter support is above 50% because the p-value is 0.19. The sample percent was 52% and \(n=500\text{.}\)

Subsection 6.1.4 Choosing a sample size when estimating a proportion

Planning a sample size before collecting data is important. If we collect too little data, the standard error of our point estimate may be so large that the data are not very useful. On the other hand, collecting data in some contexts is time-consuming and expensive, so we don't want to waste resources on collecting more data than we need.

When considering the sample size, we want to put an upper bound on the margin of error. The margin of error is defined as the quantity that follows the \(\pm\) in the confidence interval. It is half the total width of the confidence interval.

Margin of error

The margin of error of a confidence interval is given by:

\begin{gather*} ME = \text{ critical value } \times SE \end{gather*}

The margin of error is affected by both the sample size and the confidence level.

All other things being equal, will the margin of error be bigger for a 90% confidence interval or a 95% confidence interval?

Solution

A 95% confidence interval is wider than a 90% confidence interval, so the 95% confidence interval will have a larger margin of error.

All other things being equal, what happens to the margin of error as the sample size increases?

Solution

As the sample size \(n\) increases, the \(SE\) will decrease, so the margin of error will decrease as \(n\) increases. This makes sense as we expect less error with a larger sample.
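Both effects can be seen numerically. The following Python sketch is an added illustration: the function name `margin_of_error` is hypothetical, and the value \(\hat{p} = 0.5\) and the sample sizes are arbitrary choices rather than data from this section.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """Margin of error = critical value * SE for a 1-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.5  # illustrative value, not taken from any poll in this section
for n in (250, 1000, 4000):
    me_90 = margin_of_error(p_hat, n, 0.90)
    me_95 = margin_of_error(p_hat, n, 0.95)
    print(f"n = {n:5d}: ME(90%) = {me_90:.4f}, ME(95%) = {me_95:.4f}")
```

In each row the 95% margin exceeds the 90% margin, and because \(n\) sits under a square root, quadrupling the sample size only halves the margin of error.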

Suppose we are conducting a university survey to determine whether students support a $200 per year increase in fees to pay for a new football stadium. How big of a sample is needed to be sure the margin of error is no larger than 0.04 using a 95% confidence level? That is, find the smallest sample size \(n\) so that the margin of error of the point estimate \(\hat{p}\) will be no larger than \(m=0.04\) when using a 95% confidence interval.

Solution

For a 95% confidence level, the value \(z^{\star}\) corresponds to 1.96. We want:

\begin{gather*} ME \leq 0.04\\ 1.96 \times SE \leq 0.04\\ 1.96\times \sqrt{\frac{p(1-p)}{n}} \leq 0.04 \end{gather*}

There are two unknowns in the equation: \(p\) and \(n\text{.}\) If we have an estimate of \(p\text{,}\) perhaps from a similar survey, we could use that value. If we have no such estimate, we must use some other value for \(p\text{.}\) It turns out that the margin of error is largest when \(p\) is 0.5, so we typically use this worst case estimate if no other estimate is available:

\begin{align*} 1.96\times \sqrt{\frac{0.5(1-0.5)}{n}} \amp \leq 0.04\\ 1.96^2\times \frac{0.5(1-0.5)}{n} \amp \leq 0.04^2\\ 1.96^2\times \frac{0.5(1-0.5)}{0.04^2} \amp \leq n\\ 600.25 \amp \leq n\\ n \amp = 601 \end{align*}

The sample size must be an integer and we round up because \(n\) must be greater than or equal to 600.25. We need at least 601 participants to ensure the sample proportion is within 0.04 of the true proportion with 95% confidence.
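The algebra in this solution reduces to a single formula, \(n \geq (z^{\star})^2\, p(1-p) / m^2\text{,}\) followed by rounding up. The Python sketch below is an added illustration; the function name `required_sample_size` and its parameter names are hypothetical, and the critical value is passed in directly so that the table value 1.96 can be used.

```python
import math


def required_sample_size(margin, z_star, p=0.5):
    """Smallest integer n with z_star * sqrt(p * (1 - p) / n) <= margin.

    Defaults to the worst case p = 0.5 when no estimate of p is available.
    """
    n_exact = z_star ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n_exact)  # always round up, never to the nearest integer


print(required_sample_size(0.04, 1.96))  # 601
```

Passing an estimate through the `p` argument, when one is available, gives a sample size no larger than the worst-case value.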

No estimate of the true proportion is required in sample size computations for a proportion. However, if we have an estimate of the proportion, we should use it in place of the worst case estimate of the proportion, 0.5.
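As a brief aside on why 0.5 is the worst case, completing the square shows that

\begin{gather*} p(1-p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^2 \leq \frac{1}{4} \end{gather*}

with equality only at \(p = 0.5\text{.}\) Solving \(z^{\star}\sqrt{\frac{p(1-p)}{n}} \leq m\) for \(n\) gives the requirement

\begin{gather*} n \geq \frac{(z^{\star})^2\, p(1-p)}{m^2}, \qquad \frac{(z^{\star})^2\, p(1-p)}{m^2} \leq \frac{(z^{\star})^2}{4m^2} \end{gather*}

so a sample size chosen using \(p = 0.5\) satisfies the margin of error target whatever the true proportion is, while an estimate of \(p\) farther from 0.5 permits a smaller sample.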

A manager is about to oversee the mass production of a new tire model in her factory, and she would like to estimate what proportion of these tires will be rejected through quality control. The quality control team has monitored the last three tire models produced by the factory, failing 1.7% of tires in the first model, 6.2% of the second model, and 1.3% of the third model. The manager would like to examine enough tires to estimate the failure rate of the new tire model to within about 2% with a 90% confidence level.

  1. There are three different failure rates to choose from. Perform the sample size computation for each separately, and identify three sample sizes to consider.

  2. The sample sizes vary widely. Which of the three would you suggest using? What would influence your choice? [8]

[8] (a) For the 1.7% estimate of \(p\text{,}\) we estimate the appropriate sample size as follows:

\begin{align*} 1.65\times \sqrt{\frac{0.017(1-0.017)}{n}} \amp \leq 0.02\\ n \amp \geq 113.7\\ n\amp =114 \end{align*}

Using the estimate from the first model, we would suggest examining 114 tires (round up!). Similar computations using 0.062 and 0.013 for \(p\) give 396 and 88, respectively. (b) We could examine which of the old models is most like the new model, then choose the corresponding sample size. Or if two of the previous estimates are based on small samples while the other is based on a larger sample, we should consider the value corresponding to the larger sample. (Answers will vary.) It should also be noted that the success-failure condition is not met with \(n = 114\) or \(n = 88\text{.}\) That is, we would need methods beyond what we've covered so far to analyze results based on those sample sizes.

A recent estimate of Congress' approval rating was 17%. [9] What sample size does this estimate suggest we should use for a margin of error of 0.04 with 95% confidence? [10]

[9] www.gallup.com/poll/155144/Congress-Approval-June.aspx

[10] We complete the same computations as before, except now we use \(0.17\) instead of \(0.5\) for \(p\text{:}\)

\begin{gather*} 1.96\times \sqrt{\frac{0.17(1-0.17)}{n}} \leq 0.04 \rightarrow n \geq 338.8 \rightarrow n = 339 \end{gather*}

A sample size of 339 or more would be reasonable.