Section 5.1 Estimating unknown parameters

Figure 5.1.1 Estimating Unknown Parameters

Subsection 5.1.1 Point estimates

We take a sample of size \(n=80\) from a particular county and find that 12 of the 80 people smoke. Estimate the population proportion based on the sample. Note that this example differs from Example 4.5.2 of the previous chapter in that we are not trying to predict what will happen in a sample. Instead, we have a sample, and we are trying to infer something about the true proportion.

Solution

The most intuitive way to go about doing this is to simply take the sample proportion. That is, \(\hat{p} = \frac{12}{80} = 0.15\) is our best estimate for \(p\text{,}\) the population proportion.

The sample proportion \(\hat{p} = 0.15\) is called a point estimate of the population proportion: if we can only choose one value to estimate the population proportion, this is our best guess. Suppose we take a new sample of 80 people and recompute the proportion of smokers in the sample; we will probably not get the exact same answer that we got the first time. Estimates generally vary from one sample to another, and this sampling variation tells us how close we expect our estimate to be to the true parameter.
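The idea of sampling variation can be sketched with a small simulation. This is only an illustration: the true proportion is unknown in practice, so the value `p_true = 0.15` below is an assumption made for the sake of the demonstration, and the helper name `sample_proportion` is hypothetical.

```python
import random

def sample_proportion(p_true, n, rng):
    """Draw n independent yes/no outcomes, each 'yes' with probability
    p_true, and return the fraction of 'yes' outcomes."""
    successes = sum(1 for _ in range(n) if rng.random() < p_true)
    return successes / n

# Point estimate from the sample in the text: 12 smokers out of 80.
p_hat = 12 / 80

# ASSUMPTION: pretend the true proportion is 0.15, for illustration only.
p_true = 0.15
rng = random.Random(1)  # fixed seed so repeated runs give the same samples

# Five new samples of size 80, each giving its own point estimate.
estimates = [sample_proportion(p_true, 80, rng) for _ in range(5)]

print("point estimate from the text:", p_hat)
print("estimates from new samples:  ", estimates)
```

Each simulated sample yields its own estimate, and these will generally differ from one another even though every sample comes from the same population.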

In Chapter 2, we found the summary statistics for the number of characters in a sample of 50 emails. These values are summarized below.

\(\bar{x}\) 11,160
median 6,890
\(s_x\) 13,130

Estimate the population mean based on the sample.

Solution

The best estimate for the population mean is the sample mean. That is, \(\bar{x} = 11,160\) is our best estimate for \(\mu\text{.}\)

Using the email data, what quantity should we use as a point estimate for the population standard deviation \(\sigma\text{?}\) Again, intuitively we would use the sample standard deviation \(s_x = 13,130\) as our best estimate for \(\sigma\text{.}\)

Subsection 5.1.2 Introducing the standard error

Point estimates only approximate the population parameter, and they vary from one sample to another. It will be useful to quantify how variable an estimate is from one sample to another. For a random sample, when this variability is small we can have greater confidence that our estimate is close to the true value.

How can we quantify the expected variability in a point estimate \(\hat{p}\text{?}\) The discussion in Section 4.5 tells us how. The variability in the distribution of \(\hat{p}\) is given by its standard deviation.

\begin{align*} SD_{\hat{p}}\amp =\sqrt{\frac{p(1-p)}{n}} \end{align*}

Calculate the standard deviation of \(\hat{p}\) for the smoking example, where \(\hat{p} = 0.15\) is the proportion who smoke in a sample of size 80.

Solution

It may seem easy to calculate the SD at first glance, but there is a serious problem: \(p\) is unknown. In fact, when doing inference, \(p\) must be unknown; otherwise there would be no reason to estimate it. We cannot calculate the SD, but we can estimate it using, as you might have guessed, the sample proportion \(\hat{p}\text{.}\)

This estimate of the standard deviation is known as the standard error, or SE for short.

\begin{align*} SE_{\hat{p}}\amp =\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \end{align*}

Calculate and interpret the SE of \(\hat{p}\) for the previous example.

Solution
\begin{gather*} SE_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} =\sqrt{\frac{0.15(1-0.15)}{80}}=0.04 \end{gather*}

The typical or expected error in our estimate is about 0.04, that is, 4 percentage points.
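The calculation above can be reproduced in a few lines. This is a minimal sketch; the function name `standard_error_proportion` is a hypothetical helper, not part of any library.

```python
import math

def standard_error_proportion(p_hat, n):
    """Estimate the standard deviation of a sample proportion by
    plugging p_hat in for the unknown p: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Smoking example from the text: p_hat = 0.15 from a sample of n = 80.
se = standard_error_proportion(0.15, 80)

print(round(se, 4))  # 0.0399
print(round(se, 2))  # 0.04, the value reported in the text
```

The unrounded value is slightly below 0.04; the text reports it to two decimal places.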

If we quadruple the sample size from 80 to 320, what will happen to the SE?

Solution
\begin{gather*} SE_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} =\sqrt{\frac{0.15(1-0.15)}{320}}=0.02 \end{gather*}

The larger the sample size, the smaller our standard error. This is consistent with intuition: the more data we have, the more reliable an estimate will tend to be. However, quadrupling the sample size does not reduce the error by a factor of 4. Because of the square root, the effect is to reduce the error by a factor of \(\sqrt{4}\text{,}\) or 2.
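The square-root relationship can be checked numerically. The sketch below holds \(\hat{p} = 0.15\) fixed, as the example does, and compares the SE at several sample sizes; the helper `standard_error_proportion` is a hypothetical name, and the sample size 1280 is added here only to show the pattern continuing.

```python
import math

def standard_error_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.15  # held fixed across sample sizes, as in the example

se_80 = standard_error_proportion(p_hat, 80)
se_320 = standard_error_proportion(p_hat, 320)

# Quadrupling n divides the SE by sqrt(4) = 2, not by 4.
ratio = se_80 / se_320

for n in (80, 320, 1280):
    print(n, round(standard_error_proportion(p_hat, n), 4))
print("SE(80) / SE(320) =", round(ratio, 6))
```

Each time the sample size is multiplied by 4, the printed SE is cut roughly in half, which matches the factor of \(\sqrt{4}\) described above.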

Subsection 5.1.3 Basic properties of point estimates

We achieved three goals in this section. First, we determined that point estimates from a sample may be used to estimate population parameters. We also determined that these point estimates are not exact: they vary from one sample to another. Lastly, we quantified the uncertainty of the sample proportion using what we call the standard error. We will learn how to calculate the standard error for other point estimates such as a mean, a difference in means, or a difference in proportions in the chapters that follow.