
Section 4.1 Contingency Tables

Figure 4.1.1. Alternative Video Lesson

When we looked at categorical data in the previous chapter, it was related to a single variable, or characteristic of interest, such as favorite movie or car color. To illustrate the data, we made a frequency table and used it to create a pie chart or bar chart. But what if we want to illustrate the relationship between two categorical variables? To do this, we can use a contingency table.

Subsection 4.1.1 Contingency Tables

A contingency table summarizes all the possible combinations for two categorical variables. Each value in the table represents the number of times a particular combination of outcomes occurs. For example, suppose we randomly select 250 households from the greater Portland area and ask whether they have a cat and whether they have a dog. In this case, “have a cat” and “have a dog” are the two variables, and each variable has two categories: Yes and No. To create the contingency table, we make columns for the categories of one variable, and rows for the categories of the other variable. We also add a row and column for the subtotals of each category. Each cell of the resulting table contains the number of outcomes having the characteristics of the intersecting row and column categories. For our dog and cat example, the table would look like this:

| | Dog | No Dog | Total |
|---|---|---|---|
| Cat | Yes Cat and Yes Dog | Yes Cat and No Dog | Yes Cat Total |
| No Cat | No Cat and Yes Dog | No Cat and No Dog | No Cat Total |
| Total | Yes Dog Total | No Dog Total | Grand total |

Suppose that of the 250 households surveyed, 180 said they have a cat, 95 said they have a dog, and 52 said they have both a cat and a dog. We can use this information to fill in the cells of the table.

| | Dog | No Dog | Total |
|---|---|---|---|
| Cat | 52 | | 180 |
| No Cat | | | |
| Total | 95 | | 250 |

The first cell we can fill in is the grand total, which is the total number of subjects in the study. In this case, there are 250 households participating in the survey. The next two cells we can fill in are the total number of households that have a cat, 180, and the total number of households that have a dog, 95. The final cell we can fill in from the given information is the intersection of the “have a dog” column and the “have a cat” row, which is 52 households.

Since each row and each column must sum to its total, we can use subtraction to find the missing numbers as shown below.

| | Dog | No Dog | Total |
|---|---|---|---|
| Cat | 52 | \(180-52=128\) | 180 |
| No Cat | \(95-52=43\) | \(155-128=27\) or \(70-43=27\) | \(250-180=70\) |
| Total | 95 | \(250-95=155\) | 250 |

Now that our contingency table is complete, notice that the numbers in the four central cells add to the grand total: \(52+128+43+27=250\text{.}\) The entries in the total column and the entries in the total row also add to the grand total: \(180+70=250\) and \(95+155=250\text{.}\)

| | Dog | No Dog | Total |
|---|---|---|---|
| Cat | 52 | 128 | 180 |
| No Cat | 43 | 27 | 70 |
| Total | 95 | 155 | 250 |
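The subtraction steps used to complete the table can also be sketched in Python. This is only an illustration and not part of the original lesson; the function name `complete_two_by_two`, its parameter names, and the "yes"/"no" labels are assumptions made for this example.

```python
def complete_two_by_two(grand_total, row_yes_total, col_yes_total, both):
    """Fill in a 2x2 contingency table from the grand total, one row total,
    one column total, and the count in their intersection.

    Returns a dict keyed by (row, column) labels; the labels are illustrative.
    """
    yes_no = row_yes_total - both               # cat and no dog: 180 - 52
    no_yes = col_yes_total - both               # no cat and dog: 95 - 52
    row_no_total = grand_total - row_yes_total  # no cat total: 250 - 180
    col_no_total = grand_total - col_yes_total  # no dog total: 250 - 95
    no_no = row_no_total - no_yes               # no cat and no dog: 70 - 43
    return {
        ("yes", "yes"): both,
        ("yes", "no"): yes_no,
        ("no", "yes"): no_yes,
        ("no", "no"): no_no,
        ("yes", "total"): row_yes_total,
        ("no", "total"): row_no_total,
        ("total", "yes"): col_yes_total,
        ("total", "no"): col_no_total,
        ("total", "total"): grand_total,
    }


# Household survey from this section: rows are cat (yes/no), columns are dog (yes/no).
table = complete_two_by_two(grand_total=250, row_yes_total=180,
                            col_yes_total=95, both=52)

# The four central cells should add to the grand total.
inner_sum = sum(table[(r, c)] for r in ("yes", "no") for c in ("yes", "no"))
print(table[("yes", "no")], table[("no", "yes")], table[("no", "no")], inner_sum)
# prints: 128 43 27 250
```

The final check mirrors the observation that the four central cells add to the grand total, which is a quick way to catch a subtraction mistake.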

Subsection 4.1.2 Contingency Tables and Venn Diagrams

If the subtractions we just did seem familiar, they should! This is very similar to what we did when reporting data with a Venn diagram, where we subtracted the intersection from the cat owner total and from the dog owner total to find the numbers in the crescent regions. The Venn diagram for this data is shown below.

Notice that the numbers in the four regions of the Venn diagram are the same as the four cells in the center of the contingency table and add to the grand total.

A Venn diagram with two overlapping circles and a box around it. The cat-only region shows 180-52=128; the dog-only region shows 95-52=43; the intersection is 52; and the number outside the circles, in the universal set, is 27.

Subsection 4.1.3 “And” Statements

Now we can use the contingency table or the Venn diagram to determine the percentage of households that meet certain conditions. For instance, what percent of those surveyed own a cat and do not own a dog? In the Venn diagram, this is 128 households in the cat only region.

In the contingency table, we see the 128 households at the intersection of the row of households who own a cat and the column of households who do not own a dog. As a percentage of the total number of households surveyed, this is \(\frac{128}{250}=0.512\text{,}\) so 51.2% of households have a cat and no dog.

Dog No Dog Total
Cat 52 128 180
No Cat 43 27 70
Total 95 155 250

Subsection 4.1.4 “Or” Statements

How about the percentage of households surveyed that have a cat or a dog? We know from Venn diagrams that the inclusive or includes the number of households who own a cat only, a dog only, and both a cat and a dog, or \(128+52+43=223\) households. As a percentage of the total surveyed, we get \(\frac{223}{250}=0.892\) or 89.2% of households in the sample have a dog or a cat (or both).

We can get the same answer from the contingency table by adding the cells for households who have a cat and not a dog, a dog and not a cat, and both a cat and a dog. This also gives us 223 households.

There is another way to calculate an or statement from a contingency table. We could add the row and column totals for having a cat and having a dog, but then we have counted the 52 households in the intersection twice. We can subtract that number to get \(180+95-52=223\) households with a dog or a cat, which we know is 89.2% of those surveyed.

Dog No Dog Total
Cat 52 128 180
No Cat 43 27 70
Total 95 155 250
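As a quick check, the two ways of counting an inclusive or can be compared in a few lines of Python. This is a minimal sketch using the household counts from this section; the variable names are assumptions chosen for readability.

```python
# Counts from the household table in this section.
cat_and_dog = 52
cat_only = 128
dog_only = 43
cat_total = 180
dog_total = 95
grand_total = 250

# Method 1: add the three cells where the household has at least one of the pets.
either_by_cells = cat_only + dog_only + cat_and_dog

# Method 2: add the row and column totals, then subtract the overlap
# that was counted twice.
either_by_totals = cat_total + dog_total - cat_and_dog

percent_either = 100 * either_by_cells / grand_total
print(either_by_cells, either_by_totals, round(percent_either, 1))
# prints: 223 223 89.2
```

Both methods count each household exactly once, which is why they agree.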

Subsection 4.1.5 Conditional Statements

Another question we can answer using a contingency table is: what percentage of dog-owning households also own a cat? In this case the group that we are interested in isn’t every household surveyed (the grand total), but just those households that own a dog.

Dog No Dog Total
Cat 52 128 180
No Cat 43 27 70
Total 95 155 250

We call this a conditional statement because we are only considering the households with a certain condition. If we focus on the column representing the households that own a dog, we see that there is a total of 95 households with a dog, and that 52 of those 95 households also have a cat. Therefore, \(\frac{52}{95} \approx 0.547\) or approximately 54.7% of the households with a dog also have a cat. Another way to phrase this conditional statement is, “What percent of households have a cat, given they have a dog?” You will see the word given quite a bit in this chapter. It signals that the denominator changes from the grand total to the total for the given group. It is also possible to find this conditional percentage using the Venn diagram by taking the number in the intersection and dividing it by the total in the whole dog circle.
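To see how the word given changes the denominator, here is a small Python sketch that computes an and percentage and a conditional percentage from the same cell. The variable names are illustrative assumptions. The 20.8% figure is not stated in the text; it follows directly from the table (52 out of 250).

```python
from fractions import Fraction

cat_and_dog = 52
dog_total = 95
grand_total = 250

# "And" statement: the denominator is every household surveyed.
p_cat_and_dog = Fraction(cat_and_dog, grand_total)

# Conditional statement: the denominator shrinks to the households with a dog.
p_cat_given_dog = Fraction(cat_and_dog, dog_total)

print(f"{float(p_cat_and_dog):.1%}")    # prints: 20.8%
print(f"{float(p_cat_given_dog):.1%}")  # prints: 54.7%
```

The numerator is the same in both cases; only the group we divide by is different.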

Subsection 4.1.6 Contingency Tables with More Than Two Categories

When there are only two categories for each variable, like yes/no questions, Venn diagrams and contingency tables provide basically the same information and can be used interchangeably. A Venn diagram works well for yes/no variables since a subject is either inside the circle (has the characteristic) or outside the circle (does not have the characteristic). If any of the variables has more than two possible categories, we cannot use a Venn diagram, but we can still use a contingency table. Here is an example where one variable has four categories and the other has three categories.

Example 4.1.2.

910 randomly sampled registered voters from Tampa, FL were asked if they thought workers who have illegally entered the US should (i) be allowed to keep their jobs and apply for US citizenship, (ii) be allowed to keep their jobs as temporary guest workers but not be allowed to apply for US citizenship, or (iii) lose their jobs and have to leave the country. Not sure was also an option (iv). The results of the survey by political ideology are shown below. Use the contingency table to answer the questions.

Source: SurveyUSA, News Poll #18927, data collected Jan 27-29, 2012. Example adapted from Open Intro: Advanced High School Statistics, by Diez et al., used under CC-BY-SA 3.0.
Conservative Moderate Liberal Total
(i) Apply for citizenship 57 120 101 278
(ii) Guest worker 121 113 28 262
(iii) Leave the country 179 126 45 350
(iv) Not sure 15 4 1 20
Total 372 363 175 910
  1. What percent of the sampled Tampa, FL voters identified themselves as conservatives?

  2. What percent of the sampled voters are in favor of the citizenship option?

  3. What percent of the sampled voters identify themselves as conservatives and are in favor of the citizenship option?

  4. What percent of the sampled voters identify themselves as liberal or are in favor of the leave the country option?

  5. What percent of the sampled voters who identify as conservatives are also in favor of the citizenship option? What percent of moderate and liberal voters share this view?

Solution.
  1. To answer this question, we find the conservative column, look to the bottom cell for the total number of conservative voters, and divide that by the total number of voters surveyed. This gives us \(\frac{372}{910}\approx 0.409\text{,}\) so approximately 41% of the sampled Tampa, FL voters identify as conservative.

  2. For this question we find the apply for citizenship row, look across to find the total, and divide this by the total number of voters surveyed. We get \(\frac{278}{910} \approx 0.305\) or approximately 31% of these voters are in favor of the citizenship option.

  3. For this question we are looking for the cell that is the intersection of those who identify as conservative and those who are in favor of the citizenship option. This cell has 57 voters, so we divide that by the total number of voters. This gives us \(\frac{57}{910} \approx 0.063\text{,}\) so approximately 6.3% of these voters identify as conservatives and are in favor of the citizenship option.

  4. The or in this question is inclusive, so we need to determine the number of voters who identify as liberal, who are in favor of the leave the country option, or both.

    Conservative Moderate Liberal Total
    (i) Apply for citizenship 57 120 101 278
    (ii) Guest worker 121 113 28 262
    (iii) Leave the country 179 126 45 350
    (iv) Not sure 15 4 1 20
    Total 372 363 175 910

    In terms of the individual cells, the number of voters who have the specified characteristics is the sum \(179+126+101+28+45+1=480\text{,}\) which we can divide by the total number of voters surveyed to get the percent. So, we have \(\frac{480}{910} \approx 0.527\) or approximately 53% of the voters identify as liberal or are in favor of the leave the country option.

    Another way to calculate this is to add the total number who identify as liberal (175 voters) and the total number who are in favor of the leave the country option (350 voters), then subtract the double counted cell (45 voters) who are liberal and in favor of the leave the country option: \(175+350-45=480\text{.}\)

  5. As we saw before, these are conditional statements. For the first part of this question, we want to focus just on those voters who identify as conservatives, and from among that group determine the percent in favor of the citizenship option. We calculate that \(\frac{57}{372} \approx 0.153\) or approximately 15% of conservative voters are in favor of the citizenship option.

    For the second part, we want to focus on just those voters who identify as moderate, and from among that group determine the percent in favor of the citizenship option. Then we have \(\frac{120}{363} \approx 0.33\) or approximately 33% of moderate voters are in favor of the citizenship option.

    Finally, we want to focus on just those voters who identify as liberal, and from among that group determine the percent in favor of the citizenship option. We calculate \(\frac{101}{175} \approx 0.58\) or approximately 58% of liberal voters are in favor of the citizenship option. Looking at these three percentages, it is clear that support of the citizenship option depends on political ideology. If support of the citizenship option were the same across political ideologies, then we would say that favoring the citizenship option and political ideology were independent of each other.
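The calculations in this example can be reproduced with a short Python sketch. It stores the table as a nested dictionary, which is just one possible layout; the key names and helper functions (`column_total`, `row_total`) are assumptions made for illustration.

```python
# Counts from the Example 4.1.2 table: rows are responses, columns are ideologies.
counts = {
    "citizenship": {"conservative": 57, "moderate": 120, "liberal": 101},
    "guest worker": {"conservative": 121, "moderate": 113, "liberal": 28},
    "leave": {"conservative": 179, "moderate": 126, "liberal": 45},
    "not sure": {"conservative": 15, "moderate": 4, "liberal": 1},
}


def column_total(ideology):
    """Total number of voters with the given ideology."""
    return sum(row[ideology] for row in counts.values())


def row_total(response):
    """Total number of voters giving the specified response."""
    return sum(counts[response].values())


grand_total = sum(row_total(response) for response in counts)

# Liberal or "leave": add the two totals, then subtract the cell counted twice.
liberal_or_leave = (column_total("liberal") + row_total("leave")
                    - counts["leave"]["liberal"])

# Conditional percentages: favoring citizenship within each ideology.
citizenship_given = {
    ideology: counts["citizenship"][ideology] / column_total(ideology)
    for ideology in ("conservative", "moderate", "liberal")
}

print(grand_total, liberal_or_leave)  # prints: 910 480
for ideology, share in citizenship_given.items():
    print(ideology, f"{share:.1%}")
# prints: conservative 15.3%, then moderate 33.1%, then liberal 57.7%
```

Because each conditional percentage divides by its own column total, the three results can be compared directly even though the ideology groups differ in size.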

Subsection 4.1.7 Empirical Probability

If our sample is representative of the population, then we can also interpret a percentage we calculate from a contingency table as a probability, or the likelihood that something will happen. Since a contingency table is constructed from data collected through sampling or an experiment, we call it an empirical or experimental probability. This is different from a theoretical probability which we will look at in the next section.

Subsection 4.1.8 Finding Empirical Probabilities with a Contingency Table

Suppose that 60% of students in our class have a summer birthday (June, July, or August). Now suppose everyone’s name and birth month are written on slips of paper and thrown into a bag. If we pull a slip of paper out of the bag at random, what is the probability that the selected student has a summer birthday? If you think there should be a 60% chance, you are right! The relative frequency of the characteristic of interest will be equal to its empirical probability. To write this as a probability statement, it would look like

\begin{gather*} \text{P(summer birthday)}=60\% \end{gather*}

Probability is a function named P, and the function is applied to what follows in the parentheses. Let’s look at another example where we write probability statements and find empirical probabilities.

Example 4.1.3.

A survey of licensed drivers asked whether they had received a speeding ticket in the last year and whether their car is red. The results of the survey are shown in the contingency table below.

Speeding Ticket No Speeding Ticket Total
Red Car 15 135 150
Not Red Car 45 470 515
Total 60 605 665

Find the probability that a randomly selected survey participant:

  1. has a red car.

  2. has had a speeding ticket in the last year.

  3. has a red car and has not had a speeding ticket in the last year.

  4. has a red car or has had a speeding ticket in the last year.

  5. has had a speeding ticket in the last year given they have a red car.

  6. who has received a speeding ticket in the last year also has a red car.

  7. What do the answers to parts 2 and 5 suggest about the relationship between owning a red car and getting a speeding ticket?

Solution.
  1. Speeding Ticket No Speeding Ticket Total
    Red Car 15 135 150
    Not Red Car 45 470 515
    Total 60 605 665

    To find \(P(\text{red car})\text{,}\) we divide the number of participants who own a red car by the total number of people surveyed: \(P(\text{red car})=\frac{150}{665} \approx 0.226\) or 22.6%.

  2. Speeding Ticket No Speeding Ticket Total
    Red Car 15 135 150
    Not Red Car 45 470 515
    Total 60 605 665

    To find \(P(\text{speeding ticket})\text{,}\) we divide the number of participants who got a speeding ticket in the last year by the total number of people surveyed: \(P(\text{speeding ticket})=\frac{60}{665} \approx 0.09\) or 9%.

  3. Speeding Ticket No Speeding Ticket Total
    Red Car 15 135 150
    Not Red Car 45 470 515
    Total 60 605 665

    To find \(P(\text{red and no ticket})\text{,}\) we find the intersection of the red car category and the no ticket category and divide by the total number of participants: \(P(\text{red and no ticket}) =\frac{135}{665} \approx 0.203\) or 20.3%.

  4. Speeding Ticket No Speeding Ticket Total
    Red Car 15 135 150
    Not Red Car 45 470 515
    Total 60 605 665

    To find \(P(\text{red or ticket})\text{,}\) we need to add those who drive a red car and did not have a speeding ticket (just red), those who had a speeding ticket and do not drive a red car (just ticket) and those who drive a red car and had a speeding ticket (both), and divide by the total number of participants:

    \begin{gather*} P(\text{red or ticket})=\frac{135+45+15}{665}=\frac{195}{665} \approx 0.293\text{ or } 29.3\% \end{gather*}

    Recall from our earlier discussion that we could also calculate the or probability as:

    \begin{align*} P(\text{red or ticket})\amp= P(\text{red})+ P(\text{speeding ticket}) - P(\text{red and speeding ticket})\\ \amp=\frac{150}{665}+\frac{60}{665}-\frac{15}{665}\\ \amp=\frac{195}{665} \end{align*}

    which gives us the same answer as counting the individual cells.

  5. Speeding Ticket No Speeding Ticket Total
    Red Car 15 135 150
    Not Red Car 45 470 515
    Total 60 605 665

    The probability \(P(\text{speeding ticket given red car})\) is a conditional probability as we have seen before since it is conditional on the given characteristic occurring. In this problem, the given characteristic is owning a red car, so we isolate our attention to just the row of 150 red car owners and see how many have had a speeding ticket in the last year. Looking at the table, we see that there were 15 red car owners who had a speeding ticket in the last year, so we calculate:

    \begin{gather*} P(\text{speeding ticket given red car})=\frac{15}{150} = 0.10\text{ or } 10\% \end{gather*}
  6. Speeding Ticket No Speeding Ticket Total
    Red Car 15 135 150
    Not Red Car 45 470 515
    Total 60 605 665

    This question is also asking for a conditional probability, \(P(\text{red car given speeding ticket})\text{,}\) but it is phrased the way we might say it in conversation. In this case the given characteristic is that the person has received a speeding ticket, so we will isolate our attention to just the speeding ticket column. Among the 60 people who had a speeding ticket in the last year, we see that 15 also drove a red car. Now we can calculate the probability:

    \begin{gather*} P(\text{red car given speeding ticket})=\frac{15}{60} = 0.25\text{ or }25\% \end{gather*}

    Notice that compared with part 5, when we change the conditional characteristic, we change the denominator of the fraction.

  7. In part 2, we determined that there was a 9% chance of randomly selecting a participant who had received a speeding ticket in the last year. In part 5, however, we found that there was a 10% chance that a participant had received a ticket in the last year given that they had a red car. This suggests a slightly higher likelihood of getting a speeding ticket if you own a red car. Because the two probabilities differ, getting a speeding ticket appears to depend on whether the person drives a red car, although the difference in this sample is small. We cannot say, however, whether driving a red car makes you speed or whether people who tend to drive faster buy red cars.
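The comparison between the overall ticket probability and the conditional ones can be checked with a short Python sketch. The counts come from the table in this example; the variable names are assumptions made for illustration.

```python
# Counts from the Example 4.1.3 table.
ticket_and_red = 15
red_total = 150
ticket_total = 60
grand_total = 665

# Probability of a speeding ticket among all participants.
p_ticket = ticket_total / grand_total

# Probability of a speeding ticket among red car owners only.
p_ticket_given_red = ticket_and_red / red_total

# Probability of a red car among those who received a ticket.
p_red_given_ticket = ticket_and_red / ticket_total

print(round(p_ticket, 3), round(p_ticket_given_red, 3), round(p_red_given_ticket, 3))
# prints: 0.09 0.1 0.25
```

The first two values are the ones to compare when asking whether getting a ticket depends on driving a red car. The third value answers a different question because its denominator is the number of ticketed drivers.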

Subsection 4.1.9 Conditional Probabilities

We have mentioned conditional probabilities, which we find by isolating our attention to the given row or column. Here is another example of finding conditional probabilities.

Example 4.1.4.

A home pregnancy test was given to a sample of 93 cisgender women, and their pregnancy status was then verified by a blood test. The contingency table below shows the home pregnancy test results and whether or not the women were actually pregnant as determined by the blood test. Find the probability that a randomly selected woman in the sample

  1. was not pregnant given the home test was positive.

  2. had a positive home pregnancy test given they were not pregnant.

Positive Test Negative Test Total
Pregnant 70 4 74
Not Pregnant 5 14 19
Total 75 18 93
Solution.

Here are the solutions:

  1. Since we are given the home test result was positive, we are limited to the 75 women in the positive test column, of which 5 were not pregnant. This gives:

    Positive Test Negative Test Total
    Pregnant 70 4 74
    Not Pregnant 5 14 19
    Total 75 18 93
    \begin{gather*} P(\text{not pregnant given positive test})=\frac{5}{75} \approx 0.067\text{ or } 6.7\% \end{gather*}
  2. Since we are given the woman is not pregnant, we are limited to the 19 women in the not pregnant row, of which 5 had a positive test. This gives:

    Positive Test Negative Test Total
    Pregnant 70 4 74
    Not Pregnant 5 14 19
    Total 75 18 93
    \begin{gather*} P(\text{positive test given not pregnant})=\frac{5}{19} \approx 0.263\text{ or } 26.3\% \end{gather*}

    This result is referred to as a false positive: A positive test result when the woman is not actually pregnant.
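The idea of isolating our attention to the given row or column can be sketched in Python as follows. The dictionary layout and the helper names `row_total` and `column_total` are assumptions chosen for this illustration, not part of the original example.

```python
from fractions import Fraction

# Counts from the Example 4.1.4 table, keyed by (blood test result, home test result).
table = {
    ("pregnant", "positive"): 70,
    ("pregnant", "negative"): 4,
    ("not pregnant", "positive"): 5,
    ("not pregnant", "negative"): 14,
}


def row_total(row):
    """Add every cell in the given row."""
    return sum(n for (r, _), n in table.items() if r == row)


def column_total(col):
    """Add every cell in the given column."""
    return sum(n for (_, c), n in table.items() if c == col)


cell = table[("not pregnant", "positive")]

# Given a positive home test: restrict to the positive test column (75 women).
p_not_pregnant_given_positive = Fraction(cell, column_total("positive"))

# Given not pregnant: restrict to the not pregnant row (19 women).
p_positive_given_not_pregnant = Fraction(cell, row_total("not pregnant"))

print(p_not_pregnant_given_positive, p_positive_given_not_pregnant)
# prints: 1/15 5/19
```

Both probabilities use the same cell of 5 women, so the only thing that changes between them is which total is used as the denominator.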

In this section we have learned about empirical probability. In the next section we will discuss another kind of probability that you may be familiar with – theoretical probability.

Exercises 4.1.10 Exercises

1.

A recent survey asked a random sample of PCC students if they are currently experiencing food insecurity and if they are currently experiencing housing insecurity. Fill in the missing entries of the contingency table below.

Food Insecure Not Food Insecure Total
Housing Insecure 60
Not Housing Insecure 460 760
Total 680
2.

A recent survey asked a random sample of PCC students if they have purchased food from the cafeteria in the last week, and if they purchased their textbooks through the bookstore. Fill in the missing entries of the contingency table below.

Bookstore No Bookstore Total
Cafeteria 375
No Cafeteria 135
Total 630 850
3.

A recent survey asked PCC students if they regularly eat breakfast and if they regularly floss their teeth. Use the completed Venn Diagram to fill in the corresponding contingency table.

A Venn diagram where the breakfast only region shows 3, the floss only region shows 49, the intersection shows 12 and the region outside the circles shows 8 students
Breakfast No Breakfast Total
Floss
No Floss
Total
4.

A recent survey asked PCC students if they used an Apple phone, and if they regularly used a Chromebook outside of school. Use the completed Venn Diagram to fill in the corresponding contingency table.

A Venn diagram where the Apple only region shows 120, the Chromebook only region shows 85, the intersection is 65 and the outside region is 45 students.
Chromebook No Chromebook Total
Apple
No Apple
Total
5.

Use the following information to complete the contingency table:

  • \(\displaystyle \text{P(A and B)} = 10/75\)

  • \(\displaystyle \text{P(A)} = 40/75 \)

  • \(\displaystyle \text{P(not B)} = 45/75\)

A Not A Total
B
Not B
Total
6.

Use the following information to complete the contingency table:

  • \(\displaystyle \text{P(A given B)} = 30/80\)

  • \(\displaystyle \text{P(Not A and Not B)} = 10/120\)

A Not A Total
B
Not B
Total
7.

A professor gave a test to students in a morning class and the same test to the afternoon class. The grades are summarized below.

A B C Total
Morning Class 8 18 13 39
Afternoon Class 10 4 12 26
Total 18 22 25 65

If one student was chosen at random:

  1. What is the probability they were in the morning class?

  2. What is the probability they earned a C?

  3. What is the probability that they earned an A and they were in the afternoon class?

  4. What is the probability that they earned an A given they were in the morning class?

  5. What is the probability that they were in the morning class or they earned a B?

8.

A professor surveyed students in her morning and afternoon Math 105 classes, and asked what their class standing was. The class standings are summarized below:

Freshman Sophomore Junior Senior Total
Morning Class 12 5 7 8 32
Afternoon Class 5 13 8 2 28
Total 17 18 15 10 60

If one student was chosen at random:

  1. What is the probability they were in the morning class?

  2. What is the probability they were a Freshman?

  3. What is the probability that they were a Senior and they were in the afternoon class?

  4. What is the probability that they were a Sophomore given they were in the morning class?

  5. What is the probability that they were in the morning class or they were a Junior?

9.

The contingency table below shows the number of credit cards owned by individuals in two age groups: between the ages of 18 and 35, and over age 35.

Zero One Two or more Total
Between the ages of 18-35 9 5 19 33
Over age 35 18 10 20 48
Total 27 15 39 81

If one person was chosen at random:

  1. What is the probability they had no credit cards?

  2. What is the probability they had one credit card?

  3. What is the probability they had no credit cards and are over 35?

  4. What is the probability they are between the ages of 18 and 35, or have zero credit cards?

  5. What is the probability they had no credit cards given that they are between the ages of 18 and 35?

  6. What is the probability they have no credit cards given that they are over age 35?

  7. Does it appear that having no credit cards depends on age? Or are they independent? Use probability to support your claim.

10.

The following contingency table provides data from a sample of 6,224 individuals who were exposed to smallpox in Boston.

Data taken from Mostly Harmless Probability & Statistics by Rachel Webb
Inoculated Not Inoculated Total
Lived 238 5136 5374
Died 6 844 850
Total 244 5980 6224
  1. What is the probability that a person was inoculated?

  2. What is the probability that a person lived?

  3. What is the probability that a person died or was inoculated?

  4. What is the probability that a person died given they were inoculated?

  5. What is the probability that a person died given they were not inoculated?

  6. Does it appear that survival depended on whether a person was inoculated? Or are they independent? Use probability to support your claim.

11.

The contingency table below shows the survival data for the passengers of the Titanic.

First Second Third Crew Total
Survive 203 118 178 212 711
Not Survive 122 167 528 673 1490
Total 325 285 706 885 2201
  1. What is the probability that a passenger did not survive?

  2. What is the probability that a passenger was crew?

  3. What is the probability that a passenger was first class and did not survive?

  4. What is the probability that a passenger did not survive or was crew?

  5. What is the probability that a passenger survived given they were first class?

  6. What is the probability that a passenger survived given they were second class?

  7. What is the probability that a passenger survived given they were third class?

  8. Does it appear that survival depended on the passenger’s class? Or are they independent? Use probability to support your claim.

12.

The following table shows the utility patents granted for a specific year.

Corporation Government Individual Total
United States 45% 2% 8% 55%
Foreign 41% 1% 3% 45%
Total 86% 3% 11% 100%
  1. What is the probability that a patent is foreign and from the government?

  2. What is the probability that a patent is from the U.S. and from a corporation?

  3. What is the probability that a patent is foreign or from the government?

  4. What is the probability that a patent is from the U.S. given it is from an individual?

  5. What is the probability that a patent is foreign given it is from the government?

13.

There is a 15% chance that a shopper entering a computer store will purchase a computer, a 25% chance they will purchase a game/software, and there is a 10% chance they will purchase both a computer and a game/software.

  1. Create a contingency table for the information.

    Game/Software No Game/Software Total
    Computer
    No Computer
    Total
  2. What is the probability that a shopper will not purchase a computer and will not purchase a game/software?

  3. What is the probability that a shopper will purchase a computer or purchase a game/software?

  4. What is the probability that a shopper will purchase a game/software given they have purchased a computer?

  5. What is the probability that a shopper will purchase a game/software given they did not purchase a computer?

  6. Does it appear that purchasing a game/software depends on whether the shopper purchased a computer? Or are they independent? Use probability to support your claim.

14.

A fitness center coach kept track over the last year of whether members stretched before they exercised, and whether or not they sustained an injury. Among the 400 members, 322 stretched before they exercised, 327 did not sustain an injury, and 270 both stretched and did not sustain an injury.

  1. Create a contingency table for the information.

    Injury No Injury Total
    Stretched
    Not Stretched
    Total
  2. What is the probability that a member sustained an injury?

  3. What is the probability that a member sustained an injury and did not stretch?

  4. What is the probability that a member stretched or did not sustain an injury?

  5. What is the probability that a member sustained an injury given they stretched?

  6. What is the probability that a member sustained an injury given they did not stretch?

  7. Does it appear that sustaining an injury depends on whether the member stretches before exercising? Or are they independent? Use probability to support your claim.

15.

Among the 95 books on a bookshelf, 72 are fiction, 28 are hardcover, and 87 are fiction or hardcover.

  1. Create a contingency table for the information.

    Hardcover Paperback Total
    Fiction
    Nonfiction
    Total
  2. What is the probability that a book is non-fiction and paperback?

  3. What is the probability that a book is fiction given it is hardcover?

16.

After finishing the course, among the 32 students in a Math 105 class, 25 could successfully construct a contingency table, 27 passed the class, and 29 could successfully construct a contingency table or passed the class.

  1. Create a contingency table for the information.

    Contingency Table No Contingency Table Total
    Pass
    No Pass
    Total
  2. What is the probability that a student passed and could not successfully construct a contingency table?

  3. What is the probability that a student passed given they could not successfully construct a contingency table?