Section 4.1 Contingency Tables
Objectives: Section 4.1 Contingency Tables
Students will be able to:
Relate Venn diagrams and contingency tables
Calculate percentages from a contingency table
Calculate “and” empirical probabilities
Calculate “or” empirical probabilities
Calculate conditional probabilities
Determine whether two characteristics are independent
When we looked at categorical data in the previous chapter, it was related to a single variable, or characteristic of interest, such as favorite movie or car color. To illustrate the data, we made a frequency table and used it to create a pie chart or bar chart. But what if we want to illustrate the relationship between two categorical variables? To do this, we can use a contingency table.
Subsection 4.1.1 Contingency Tables
A contingency table summarizes all the possible combinations for two categorical variables. Each value in the table represents the number of times a particular combination of outcomes occurs. For example, suppose we randomly select 250 households from the greater Portland area and ask whether they have a cat and whether they have a dog. In this case, “have a cat” and “have a dog” are the two variables, and each variable has two categories: Yes and No. To create the contingency table, we make columns for the categories of one variable, and rows for the categories of the other variable. We also add a row and column for the subtotals of each category. Each cell of the resulting table contains the number of outcomes having the characteristics of the intersecting row and column categories. For our dog and cat example, the table would look like this:
Dog  No Dog  Total  

Cat  Yes Cat and Yes Dog  Yes Cat and No Dog  Yes Cat Total 
No Cat  No Cat and Yes Dog  No Cat and No Dog  No Cat Total 
Total  Yes Dog Total  No Dog Total  Grand total 
Suppose that of the 250 households surveyed, 180 said they have a cat, 95 said they have a dog, and 52 said they have both a cat and a dog. We can use this information to fill in the cells of the table.
Dog  No Dog  Total  

Cat  52  ___  180 
No Cat  ___  ___  ___ 
Total  95  ___  250 
The first cell we can fill in is the grand total, which is the total number of subjects in the study. In this case, there are 250 households participating in the survey. The next two cells we can fill in are the total number of households that have a cat, 180, and the total number of households that have a dog, 95. The final cell we can fill in from the given information is the intersection of the having a dog column and a having a cat row, which is 52 households.
Since each row and column must sum to their totals, we can use subtraction to find the missing numbers as shown below.
Dog  No Dog  Total  

Cat  52  \(180-52=128\)  180 
No Cat  \(95-52=43\)  \(155-128=27\) or \(70-43=27\)  \(250-180=70\) 
Total  95  \(250-95=155\)  250 
Now that we have completed our contingency table, notice that the numbers in the central four cells add to the grand total, \(52+128+43+27=250\text{,}\) as shown in the first table below. The entries in the total row and in the total column also add to the grand total, \(95+155=250\) and \(180+70=250\text{,}\) as shown in the second table.
Dog  No Dog  Total  

Cat  52  128  180 
No Cat  43  27  70 
Total  95  155  250 
Dog  No Dog  Total  

Cat  52  128  180 
No Cat  43  27  70 
Total  95  155  250 
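The fill-in-by-subtraction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `complete_two_by_two` and its parameter names are hypothetical (they do not come from this text), and the sketch assumes we are given the grand total, the two "yes" totals, and the count that has both characteristics.

```python
def complete_two_by_two(grand_total, row_yes_total, col_yes_total, both):
    """Fill in a 2x2 contingency table by subtraction.

    "yes"/"no" refer to the row variable (here: cat) and the column
    variable (here: dog). All names are hypothetical, for illustration.
    """
    row_yes_col_no = row_yes_total - both       # cat, no dog: 180 - 52
    row_no_col_yes = col_yes_total - both       # no cat, dog: 95 - 52
    row_no_total = grand_total - row_yes_total  # no cat total: 250 - 180
    col_no_total = grand_total - col_yes_total  # no dog total: 250 - 95
    neither = row_no_total - row_no_col_yes     # no cat, no dog: 70 - 43
    return {
        ("yes", "yes"): both,
        ("yes", "no"): row_yes_col_no,
        ("no", "yes"): row_no_col_yes,
        ("no", "no"): neither,
        ("yes", "total"): row_yes_total,
        ("no", "total"): row_no_total,
        ("total", "yes"): col_yes_total,
        ("total", "no"): col_no_total,
        ("total", "total"): grand_total,
    }


table = complete_two_by_two(250, 180, 95, 52)

# The four central cells should add to the grand total.
inner = [table[(r, c)] for r in ("yes", "no") for c in ("yes", "no")]
print(inner)       # [52, 128, 43, 27]
print(sum(inner))  # 250
```

Checking that the central cells sum to the grand total is a quick way to catch a subtraction mistake.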
Subsection 4.1.2 Contingency Tables and Venn Diagrams
If the subtractions we just did seem familiar, they should! This is very similar to what we did for reporting data with a Venn diagram. The Venn diagram for this data is shown below. We also subtracted the intersection from the total of the cat and dog owners to find numbers in the crescent regions.
Notice that the numbers in the four regions of the Venn diagram are the same as the four cells in the center of the contingency table and add to the grand total.
Subsection 4.1.3 “And” Statements
Now we can use the contingency table or the Venn diagram to determine the percentage of households that meet certain conditions. For instance, what percent of those surveyed own a cat and do not own a dog? In the Venn diagram, this is 128 households in the cat only region.
In the contingency table we see the 128 households at the intersection of the row of households who own a cat and the column of households who do not own a dog. As a percentage of the total number of households surveyed, this is \(\frac{128}{250}=0.512\text{,}\) so 51.2% of the households have a cat and no dog.
Dog  No Dog  Total  

Cat  52  128  180 
No Cat  43  27  70 
Total  95  155  250 
Subsection 4.1.4 “Or” Statements
How about the percentage of households surveyed that have a cat or a dog? We know from Venn diagrams that the inclusive or includes the number of households who own a cat only, a dog only, and both a cat and a dog, or \(128+52+43=223\) households. As a percentage of the total surveyed, we get \(\frac{223}{250}=0.892\) or 89.2% of households in the sample have a dog or a cat (or both).
We can get the same answer from the contingency table by adding the cells for households who have a cat and not a dog, a dog and not a cat, and the households that have both a cat and a dog. This also gives us 223 households.
There is another way to calculate an or statement from a contingency table. We could add the row and column totals for having a cat and having a dog, but then we have counted the 52 households in the intersection twice. We can subtract that number to get \(180+95-52=223\) households with a dog or a cat, which we know is 89.2% of those surveyed.
Dog  No Dog  Total  

Cat  52  128  180 
No Cat  43  27  70 
Total  95  155  250 
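As a quick check of the "and" and "or" calculations, here is a minimal Python sketch using the cell counts from the completed cat and dog table. The variable names are our own and are only for illustration.

```python
# Cell counts from the completed cat/dog contingency table.
cat_and_dog, cat_only, dog_only, neither = 52, 128, 43, 27
grand_total = cat_and_dog + cat_only + dog_only + neither  # 250

# "And": a single cell divided by the grand total.
p_cat_and_no_dog = cat_only / grand_total

# "Or", method 1: add the three cells that have a cat, a dog, or both.
or_by_cells = cat_only + dog_only + cat_and_dog

# "Or", method 2: add the two totals, then subtract the intersection
# because it was counted twice.
cat_total = cat_and_dog + cat_only  # 180
dog_total = cat_and_dog + dog_only  # 95
or_by_totals = cat_total + dog_total - cat_and_dog

print(p_cat_and_no_dog)            # 0.512
print(or_by_cells, or_by_totals)   # 223 223
print(or_by_totals / grand_total)  # 0.892
```

Both "or" methods agree because subtracting the intersection removes exactly the households that were counted in both totals.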
Subsection 4.1.5 Conditional Statements
Another question we can answer using a contingency table is what percentage of dog owning households also own a cat? In this case the group that we are interested in isn’t every household surveyed (the grand total), but just those households that own a dog.
Dog  No Dog  Total  

Cat  52  128  180 
No Cat  43  27  70 
Total  95  155  250 
We call this a conditional statement because we are only considering the households with a certain condition. If we focus on the column representing the households that own a dog, we see that there is a total of 95 households with a dog, and that 52 of those 95 households also have a cat. Therefore, \(\frac{52}{95} \approx 0.547\) or approximately 54.7% of the households with a dog also have a cat. Another way to phrase this conditional statement is, “What percent of households have a cat given they have a dog?” You will see the word given quite a bit in this chapter; it signals that the denominator changes from the grand total to the total of the given group. It is also possible to find this conditional percentage using the Venn diagram by taking the number in the intersection and dividing it by the total in the whole dog circle.
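To see how the word given changes the denominator, the short Python sketch below compares the "and" percentage with the conditional percentage for the same cell. The variable names are our own, chosen for illustration.

```python
# The dog column of the cat/dog contingency table.
dog_column = {"cat": 52, "no cat": 43}
dog_total = sum(dog_column.values())  # 95 households with a dog
grand_total = 250                     # all households surveyed

# "And": the cell is divided by the grand total.
p_cat_and_dog = dog_column["cat"] / grand_total

# "Given": the same cell is divided by the total of the given group.
p_cat_given_dog = dog_column["cat"] / dog_total

print(p_cat_and_dog)              # 0.208
print(round(p_cat_given_dog, 3))  # 0.547
```

The numerator is 52 in both cases; only the denominator differs.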
Subsection 4.1.6 Contingency Tables with More Than Two Categories
When there are only two categories for each variable, like yes/no questions, Venn diagrams and contingency tables provide basically the same information and can be used interchangeably. A Venn diagram works well for yes/no variables since a subject is either inside the circle (has the characteristic) or outside the circle (does not have the characteristic). If any of the variables has more than two categories, however, we cannot use a Venn diagram, but we can still use a contingency table. Here is an example where one variable has four categories and the other has three categories.
Example 4.1.2.
910 randomly sampled registered voters from Tampa, FL were asked if they thought workers who have illegally entered the US should (i) be allowed to keep their jobs and apply for US citizenship, (ii) be allowed to keep their jobs as temporary guest workers but not be allowed to apply for US citizenship, or (iii) lose their jobs and have to leave the country. Not sure was also an option (iv). The results of the survey by political ideology are shown below^{ 1 }. Use the contingency table to answer the questions.
Conservative  Moderate  Liberal  Total  

(i) Apply for citizenship  57  120  101  278 
(ii) Guest worker  121  113  28  262 
(iii) Leave the country  179  126  45  350 
(iv) Not sure  15  4  1  20 
Total  372  363  175  910 
What percent of the sampled Tampa, FL voters identified themselves as conservatives?
What percent of the sampled voters are in favor of the citizenship option?
What percent of the sampled voters identify themselves as conservatives and are in favor of the citizenship option?
What percent of the sampled voters identify themselves as liberal or are in favor of the leaving the country option?
What percent of the sampled voters who identify as conservatives are also in favor of the citizenship option? What percent of moderate and liberal voters share this view?
To answer this question, we find the conservative column and look to the bottom cell for the total number of conservative voters and divide that by the total number of voters surveyed. This gives us \(\frac{372}{910}\approx 0.409\text{,}\) so approximately 41% of the sampled Tampa, FL voters identify as conservative.
For this question we find the apply for citizenship row, look across to find the total, and divide this by the total number of voters surveyed. We get \(\frac{278}{910} \approx 0.305\) or approximately 31% of these voters are in favor of the citizenship option.
For this question we are looking for the cell that is the intersection of those who identify as conservative and those who are in favor of the citizenship option. This cell has 57 voters, so we divide that by the total number of voters. This gives us \(\frac{57}{910} \approx 0.063\) or approximately 6.3% of these voters identify as conservatives and are in favor of the citizenship option.

The or in this question is inclusive, so we need to determine the number of voters who identify as liberal, who are in favor of the leaving the country option, or both.
In terms of the individual cells, the number of voters who have the specified characteristics is the sum \(179+126+101+28+45+1=480\text{,}\) which we can divide by the total number of voters surveyed to get the percent. So, we have \(\frac{480}{910} \approx 0.527\) or approximately 53% of the voters identify as liberal or are in favor of the leave the country option.
Another way to calculate this is to add the total number who identify as liberal (175 voters) and the total number who are in favor of the leave the country option (350 voters), then subtract the double counted cell (45 voters) who are liberal and in favor of the leave the country option: \(175+350-45=480\text{.}\)

As we saw before, these are conditional statements. For the first part of this question, we want to focus just on those voters who identify as conservatives, and from among that group determine the percent in favor of the citizenship option. We calculate that \(\frac{57}{372} \approx 0.153\) or approximately 15% of conservative voters are in favor of the citizenship option.
For the second part, we want to focus on just those voters who identify as moderate, and from among that group determine the percent in favor of the citizenship option. Then we have \(\frac{120}{363} \approx 0.33\) or approximately 33% of moderate voters are in favor of the citizenship option.
Finally, we want to focus on just those voters who identify as liberal, and from among that group determine the percent in favor of the citizenship option. We calculate \(\frac{101}{175} \approx 0.58\) or approximately 58% of liberal voters are in favor of the citizenship option. Looking at these three percentages, it is clear that support of the citizenship option depends on political ideology. If support of the citizenship option were the same across political ideologies, then we would say that favoring the citizenship option and political ideology were independent of each other.
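The calculations in this example can also be sketched in Python for a table with more than two categories. This is a minimal illustration, not a prescribed method: the short labels such as `"leave"` and all variable names are our own.

```python
# Survey counts: rows are opinions, columns are political ideologies.
table = {
    "citizenship":  {"conservative": 57,  "moderate": 120, "liberal": 101},
    "guest worker": {"conservative": 121, "moderate": 113, "liberal": 28},
    "leave":        {"conservative": 179, "moderate": 126, "liberal": 45},
    "not sure":     {"conservative": 15,  "moderate": 4,   "liberal": 1},
}
ideologies = ["conservative", "moderate", "liberal"]

# Row totals, column totals, and the grand total.
row_totals = {opinion: sum(cells.values()) for opinion, cells in table.items()}
col_totals = {i: sum(table[o][i] for o in table) for i in ideologies}
grand_total = sum(row_totals.values())

# "Or": add the two totals and subtract the double-counted cell.
liberal_or_leave = (
    col_totals["liberal"] + row_totals["leave"] - table["leave"]["liberal"]
)
print(liberal_or_leave, round(liberal_or_leave / grand_total, 3))  # 480 0.527

# Conditional: support for citizenship within each ideology (column).
support = {i: table["citizenship"][i] / col_totals[i] for i in ideologies}
for i in ideologies:
    print(i, round(support[i], 3))
# conservative 0.153
# moderate 0.331
# liberal 0.577

# Overall support, for comparison with the conditional percentages.
print(round(row_totals["citizenship"] / grand_total, 3))  # 0.305
```

If support for the citizenship option were independent of ideology, the three conditional values would all be close to the overall value of about 0.305. Here they are not.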
Subsection 4.1.7 Empirical Probability
If our sample is representative of the population, then we can also interpret a percentage we calculate from a contingency table as a probability, or the likelihood that something will happen. Since a contingency table is constructed from data collected through sampling or an experiment, we call it an empirical or experimental probability. This is different from a theoretical probability which we will look at in the next section.
Subsection 4.1.8 Finding Empirical Probabilities with a Contingency Table
Suppose that 60% of students in our class have a summer birthday (June, July, or August). Now suppose everyone’s name and birth month are written on slips of paper and thrown into a bag. If we pull a slip of paper out of the bag at random, what is the probability that the selected student has a summer birthday? If you think there should be a 60% chance, you are right! The relative frequency of the characteristic of interest will be equal to its empirical probability. To write this as a probability statement, it would look like \(P(\text{summer birthday}) = 0.60\text{,}\) or 60%.
Probability is a function named P, and the function is applied to what follows in the parentheses. Let’s look at another example where we write probability statements and find empirical probabilities.
Example 4.1.3.
A survey of licensed drivers asked whether they had received a speeding ticket in the last year and whether their car is red. The results of the survey are shown in the contingency table below.
Speeding Ticket  No Speeding Ticket  Total  

Red Car  15  135  150 
Not Red Car  45  470  515 
Total  60  605  665 
Find the probability that a randomly selected survey participant:
(a) has a red car.
(b) has had a speeding ticket in the last year.
(c) has a red car and has not had a speeding ticket in the last year.
(d) has a red car or has had a speeding ticket in the last year.
(e) has had a speeding ticket in the last year given they have a red car.
(f) who has received a speeding ticket in the last year also has a red car.
(g) What do the answers to b and e suggest about the relationship between owning a red car and getting a speeding ticket?

(a) To find \(P(\text{red car})\text{,}\) we divide the number of participants who own a red car by the total number of people surveyed: \(P(\text{red car})=\frac{150}{665} \approx 0.226\) or 22.6%.

(b) To find \(P(\text{speeding ticket})\text{,}\) we divide the number of participants who got a speeding ticket in the last year by the total number of people surveyed: \(P(\text{speeding ticket})=\frac{60}{665} \approx 0.09\) or 9%.

(c) To find \(P(\text{red and no ticket})\text{,}\) we find the intersection of the red car category and the no ticket category and divide by the total number of participants: \(P(\text{red and no ticket}) =\frac{135}{665} \approx 0.203\) or 20.3%.

(d) To find \(P(\text{red or ticket})\text{,}\) we need to add those who drive a red car and did not have a speeding ticket (just red), those who had a speeding ticket and do not drive a red car (just ticket) and those who drive a red car and had a speeding ticket (both), and divide by the total number of participants:
\begin{gather*} P(\text{red or ticket})=\frac{135+45+15}{665}=\frac{195}{665} \approx 0.293\text{ or } 29.3\% \end{gather*}Recall from our earlier discussion that we could also calculate the or probability as:
\begin{align*} P(\text{red or ticket})&= P(\text{red})+ P(\text{speeding ticket}) - P(\text{red and speeding ticket})\\ &=\frac{150}{665}+\frac{60}{665}-\frac{15}{665}\\ &=\frac{195}{665} \end{align*}which gives us the same answer as counting the individual cells.

(e) The probability \(P(\text{speeding ticket given red car})\) is a conditional probability as we have seen before since it is conditional on the given characteristic occurring. In this problem, the given characteristic is owning a red car, so we isolate our attention to just the row of 150 red car owners and see how many have had a speeding ticket in the last year. Looking at the table, we see that there were 15 red car owners who had a speeding ticket in the last year, so we calculate:
\begin{gather*} P(\text{speeding ticket given red car})=\frac{15}{150} = 0.10\text{ or } 10\% \end{gather*} 
(f) This question is also asking for a conditional probability, \(P(\text{red car given speeding ticket})\text{,}\) but it is phrased more like we would say it. In this case the given characteristic is that the person has received a speeding ticket, so we will isolate our attention to just the speeding ticket column. Among the 60 people who had a speeding ticket in the last year, we see that 15 also drove a red car. Now we can calculate the probability:
\begin{gather*} P(\text{red car given speeding ticket})=\frac{15}{60} = 0.25\text{ or }25\% \end{gather*}Notice that compared with part e, when we change the conditional characteristic, we change the denominator of the fraction.
(g) In part b, we determined that there was a 9% chance of randomly selecting a participant who had received a speeding ticket in the last year. In part e, we found that there was a 10% chance of having received a ticket in the last year if the person had a red car. This suggests that there is a slightly higher likelihood of getting a speeding ticket if you own a red car. In this sample, then, getting a speeding ticket appears to depend on whether the person drives a red car, although the difference is small. We cannot say, however, whether driving a red car makes you speed or whether people who tend to drive faster buy red cars.
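The probabilities in this example can be checked with a short Python sketch. The dictionary layout and variable names below are our own choices for illustration. The sketch compares the overall probability of a speeding ticket with the conditional probability given a red car, which is the comparison used to judge independence.

```python
# Survey counts keyed by (car color, ticket status).
counts = {
    ("red", "ticket"): 15,
    ("red", "no ticket"): 135,
    ("not red", "ticket"): 45,
    ("not red", "no ticket"): 470,
}
total = sum(counts.values())  # 665 participants

# Row total (red cars) and column total (speeding tickets).
red_total = sum(n for (car, _), n in counts.items() if car == "red")
ticket_total = sum(n for (_, t), n in counts.items() if t == "ticket")
both = counts[("red", "ticket")]

# "Or": add the two totals and subtract the double-counted cell.
p_red_or_ticket = (red_total + ticket_total - both) / total

# Overall and conditional probabilities.
p_ticket = ticket_total / total
p_ticket_given_red = both / red_total     # denominator: red car owners
p_red_given_ticket = both / ticket_total  # denominator: ticketed drivers

print(round(p_red_or_ticket, 3))  # 0.293
print(round(p_ticket, 3))         # 0.09
print(p_ticket_given_red)         # 0.1
print(p_red_given_ticket)         # 0.25
```

Swapping which characteristic is given changes the denominator from 150 to 60, which is why the last two values differ even though the numerator is 15 in both.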
Subsection 4.1.9 Conditional Probabilities
We have mentioned conditional probabilities, which we find by isolating our attention to the given row or column. Here is another example of finding conditional probabilities.
Example 4.1.4.
A home pregnancy test was given to a sample of 93 cisgender women, and their pregnancy was then verified by a blood test. The contingency table below shows the home pregnancy test and whether or not they were actually pregnant as determined by the blood test. Find the probability that a randomly selected woman in the sample
was not pregnant given the home test was positive.
had a positive home pregnancy test given they were not pregnant.
Positive Test  Negative Test  Total  

Pregnant  70  4  74 
Not Pregnant  5  14  19 
Total  75  18  93 
Here are the solutions:

Since we are given the home test result was positive, we are limited to the 75 women in the positive test column, of which 5 were not pregnant. This gives:
\begin{gather*} P(\text{not pregnant given positive test})=\frac{5}{75} \approx 0.067\text{ or } 6.7\% \end{gather*} 
Since we are given the woman is not pregnant, we are limited to the 19 women in the not pregnant row, of which 5 had a positive test. This gives:
\begin{gather*} P(\text{positive test given not pregnant})=\frac{5}{19} \approx 0.263\text{ or } 26.3\% \end{gather*}This result is referred to as a false positive: a positive test result when the woman is not actually pregnant.
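The two conditional probabilities in this example share the same numerator but use different denominators. A minimal Python sketch, with variable names of our own choosing, makes the contrast explicit.

```python
# Home test results by actual pregnancy status (from the blood test).
table = {
    "pregnant":     {"positive": 70, "negative": 4},
    "not pregnant": {"positive": 5,  "negative": 14},
}

# Given a positive test: restrict to the positive test column.
positive_total = sum(row["positive"] for row in table.values())  # 75
p_not_pregnant_given_positive = table["not pregnant"]["positive"] / positive_total

# Given not pregnant: restrict to the not pregnant row.
not_pregnant_total = sum(table["not pregnant"].values())  # 19
p_positive_given_not_pregnant = table["not pregnant"]["positive"] / not_pregnant_total

print(round(p_not_pregnant_given_positive, 3))  # 0.067
print(round(p_positive_given_not_pregnant, 3))  # 0.263
```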
In this section we have learned about empirical probability. In the next section we will discuss another kind of probability that you may be familiar with – theoretical probability.
Exercises 4.1.10 Exercises
1.
A recent survey asked a random sample of PCC students if they are currently experiencing food insecurity and if they are currently experiencing housing insecurity. Fill in the missing entries of the contingency table below.
Food Insecure  Not Food Insecure  Total  

Housing Insecure  60  
Not Housing Insecure  460  760  
Total  680 
2.
A recent survey asked a random sample of PCC students if they have purchased food from the cafeteria in the last week, and if they purchased their textbooks through the bookstore. Fill in the missing entries of the contingency table below.
Bookstore  No Bookstore  Total  

Cafeteria  375  
No Cafeteria  135  
Total  630  850 
3.
A recent survey asked PCC students if they regularly eat breakfast and if they regularly floss their teeth. Use the completed Venn Diagram to fill in the corresponding contingency table.
Breakfast  No Breakfast  Total  

Floss  
No Floss  
Total 
4.
A recent survey asked PCC students if they used an Apple phone, and if they regularly used a Chromebook outside of school. Use the completed Venn Diagram to fill in the corresponding contingency table.
Chromebook  No Chromebook  Total  

Apple  
No Apple  
Total 
5.
Use the following information to complete the contingency table:
\(\displaystyle \text{P(A and B)} = 10/75\)
\(\displaystyle \text{P(A)} = 40/75 \)
\(\displaystyle \text{P(not B)} = 45/75\)
A  Not A  Total  

B  
Not B  
Total 
6.
Use the following information to complete the contingency table:
\(\displaystyle \text{P(A given B)} = 30/80\)
\(\displaystyle \text{P(Not A and Not B)} = 10/120\)
A  Not A  Total  

B  
Not B  
Total 
7.
A professor gave a test to students in a morning class and the same test to the afternoon class. The grades are summarized below.
A  B  C  Total  

Morning Class  8  18  13  39 
Afternoon Class  10  4  12  26 
Total  18  22  25  65 
If one student was chosen at random:
What is the probability they were in the morning class?
What is the probability they earned a C?
What is the probability that they earned an A and they were in the afternoon class?
What is the probability that they earned an A given they were in the morning class?
What is the probability that they were in the morning class or they earned a B?
8.
A professor surveyed students in her morning and afternoon Math 105 class, and asked what their class standing was. The class standings are summarized below:
Freshman  Sophomore  Junior  Senior  Total  

Morning Class  12  5  7  8  32 
Afternoon Class  5  13  8  2  28 
Total  17  18  15  10  60 
If one student was chosen at random:
What is the probability they were in the morning class?
What is the probability they were a Freshman?
What is the probability that they were a Senior and they were in the afternoon class?
What is the probability that they were a Sophomore given they were in the morning class?
What is the probability that they were in the morning class or they were a Junior?
9.
The contingency table below shows the number of credit cards owned by a group of individuals between the ages of 18 and 35 and a group over the age of 35.
Zero  One  Two or more  Total  

Between the ages of 18-35  9  5  19  33 
Over age 35  18  10  20  48 
Total  27  15  39  81 
If one person was chosen at random:
What is the probability they had no credit cards?
What is the probability they had one credit card?
What is the probability they had no credit cards and are over 35?
What is the probability they are between the ages of 18 and 35, or have zero credit cards?
What is the probability they had no credit cards given that they are between the ages of 18 and 35?
What is the probability they have no credit cards given that they are over age 35?
Does it appear that having no credit cards depends on age? Or are they independent? Use probability to support your claim.
10.
The following contingency table provides data from a sample of 6,224 individuals who were exposed to smallpox in Boston.^{ 2 }
Inoculated  Not Inoculated  Total  

Lived  238  5136  5374 
Died  6  844  850 
Total  244  5980  6224 
What is the probability that a person was inoculated?
What is the probability that a person lived?
What is the probability that a person died or was inoculated?
What is the probability that a person died given they were inoculated?
What is the probability that a person died given they were not inoculated?
Does it appear that survival depended on whether a person was inoculated? Or are they independent? Use probability to support your claim.
11.
The contingency table below shows the survival data for the passengers of the Titanic.
First  Second  Third  Crew  Total  

Survive  203  118  178  212  711 
Not Survive  122  167  528  673  1490 
Total  325  285  706  885  2201 
What is the probability that a passenger did not survive?
What is the probability that a passenger was crew?
What is the probability that a passenger was first class and did not survive?
What is the probability that a passenger did not survive or was crew?
What is the probability that a passenger survived given they were first class?
What is the probability that a passenger survived given they were second class?
What is the probability that a passenger survived given they were third class?
Does it appear that survival depended on the passenger’s class? Or are they independent? Use probability to support your claim.
12.
The following table shows the utility patents granted for a specific year.
Corporation  Government  Individual  Total  

United States  45%  2%  8%  55% 
Foreign  41%  1%  3%  45% 
Total  86%  3%  11%  100% 
What is the probability that a patent is foreign and from the government?
What is the probability that a patent is from the U.S. and from a corporation?
What is the probability that a patent is foreign or from the government?
What is the probability that a patent is from the U.S. given it is from an individual?
What is the probability that a patent is foreign given it is from the government?
13.
There is a 15% chance that a shopper entering a computer store will purchase a computer, a 25% chance they will purchase a game/software, and there is a 10% chance they will purchase both a computer and a game/software.

Create a contingency table for the information.
Game/Software  No Game/Software  Total  

Computer  
No Computer  
Total 
What is the probability that a shopper will not purchase a computer and will not purchase a game/software?
What is the probability that a shopper will purchase a computer or purchase a game/software?
What is the probability that a shopper will purchase a game/software given they have purchased a computer?
What is the probability that a shopper will purchase a game/software given they did not purchase a computer?
Does it appear that purchasing a game/software depends on whether the shopper purchased a computer? Or are they independent? Use probability to support your claim.
14.
A fitness center coach kept track over the last year of whether members stretched before they exercised, and whether or not they sustained an injury. Among the 400 members, 322 stretched before they exercised, 327 did not sustain an injury, and 270 both stretched and did not sustain an injury.

Create a contingency table for the information.
Injury  No Injury  Total  

Stretched  
Not Stretched  
Total 
What is the probability that a member sustained an injury?
What is the probability that a member sustained an injury and did not stretch?
What is the probability that a member stretched or did not sustain an injury?
What is the probability that a member sustained an injury given they stretched?
What is the probability that a member sustained an injury given they did not stretch?
Does it appear that sustaining an injury depends on whether the member stretches before exercising? Or are they independent? Use probability to support your claim.
15.
Among the 95 books on a bookshelf, 72 are fiction, 28 are hardcover, and 87 are fiction or hardcover.

Create a contingency table for the information.
Hardcover  Paperback  Total  

Fiction  
Nonfiction  
Total 
What is the probability that a book is nonfiction and paperback?
What is the probability that a book is fiction given it is hardcover?
16.
After finishing the course, among the 32 students in a Math 105 class, 25 could successfully construct a contingency table, 27 passed the class, and 29 could successfully construct a contingency table or passed the class.

Create a contingency table for the information.
Contingency Table  No Contingency Table  Total  

Pass  
No Pass  
Total 
What is the probability that a student passed and could not successfully construct a contingency table?
What is the probability that a student passed given they could not successfully construct a contingency table?