Bernoulli random variable, descriptive
A Bernoulli random variable has exactly two possible outcomes. We typically label one of these outcomes a “success” and the other outcome a “failure”. We may also denote a success by 1
and a failure by 0
.
How long should we expect to flip a coin until it turns up heads
? Or how many times should we expect to roll a die until we get a 1
? These questions can be answered using the geometric distribution. We first formalize each trial — such as a single coin flip or die toss — using the Bernoulli distribution, and then we combine these with our tools from probability (Chapter 3) to construct the geometric distribution.
Stanley Milgram began a series of experiments in 1963 to estimate what proportion of people would willingly obey an authority and give severe shocks to a stranger. Milgram found that about 65% of people would obey the authority and give such shocks. Over the years, additional research suggested this number is approximately consistent across communities and time.^{ 1 }Find further information on Milgram's experiment at www.cnr.berkeley.edu/ucce50/aglabor/7article/article35.htm..
Each person in Milgram's experiment can be thought of as a trial. We label a person a success if she refuses to administer the worst shock. A person is labeled a failure if she administers the worst shock. Because only 35% of individuals refused to administer the most severe shock, we denote the probability of a success with \(p=0.35\text{.}\) The probability of a failure is sometimes denoted with \(q=1p\text{.}\)
Thus, success
or failure
is recorded for each person in the study. When an individual trial only has two possible outcomes, it is called a Bernoulli random variable.
A Bernoulli random variable has exactly two possible outcomes. We typically label one of these outcomes a “success” and the other outcome a “failure”. We may also denote a success by 1
and a failure by 0
.
We chose to label a person who refuses to administer the worst shock a “success” and all others as “failures”. However, we could just as easily have reversed these labels. The mathematical framework we will build does not depend on which outcome is labeled a success and which a failure, as long as we are consistent.
Bernoulli random variables are often denoted as 1
for a success and 0
for a failure. In addition to being convenient in entering data, it is also mathematically handy. Suppose we observe ten trials:
0 1 1 1 1 0 1 1 0 0

Then the sample proportion, \(\hat{p}\text{,}\) is the sample mean of these observations:
This mathematical inquiry of Bernoulli random variables can be extended even further. Because 0
and 1
are numerical outcomes, we can define the mean and standard deviation of a Bernoulli random variable.^{ 2 }If \({p}\) is the true probability of a success, then the mean of a Bernoulli random variable \(X\) is given by
If \(X\) is a random variable that takes value 1 with probability of success \(p\) and 0 with probability \(1p\text{,}\) then \(X\) is a Bernoulli random variable with mean and standard deviation
In general, it is useful to think about a Bernoulli random variable as a random process with only two outcomes: a success or failure. Then we build our mathematical framework using the numerical labels 1
and 0
for successes and failures, respectively.
Dr. Smith wants to repeat Milgram's experiments but she only wants to sample people until she finds someone who will not inflict the worst shock.^{ 3 }This is hypothetical since, in reality, this sort of study probably would not be permitted any longer under current ethical standards. If the probability a person will not give the most severe shock is still 0.35 and the subjects are independent, what are the chances that she will stop the study after the first person? The second person? The third? What about if it takes her \(n1\) individuals who will administer the worst shock before finding her first success, i.e. the first success is on the \(n^{th}\) person? (If the first success is the fifth person, then we say \(n=5\text{.}\))
The probability of stopping after the first person is just the chance the first person will not administer the worst shock: \(10.65=0.35\text{.}\) The probability it will be the second person is
Likewise, the probability it will be the third person is
If the first success is on the \(n^{th}\) person, then there are \(n1\) failures and finally 1 success, which corresponds to the probability \((0.65)^{n1}(0.35)\text{.}\) This is the same as \((10.35)^{n1}(0.35)\text{.}\)
Example 4.3.1 illustrates what is called the geometric distribution, which describes the waiting time until a success for independent and identically distributed (iid) Bernoulli random variables. In this case, the independence aspect just means the individuals in the example don't affect each other, and identical means they each have the same probability of success.
The geometric distribution from Example 4.3.1 is shown in Figure 4.3.2. In general, the probabilities for a geometric distribution decrease exponentially fast.
While this text will not derive the formulas for the mean (expected) number of trials needed to find the first success or the standard deviation or variance of this distribution, we present general formulas for each.
If the probability of a success in one trial is \(p\) and the probability of a failure is \(1p\text{,}\) then the probability of finding the first success in the \(n^{th}\) trial is given by
The mean (i.e. expected value) and standard deviation of this wait time are given by
It is no accident that we use the symbol \(\mu\) for both the mean and expected value. The mean and the expected value are one and the same.
The left side of Equation (4.3.2) says that, on average, it takes \(1/p\) trials to get a success. This mathematical result is consistent with what we would expect intuitively. If the probability of a success is high (e.g. 0.8), then we don't usually wait very long for a success: \(1/0.8 = 1.25\) trials on average. If the probability of a success is low (e.g. 0.1), then we would expect to view many trials before we see a success: \(1/0.1 = 10\) trials.
The probability that an individual would refuse to administer the worst shock is said to be about 0.35. If we were to examine individuals until we found one that did not administer the shock, how many people should we expect to check? The first expression in (4.3.2) may be useful.^{ 4 }We would expect to see about \(1/0.35 = 2.86\) individuals to find the first success.
What is the chance that Dr. Smith will find the first success within the first 4 people?
This is the chance it is the first (\(n=1\)), second (\(n=2\)), third (\(n=3\)), or fourth (\(n=4\)) person as the first success, which are four disjoint outcomes. Because the individuals in the sample are randomly sampled from a large population, they are independent. We compute the probability of each case and add the separate results:
There is an 82% chance that she will end the study within 4 people.
Determine a more clever way to solve Example 4.3.4. Show that you get the same result.^{ 5 }First find the probability of the complement:
Suppose in one region it was found that the proportion of people who would administer the worst shock was “only” 55%. If people were randomly selected from this region, what is the expected number of people who must be checked before one was found that would be deemed a success? What is the standard deviation of this waiting time?
A success is when someone will not inflict the worst shock, which has probability \(p=10.55=0.45\) for this region. The expected number of people to be checked is \(1/p = 1/0.45 = 2.22\) and the standard deviation is \(\sqrt{(1p)/p^2} = 1.65\text{.}\)
Using the results from Example 4.3.6, \(\mu = 2.22\) and \(\sigma = 1.65\text{,}\) would it be appropriate to use the normal model to find what proportion of experiments would end in 3 or fewer trials?^{ 6 }No. The geometric distribution is always right skewed and can never be wellapproximated by the normal model.
The independence assumption is crucial to the geometric distribution's accurate description of a scenario. Mathematically, we can see that to construct the probability of the success on the \(n^{th}\) trial, we had to use the Multiplication Rule for Independent Processes. It is no simple task to generalize the geometric model for dependent trials.