AHSS Chapter exercises

Section 8.5 Chapter exercises

Exercises 8.5.1 Exercises

1. True / False.

Determine if the following statements are true or false. If false, explain why.

(a) A correlation coefficient of -0.90 indicates a stronger linear relationship than a correlation of 0.5.
(b) Correlation is a measure of the association between any two variables.

Solution

(a) True.

(b) False, correlation is a measure of the linear association between any two numerical variables.

2. Cats, Part II.

Exercise 8.2.11.10 presents regression output from a model for predicting the heart weight (in g) of cats from their body weight (in kg). The coefficients are estimated using a dataset of 144 domestic cats. The model output is also provided below.


	Estimate	Std. Error	t value	Pr\((\gt|t|)\)
(Intercept)	-0.357	0.692	-0.515	0.607
body wt	4.034	0.250	16.119	0.000
\(s = 1.452\)		\(R^2 = 64.66\)%		\(R^2_{adj} = 64.41\)%

(a) We see that the point estimate for the slope is positive. What are the hypotheses for evaluating whether body weight is positively associated with heart weight in cats?
(b) State the conclusion of the hypothesis test from part (a) in context of the data.
(c) Calculate a 95% confidence interval for the slope of body weight, and interpret it in context of the data.
(d) Do your results from the hypothesis test and the confidence interval agree? Explain.
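
The arithmetic behind the test statistic and the 95% confidence interval for the slope can be sketched as follows. This is a minimal sketch rather than the book's worked solution: the critical value `t_star = 1.977` is an assumed approximation for a t-distribution with 144 - 2 = 142 degrees of freedom, and all variable names are illustrative.

```python
# Values read from the regression output for the cats model.
n = 144            # number of cats in the dataset
slope = 4.034      # point estimate for the body wt coefficient
se_slope = 0.250   # standard error of that estimate

# Degrees of freedom for simple linear regression: n - 2.
df = n - 2

# ASSUMPTION: approximate 97.5th percentile of the t-distribution
# with 142 degrees of freedom (not given in the exercise).
t_star = 1.977

# Test statistic for H0: slope = 0. The table reports 16.119;
# the small difference comes from rounding in the printed values.
t_stat = slope / se_slope

# 95% confidence interval: point estimate +/- t_star * SE.
margin = t_star * se_slope
ci_lower = slope - margin
ci_upper = slope + margin

print(f"df = {df}, t = {t_stat:.2f}")
print(f"95% CI for the slope: ({ci_lower:.3f}, {ci_upper:.3f})")
```

If the resulting interval lies entirely above 0, it is consistent with a test that finds a positive association between body weight and heart weight.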

3. Nutrition at Starbucks, Part II.

Exercise 8.2.11.6 introduced a data set on nutrition information on Starbucks food menu items. Based on the scatterplot and the residual plot provided, describe the relationship between the protein content and calories of these menu items, and determine if a simple linear model is appropriate to predict amount of protein from the number of calories.

Solution

There is an upwards trend. However, the variability is higher for higher calorie counts, and it looks like there might be two clusters of observations above and below the line on the right, so we should be cautious about fitting a linear model to these data.

4. Helmets and lunches.

The scatterplot shows the relationship between socioeconomic status measured as the percentage of children in a neighborhood receiving reduced-fee lunches at school (lunch) and the percentage of bike riders in the neighborhood wearing helmets (helmet). The average percentage of children receiving reduced-fee lunches is 30.8% with a standard deviation of 26.7% and the average percentage of bike riders wearing helmets is 38.8% with a standard deviation of 16.9%.

(a) If the \(R^2\) for the least-squares regression line for these data is 72%, what is the correlation between lunch and helmet?
(b) Calculate the slope and intercept for the least-squares regression line for these data.
(c) Interpret the intercept of the least-squares regression line in the context of the application.
(d) Interpret the slope of the least-squares regression line in the context of the application.
(e) What would the value of the residual be for a neighborhood where 40% of the children receive reduced-fee lunches and 40% of the bike riders wear helmets? Interpret the meaning of this residual in the context of the application.
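
The correlation, slope, intercept, and residual asked about above all follow from the summary statistics. The sketch below shows one way to organize that arithmetic. It assumes the association is negative (the scatterplot is not reproduced here, so the sign of the correlation is an assumption), and the variable names are illustrative.

```python
import math

# Summary statistics given in the exercise (all in percentage points).
mean_lunch, sd_lunch = 30.8, 26.7      # explanatory variable
mean_helmet, sd_helmet = 38.8, 16.9    # response variable
r_squared = 0.72

# ASSUMPTION: the trend in the scatterplot is downward, so the
# correlation takes the negative square root of R^2.
r = -math.sqrt(r_squared)

# Least-squares slope: b1 = r * s_y / s_x.
slope = r * sd_helmet / sd_lunch

# The least-squares line passes through (x-bar, y-bar),
# so b0 = y-bar - b1 * x-bar.
intercept = mean_helmet - slope * mean_lunch

# Residual = observed - predicted, for lunch = 40 and helmet = 40.
predicted = intercept + slope * 40
residual = 40 - predicted

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A positive residual means the observed helmet percentage is higher than the line predicts for that neighborhood.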

5. Match the correlation, Part III.

Match each correlation to the corresponding scatterplot.

(a) \(r=-0.72\)
(b) \(r=0.07\)
(c) \(r=0.86\)
(d) \(r=0.99\)

Solution

(a) \(r =-0.72 \rightarrow (2)\text{.}\)

(b) \(r = 0.07 \rightarrow (4)\text{.}\)

(c) \(r = 0.86 \rightarrow (1)\text{.}\)

(d) \(r = 0.99 \rightarrow (3)\text{.}\)

6. Rate my professor.

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching-related characteristics, such as the physical appearance of the instructor. Researchers at the University of Texas, Austin collected data on teaching evaluation score (higher score means better) and standardized beauty score (a score of 0 means average, a negative score means below average, and a positive score means above average) for a sample of 463 professors.¹ The scatterplot below shows the relationship between these variables, and regression output is provided for predicting teaching evaluation score from beauty score.

Daniel S Hamermesh and Amy Parker. “Beauty in the classroom: Instructors' pulchritude and putative pedagogical productivity”. In: Economics of Education Review 24.4 (2005), pp. 369-376.


	Estimate	Std. Error	t value	Pr\((\gt|t|)\)
(Intercept)	4.010	0.0255	157.21	0.0000
beauty		0.0322	4.13	0.0000

(a) Given that the average standardized beauty score is -0.0883 and average teaching evaluation score is 3.9983, calculate the slope. Alternatively, the slope may be computed using just the information provided in the model summary table.
(b) Do these data provide convincing evidence that the slope of the relationship between teaching evaluation and beauty is positive? Explain your reasoning.
(c) List the conditions required for linear regression and check if each one is satisfied for this model based on the following diagnostic plots.
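
The two routes to the missing slope mentioned above can be sketched as follows. This is an illustrative sketch with made-up variable names, not the book's solution. The first route uses the fact that the least-squares line passes through the point of means; the second uses the definition of the t value as estimate divided by standard error. Because the printed values are rounded, the two results agree only approximately.

```python
# Values given in the exercise and the model summary table.
mean_beauty = -0.0883    # average standardized beauty score
mean_eval = 3.9983       # average teaching evaluation score
intercept = 4.010        # estimated intercept
se_slope = 0.0322        # standard error of the beauty coefficient
t_value = 4.13           # t value of the beauty coefficient

# Route 1: the line passes through (x-bar, y-bar), so
# y-bar = b0 + b1 * x-bar, which gives b1 = (y-bar - b0) / x-bar.
slope_from_means = (mean_eval - intercept) / mean_beauty

# Route 2: t = b1 / SE(b1), so b1 = t * SE(b1).
slope_from_table = t_value * se_slope

print(f"slope from means: {slope_from_means:.4f}")
print(f"slope from table: {slope_from_table:.4f}")
```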

7. Trees.

The scatterplots below show the relationship between height, diameter, and volume of timber in 31 felled black cherry trees. The diameter of the tree is measured 4.5 feet above the ground.²

Source: R Dataset, stat.ethz.ch/R-manual/R-patched/library/datasets/html/trees.html.

(a) Describe the relationship between volume and height of these trees.
(b) Describe the relationship between volume and diameter of these trees.
(c) Suppose you have height and diameter measurements for another black cherry tree. Which of these variables would be preferable to use to predict the volume of timber in this tree using a simple linear regression model? Explain your reasoning.

Solution

(a) There is a weak-to-moderate, positive, linear association between height and volume. There also appears to be some non-constant variance since the volume of trees is more variable for taller trees.

(b) There is a very strong, positive association between diameter and volume. The relationship may include slight curvature.

(c) Since the relationship is stronger between volume and diameter, using diameter would be preferred. However, as mentioned in part (b), the relationship between volume and diameter may not be linear, so we may benefit from a model that properly accounts for nonlinearity.