Chapter 12: The Normal Distribution

Author

Zane Billings

Published

2024-07-28

options("scipen" = 9999, "digits" = 4)

This chapter introduces the normal distribution. There’s not much else to say about it.

Q1

What is the probability of observing a value five sigma greater than the mean or more?

The probability of landing a given number of standard deviations above the mean is the same for every normal distribution, so we can compute it with the standard normal distribution, whose mean is 0 and standard deviation is 1. Five sigma above the mean is then just the value 5, and the probability of interest is \[P(x \geq 5) = \int_{5}^{\infty} \mathcal{N}(0, 1) \ \mathrm{d}x,\] which is easy to approximate in R.

prob <- integrate(\(x) dnorm(x), 5, Inf)$value
print(prob)
[1] 0.0000002867

The probability is 0.0000002867, which is quite low: less than a ten-thousandth of a percent, or roughly 1 in 3.5 million.
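As a cross-check on the numerical integration, the same tail probability can be read directly off the normal CDF. This is only a sketch: it assumes base R's `pnorm()` with `lower.tail = FALSE`, the variable names are made up here, and the second distribution (mean 10, standard deviation 3) is an arbitrary choice to illustrate the invariance used above.

```r
# Upper tail of the standard normal beyond 5, taken from the CDF.
p_cdf <- pnorm(5, mean = 0, sd = 1, lower.tail = FALSE)

# The same quantity by numerical integration, as in the solution above.
p_int <- integrate(\(x) dnorm(x), 5, Inf)$value

# Invariance check on an arbitrary (hypothetical) normal distribution:
# five standard deviations above a mean of 10 with sd 3 is 10 + 5 * 3 = 25.
p_other <- pnorm(10 + 5 * 3, mean = 10, sd = 3, lower.tail = FALSE)

print(c(cdf = p_cdf, integral = p_int, other = p_other))
```

All three values should agree to several significant digits, which is why working with the standard normal loses nothing.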

Q2

A fever is any temperature greater than 100.4 degrees Fahrenheit. Given the following measurements, what is the probability that the patient has a fever?

100.0, 98.8, 101.0, 100.5, 99.7

temps <- c(100.0, 98.8, 101.0, 100.5, 99.7)
(temp_mean <- mean(temps))
[1] 100
(temp_sd <- sd(temps))
[1] 0.8337

Assuming the temperature measurements have normally distributed error away from the true underlying temperature value (which we assume is constant), we can estimate the mean as 100 degrees and the standard deviation as 0.83. Then, the probability that the patient has a fever is \[ P(\text{temp} \geq 100.4) = \int_{100.4}^{\infty} \mathcal{N}(100, 0.83) \ \mathrm{d} (\text{temp}).\]

fever_prob <- integrate(\(x) dnorm(x, temp_mean, temp_sd), 100.4, Inf)
print(fever_prob)
0.3157 with absolute error < 0.0000009

Given all of our assumptions and measurements, the probability that the patient has a fever is about 31.57 percent.
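The same calculation can be phrased as a standardization, which makes it easier to see how far the fever threshold sits from the sample mean. The sketch below assumes the same five measurements and uses base R's `pnorm()`; the names `z` and `fever_prob_z` are introduced here for illustration only.

```r
temps <- c(100.0, 98.8, 101.0, 100.5, 99.7)

# Distance from the sample mean to the fever threshold, in standard deviations.
z <- (100.4 - mean(temps)) / sd(temps)

# Upper-tail probability of a standard normal beyond that z-score.
fever_prob_z <- pnorm(z, lower.tail = FALSE)

print(c(z = z, prob = fever_prob_z))
```

The threshold is a bit less than half a standard deviation above the mean, so a tail probability near one third is what we should expect.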

Q3

Suppose in Chapter 11 we tried to measure the depth of a well by timing coin drops and we got the following values:

2.5, 3.0, 3.5, 4.0, 2.0.

The distance an object falls can be calculated (in meters) with the formula \[\text{distance} = \frac{1}{2}\times G \times \text{time}^2\] where \(G = 9.8 \ \text{m}/\text{s}^2\) is the acceleration due to gravity near Earth’s surface. What is the probability that the well is over 500 meters deep?

First, we need to calculate the fall distances we would have gotten from these times.

times <- c(2.5, 3.0, 3.5, 4.0, 2.0)
distances <- 0.5 * 9.8 * times ^ 2
knitr::kable(data.frame("t" = times, "d" = distances))
t d
2.5 30.62
3.0 44.10
3.5 60.02
4.0 78.40
2.0 19.60

Now, assuming that the errors in measuring the distance are normally distributed, we can calculate the mean and standard deviation.

(dist_mean <- mean(distances))
[1] 46.55
(dist_sd <- sd(distances))
[1] 23.36

Finally, we compute the probability by integrating from 500 to infinity.

prob_500_depth <- integrate(\(x) dnorm(x, dist_mean, dist_sd), 500, Inf)
print(prob_500_depth)
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000002872 with absolute error < 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000034

The probability might as well be 0; it is actually about \(10 ^ {-83.54}\), or roughly \(2.9 \times 10^{-84}\).
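For tails this extreme, it can be more comfortable to work on the log scale than to trust a long string of printed zeros. As a sketch, assuming base R's `pnorm()` with `log.p = TRUE` (which returns the natural log of the probability), we can recover the exponent directly; the variable names are illustrative.

```r
times <- c(2.5, 3.0, 3.5, 4.0, 2.0)
distances <- 0.5 * 9.8 * times^2

# Natural log of P(distance >= 500) under the fitted normal model.
log_p <- pnorm(500, mean(distances), sd(distances),
               lower.tail = FALSE, log.p = TRUE)

# Convert from natural log to base-10 log to read off the exponent.
log10_p <- log_p / log(10)
print(log10_p)
```

The base-10 exponent should land near the value quoted above, without relying on the integrator to resolve a number this small.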

Q4

What is the probability there is no well (i.e. the well is really 0 meters deep)? You’ll notice that probability is higher than you might expect, given your observation is that there is a well. There are two good explanations for this probability being higher than it should be. The first is that the normal distribution is a poor model for our measurements; the second is that, when making up numbers for an example, I chose values that you likely wouldn’t see in real life. Which seems more likely to you?

The question is ill-posed. The probability that the depth of the well is 0 meters is 0, because the probability of any single point is 0, since we’ve assumed a continuous distribution for the measurements. We have to approximate the probability by finding the probability of a small neighborhood around zero, which is similar but not quite the same thing. If we instead interpret the question to mean the depth of the well is zero or less, we can integrate from -infinity to zero.

prob_0_depth <- integrate(\(x) dnorm(x, dist_mean, dist_sd), -Inf, 0)
print(prob_0_depth)
0.02312 with absolute error < 0.0000028

According to our model, there is about a \(2.31 \%\) chance that the well has zero or negative depth, which clearly doesn’t make sense. Notably, in the official solution, the author models the times instead of the distances, but also gets a nonsensical result.
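The "small neighborhood around zero" reading mentioned earlier can be sketched as well. Assuming the same fitted normal model for the distances, the code below computes the probability that the depth falls within `eps` meters of zero for a few shrinking, arbitrarily chosen values of `eps`; the names are illustrative.

```r
times <- c(2.5, 3.0, 3.5, 4.0, 2.0)
distances <- 0.5 * 9.8 * times^2
dist_mean <- mean(distances)
dist_sd <- sd(distances)

# Half-widths of the neighborhood around zero, in meters (arbitrary choices).
eps <- c(1, 0.1, 0.01, 0.001)

# P(-eps < depth < eps) as a difference of CDF values.
prob_near_zero <- pnorm(eps, dist_mean, dist_sd) -
  pnorm(-eps, dist_mean, dist_sd)

print(data.frame(eps = eps, prob = prob_near_zero))
```

The probability shrinks roughly in proportion to `eps`, which is the continuous-distribution way of saying that the probability of exactly zero is zero.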

Clearly, a normal distribution is not appropriate here. Neither times nor distances can be negative, and yet using a normal distribution for either gives positive probabilities of negative results. Furthermore, the data are incredibly spread out on the distance scale (the standard deviation of 23.36 meters is about half the mean of 46.55 meters), implying that our measurements are not very good. So both explanations hold.
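To make the comparison concrete, here is a sketch that fits a normal distribution to the times and to the distances and asks each model for the probability of a value at or below zero. This is our own illustration of the point, not necessarily the calculation used in the official solution, and the variable names are made up.

```r
times <- c(2.5, 3.0, 3.5, 4.0, 2.0)
distances <- 0.5 * 9.8 * times^2

# P(time <= 0) under a normal model fitted to the times.
p_neg_time <- pnorm(0, mean(times), sd(times))

# P(distance <= 0) under a normal model fitted to the distances.
p_neg_dist <- pnorm(0, mean(distances), sd(distances))

print(c(times = p_neg_time, distances = p_neg_dist))
```

Both probabilities are strictly positive, although the one on the time scale is far smaller, since squaring the times stretches the spread of the data relative to its mean.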

The official solution claims that there is “no reason to question the assumptions of the normal distribution here”, but that is an obviously untrue statement.