boot
package for easy bootstrapping.rsample
package and advanced R programming techniques to iterate over lists of bootstrap resamples.credit: https://commons.wikimedia.org/wiki/File:Illustration_bootstrap.svg
credit: https://commons.wikimedia.org/wiki/File:Illustration_bootstrap.svg
data()
functiondata()
function.birthwt
, is in the MASS
package, so you can access it with either MASS::birthwt
or data("birthwt", package = "MASS")
.low
, 1 is low birth weight) between mothers who smoke (smoke = 1
) and don’t (smoke
= 0).set.seed()
to make sure our results are the same every time we run our code!set.seed(370)
B <- 1000
# Alternatively use replicate() if you don't like the weird function here
# or use a loop
resamples <- lapply(
1:B,
\(n) birthwt[sample.int(nrow(birthwt), replace = TRUE), ]
)
str(resamples, 1)
List of 1000
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
$ :'data.frame': 189 obs. of 10 variables:
[list output truncated]
ht
= 1) and without (ht = 0
) a history of hypertension using the standard CI, and using a bootstrap CI with the percentile method.for a \(2\times2\) table.
Exposed | Unexposed | |
---|---|---|
Exposed | a | b |
Unexposed | c | d |
point_estimate_rr <- rr_ht(birthwt)
contigency_table_ht <- table(
factor(birthwt$ht, c(1, 0), c("Exposed", "Unexposed")),
factor(birthwt$smoke, c(1, 0), c("Case", "Control"))
)
calculate_log_rr_se <- function(contigency_table) {
a <- contigency_table[1, 1]
b <- contigency_table[1, 2]
c <- contigency_table[2, 1]
d <- contigency_table[2, 2]
se <- sqrt((1/a + 1/c) - (1/(a+b) + 1/(c+d)))
return(se)
}
log_rr_se <- calculate_log_rr_se(contigency_table_ht)
# It's on the log scale, so we have to calculate the CI like this!
exp(log_rr_se * c(-1.96, 1.96) + log(point_estimate_rr))
[1] 0.9915692 3.9760368
vector
, we can get multiple statistics at once.get_smoking_stats <- function(resample) {
risk_smoke <- mean(resample$low[resample$smoke == 1])
risk_no_smoke <- mean(resample$low[resample$smoke == 0])
odds_smoke <- risk_smoke / (1 - risk_smoke)
odds_no_smoke <- risk_no_smoke / (1 - risk_no_smoke)
rd <- risk_smoke - risk_no_smoke
rr <- risk_smoke / risk_no_smoke
or <- odds_smoke / odds_no_smoke
out <- c(
"Risk difference" = rd,
"Risk ratio" = rr,
"Odds ratio" = or
)
return(out)
}
smoking_stats <- sapply(resamples, get_smoking_stats)
str(smoking_stats)
num [1:3, 1:1000] 0.2051 2.0082 2.7044 0.0224 1.0879 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:3] "Risk difference" "Risk ratio" "Odds ratio"
..$ : NULL
apply()
!point_estimates <- get_smoking_stats(birthwt)
boot_cis <- apply(smoking_stats, 1, \(x) quantile(x, c(0.025, 0.975)))
# Just some code to show everything neatly
rbind(
"Lower" = boot_cis[1, ],
"Point" = point_estimates,
"Upper" = boot_cis[2, ]
) |>
t() |>
round(digits = 2)
Lower | Point | Upper | |
---|---|---|---|
Risk difference | 0.01 | 0.15 | 0.30 |
Risk ratio | 1.04 | 1.61 | 2.57 |
Odds ratio | 1.05 | 2.02 | 4.18 |
boot
packageB
, a good rule of thumb is 1,000 just for you, 10,000 for your boss, and 100,000 for a paper if it’s feasible).boot
package.boot
packageboot
. It must have two arguments, the first is the data, and the second is a list of indices that are included in a resample.boot_smoking_stats <- function(data, idx) {
resample <- data[idx, ]
out <- get_smoking_stats(resample)
return(out)
}
library(boot)
bootstraps_smoking <- boot::boot(birthwt, boot_smoking_stats, R = 1000)
bootstraps_smoking
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot::boot(data = birthwt, statistic = boot_smoking_stats, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 0.1532315 0.0009951909 0.07034402
t2* 1.6076421 0.0444346127 0.35834007
t3* 2.0219436 0.1290172225 0.71556649
boot
.boot
, you must always manually set the index
argument.Warning in boot::boot.ci(bootstraps_smoking, index = 1): bootstrap variances
needed for studentized intervals
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot::boot.ci(boot.out = bootstraps_smoking, index = 1)
Intervals :
Level Normal Basic
95% ( 0.0144, 0.2901 ) ( 0.0200, 0.2876 )
Level Percentile BCa
95% ( 0.0189, 0.2864 ) ( 0.0153, 0.2834 )
Calculations and Intervals on Original Scale
smoking_point_estimates <- bootstraps_smoking$t0
smoking_cis_list <- lapply(
1:length(smoking_point_estimates),
\(i) boot::boot.ci(bootstraps_smoking, type = "bca", index = i)
)
str(smoking_cis_list, 1)
List of 3
$ :List of 4
..- attr(*, "class")= chr "bootci"
$ :List of 4
..- attr(*, "class")= chr "bootci"
$ :List of 4
..- attr(*, "class")= chr "bootci"
smoking_cis <- sapply(
smoking_cis_list,
\(x) x$bca[4:5]
)
# Same cleanup code as before
rbind(
"lower" = smoking_cis[1, ],
"point" = smoking_point_estimates,
"upper" = smoking_cis[2, ]
) |>
t() |>
round(digits = 2)
lower | point | upper | |
---|---|---|---|
Risk difference | 0.02 | 0.15 | 0.28 |
Risk ratio | 1.05 | 1.61 | 2.42 |
Odds ratio | 1.07 | 2.02 | 3.67 |
confint()
).(No solution typed up for this problem, if we have time/interest we can cover it together.)