optim()
.deSolve
.optim()
optim()
par
is a vector of your parameter values. You need to pick starting values for the algorithm, usually the choice isn’t super important. (It can be for complicated problems.)fn
is the function you want to minimize (or maximize with the fnscale = -1
control argument).method
is the optimizer to use, this is a really technical choice. I prefer BFGS
but Brent
is also good for 1 parameter problems and Nelder-Mead
is usually good for 2+ parameter problems if BFGS
isn’t working.control
has a lot of complicated options that you usually won’t need, fnscale = -1
is the main one to know about.optim()
lm()
and glm()
, etc., do all of that stuff for me? Why do I need to do it by hand?glm()
. But this won’t be true in our next example.dbinom(7, size = 20, prob = 1 - (1 - p)^10)
.optim()
to find the best value of \(p\).initial_guess <- 7 / 20
# You could write out the math for this function by hand if you wanted to
# But dbinom uses some nice tricks to be faster and more accurate.
nll <- function(p) {
nll <- -dbinom(7, size = 20, prob = 1 - (1 - p)^10, log = TRUE)
return(nll)
}
optim(
par = initial_guess,
fn = nll,
method = "L-BFGS-B", # Or "Brent"
lower = 0.001,
upper = 0.999
)
$par
[1] 0.04217128
$value
[1] 1.690642
$counts
function gradient
15 15
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
optim()
.hessian = TRUE
. (Math terms: the Hessian returned here is the negative Fisher information matrix. The standard errors of the parameters are the square roots of the diagnol of the inverse Fisher information matrix.)res <- optim(
par = 0.1,
fn = nll,
method = "L-BFGS-B",
lower = 0.00001,
upper = 0.99999,
hessian = TRUE
)
est <- res$par
se <- sqrt(diag(solve(res$hessian)))
# Approximate CI
ci_lower <- est - 1.96 * se
ci_upper <- est + 1.96 * se
c(ci_lower, ci_upper)
[1] 0.01137851 0.07296406
c(..., ...)
) and your optimization function should look like this.optim()
you need to make sure that par
, lower
, and upper
are all length-two vectors. You should still use the “L-BFGS-B” method for this problem.nll_zip <- function(par) {
p_infect <- par[1]
kappa <- par[2]
p_pos <- kappa * (1 - (1 - p_infect) ^ 10)
nll <- -dbinom(7, 20, prob = p_pos, log = TRUE)
return(nll)
}
initial_guess <- c(0.05, 0.5)
res <- optim(
par = initial_guess,
fn = nll_zip,
method = "L-BFGS-B",
lower = rep(0.001, 2),
upper = rep(0.999, 2),
hessian = TRUE
)
res
$par
[1] 0.1068171 0.5171114
$value
[1] 1.690642
$counts
function gradient
14 14
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
$hessian
[,1] [,2]
[1,] 307.8070 111.32366
[2,] 111.3237 40.27431
est <- res$par
se <- sqrt(diag(solve(res$hessian)))
# Approximate CI
ci_lower <- est - 1.96 * se
ci_upper <- est + 1.96 * se
cat(
paste0("Infection risk: ", round(est[[1]], 2), " (", round(ci_lower[[1]], 2),
", ", round(ci_upper[[1]], 2), ")"),
paste0("Risk WNV in area: ", round(est[[2]], 2), " (", round(ci_lower[[2]], 2),
", ", round(ci_upper[[2]], 2), ")"),
sep = "\n"
)
Infection risk: 0.11 (-6.31, 6.52)
Risk WNV in area: 0.52 (-17.22, 18.26)
The likelihood of a sample with multiple observations is the product of the likelihood for each datapoint. (Or, the negative log-likelihood of the sample is the sum of the negative log-likelihood of each datapoint.) Using the mos-nyc.csv
dataset, which describes the number of positive pools out of total pools collected for some counties in New York State, estimate the mosquito infection rate under both the simple and mixture models. Assume these samples use 50 mosquitoes per pool instead of 10.
For an extra super bonus problem (not solved here), figure out how to combine your optim()
code with what we learned about bootstrapping to compare the Wald-type CI from the Hessian with a bootstrap CI.
deSolve
# Parameters
omega <- 4 * pi # angular frequency = sqrt(k / m)
A <- -2 # initial displacement
B <- 1 / omega # initial velocity divided by omega
# Time vector
t <- seq(0, 2, by = 0.01) # 0 to 2 seconds
# Analytical solution
x <- A * cos(omega * t) + B * sin(omega * t)
# Plot
plot(t, x, type = "l", col = "blue", lwd = 2,
xlab = "Time (seconds)", ylab = "Position",
main = "Spring movement")
abline(h = 0, col = "gray", lty = 2)
deSolve
package. These are tools that use good approximations to get the solution.# We have to do some transformations to make this DE work with deSolve
# but you can ignore that for now or you can google "transform a second order
# DE into system of first order DEs" if you want.
library(deSolve)
# Function for our DE (system)
spring_ode <- function(t, state, parameters) {
with(as.list(c(state, parameters)), {
dx1 <- x2
dx2 <- -omega^2 * x1
list(c(dx1, dx2))
})
}
# Initial state: x(0) = -2, dx/dt(0) = 1
state <- c(x1 = -2, x2 = 1)
# Time points to solve for
times <- seq(0, 2, by = 0.01)
# Parameters list
parameters <- c(omega = omega)
# Solve ODE
out <- ode(y = state, times = times, func = spring_ode, parms = parameters)
str(out)
'deSolve' num [1:201, 1:3] 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "time" "x1" "x2"
- attr(*, "istate")= int [1:21] 2 222 449 NA 6 6 0 52 22 NA ...
- attr(*, "rstate")= num [1:5] 0.01 0.01 2.01 0 0
- attr(*, "lengthvar")= int 2
- attr(*, "type")= chr "lsoda"
# Convert to data frame
out_df <- as.data.frame(out)
# Plot displacement vs time
plot(out_df$time, out_df$x1, type = "l", col = "red", lwd = 3,
xlab = "Time (seconds)", ylab = "Position",
main = "Numerical Solution of spring movement")
lines(t, x, col = "blue", lty = 2, lwd = 2)
abline(h = 0, col = "gray", lty = 2)
Source: DOI 10.3390/sym14122583
\[ \begin{aligned} \frac{dS}{dt} &= -\beta I S \\ \frac{dI}{dt} &= \beta I S - \gamma I \\ \frac{dR}{dt} &= \gamma I \end{aligned} \]
with()
thing from before, but it does make your life easier so I recommend learning it.sir_ode <- function(t, y, parameters) {
# If you don't use with() you have to do this
b <- parameters["beta"]
g <- parameters["gamma"]
S <- y["S"]
I <- y["I"]
R <- y["R"]
dS <- -b * I * S
dI <- b * I * S - g * I
dR <- g * I
# For deSolve the output always has to be a list like this
out <- list(c(dS, dI, dR))
return(out)
}
deSolve
and get a solution.sir_soln <- deSolve::ode(
y = sir_initial_condition,
times = t_seq,
func = sir_ode,
parms = sir_parms
)
head(sir_soln)
time | S | I | R |
---|---|---|---|
0.00 | 1000.0000 | 1.000000 | 0.0000000 |
0.01 | 999.8951 | 1.099654 | 0.0052452 |
0.02 | 999.7798 | 1.209224 | 0.0110131 |
0.03 | 999.6529 | 1.329698 | 0.0173557 |
0.04 | 999.5135 | 1.462153 | 0.0243301 |
0.05 | 999.3602 | 1.607779 | 0.0319991 |
plot(
NULL, NULL,
xlim = c(0, 10), ylim = c(0, 1000),
xlab = "t", ylab = "n"
)
my_colors <- c("deepskyblue2", "darkorange", "mediumseagreen")
lines(sir_soln[,"time"], sir_soln[,"S"], col = my_colors[1], lwd = 2, lty = 2)
lines(sir_soln[,"time"], sir_soln[,"I"], col = my_colors[2], lwd = 2, lty = 2)
lines(sir_soln[,"time"], sir_soln[,"R"], col = my_colors[3], lwd = 2, lty = 2)
legend(
"topright",
legend = c("S", "I", "R"),
lty = 2, lwd = 2, col = my_colors
)
\[ \begin{aligned} \frac{dS}{dt} &= -\beta I S + \zeta R\\ \frac{dI}{dt} &= \beta I S - \gamma I \\ \frac{dR}{dt} &= \gamma I - \zeta R \end{aligned} \]
sirs_initial_condition <- sir_initial_condition
sirs_t <- seq(0, 1000, 0.01)
sirs_parms <- c(sir_parms, zeta = 0.25)
sirs_ode <- function(t, y, parameters) {
# If you don't use with() you have to do this
b <- parameters["beta"]
g <- parameters["gamma"]
z <- parameters["zeta"]
S <- y["S"]
I <- y["I"]
R <- y["R"]
dS <- -b * I * S + z * R
dI <- b * I * S - g * I
dR <- g * I - z * R
# For deSolve the output always has to be a list like this
out <- list(c(dS, dI, dR))
return(out)
}
sirs_soln <- deSolve::ode(
y = sirs_initial_condition,
times = sirs_t,
func = sirs_ode,
parms = sirs_parms
)
head(sirs_soln)
time | S | I | R |
---|---|---|---|
0.00 | 1000.0000 | 1.000000 | 0.0000000 |
0.01 | 999.8951 | 1.099654 | 0.0052388 |
0.02 | 999.7798 | 1.209224 | 0.0109865 |
0.03 | 999.6530 | 1.329698 | 0.0172938 |
0.04 | 999.5136 | 1.462153 | 0.0242165 |
0.05 | 999.3604 | 1.607779 | 0.0318156 |
plot(
NULL, NULL,
xlim = c(0, 10), ylim = c(0, 1000),
xlab = "t", ylab = "n",
main = "SIRS model solution"
)
my_colors <- c("deepskyblue2", "darkorange", "mediumseagreen")
lines(sirs_soln[,"time"], sirs_soln[,"S"], col = my_colors[1], lwd = 2, lty = 2)
lines(sirs_soln[,"time"], sirs_soln[,"I"], col = my_colors[2], lwd = 2, lty = 2)
lines(sirs_soln[,"time"], sirs_soln[,"R"], col = my_colors[3], lwd = 2, lty = 2)
legend(
"topright",
legend = c("S", "I", "R"),
lty = 2, lwd = 2, col = my_colors
)
- Here are the equations implied by the model. \[
\begin{aligned}
\frac{dS}{dt} &= n - b_{I}IS - b_{P}PS -mS \\
\frac{dI}{dt} &= b_{I}IS + b_{I}PS -gI - mI\\
\frac{dR}{dt} &= gI - mR \\
\frac{dP}{dt} &= qI - cP
\end{aligned}
\]
deSolve
AND optim
SIR-observed.csv
and we want to estimate parameters assuming an SIR model fits the best. Note that we typically only observe case counts in real data, so that’s what is in this data.day | total_cases |
---|---|
1 | 1 |
2 | 5 |
3 | 22 |
4 | 44 |
5 | 109 |
6 | 271 |
deSolve
run. The hard part is figuring out how to fit it to our data.sir_pred <- function(par) {
sir_parms <- c(beta = par[1], gamma = par[2])
sir_initial_condition <- c(S = 1000, I = 1, R = 0)
times <- seq(1, 100, 1)
sim <- deSolve::ode(
y = sir_initial_condition,
times = times,
fun = sir_ode,
parms = sir_parms
)
i_hat <- sim[, "I"]
fit_ssr <- sum((dat$total_cases - i_hat) ^ 2)
return(fit_ssr)
}
optim()
.res <- optim(
par = c(0.1, 0.1),
fn = sir_pred,
hessian = TRUE,
method = "L-BFGS-B",
lower = c(0, 0)
)
point_sir <- res$par
se_sir <- sqrt(diag(solve(res$hessian)))
ci_sir <- point_sir + outer(se_sir, c(-1.96, 1.96))
out <- cbind(point_sir, ci_sir) |> round(5)
rownames(out) <- c("beta", "gamma")
colnames(out) <- c("point", "lower", "upper")
out
point | lower | upper | |
---|---|---|---|
beta | 3.62103 | 3.47919 | 3.76288 |
gamma | 0.04126 | 0.04124 | 0.04128 |
sir_pred <- function(par) {
sir_parms <- c(beta = par[1], gamma = par[2])
sir_initial_condition <- c(S = 1000, I = 1, R = 0)
times <- seq(1, 100, 1)
# New part
# Return a huge number if the paramaters are negative
if (any(par <= 0)) return(1e12)
sim <- deSolve::ode(
y = sir_initial_condition,
times = times,
fun = sir_ode,
parms = sir_parms
)
i_hat <- sim[, "I"]
fit_ssr <- sum((dat$total_cases - i_hat) ^ 2)
return(fit_ssr)
}
set.seed(103)
res <- optim(
par = c(0.1, 0.1),
fn = sir_pred,
hessian = TRUE,
# Use simulated annealing, which can be better for these kind of problems.
method = "SANN"
)
point_sir <- res$par
se_sir <- sqrt(diag(solve(res$hessian)))
ci_sir <- point_sir + outer(se_sir, c(-1.96, 1.96))
out <- cbind(point_sir, ci_sir) |> round(5)
rownames(out) <- c("beta", "gamma")
colnames(out) <- c("point", "lower", "upper")
out
point | lower | upper | |
---|---|---|---|
beta | 0.00100 | 0.00100 | 0.00100 |
gamma | 0.05441 | 0.05438 | 0.05445 |