Intermediate R Lab using QCRC Data
Introduction
In this lab, we’ll analyze the QCRC_FINAL_Deidentified.xlsx dataset to practice:
Two-sample t-test
Chi-square test
Linear regression
Logistic regression
Writing if statements
Iteration using for loops and apply
Let’s get started!
Part 1 - Loading Data
First, load the required packages and read the Excel file.
Code
# Install if not already installed
# install.packages("readxl")
library(readxl)

# Adjust the path if needed
data <- read_excel(here::here("data/QCRC_FINAL_Deidentified.xlsx"))

# View first few rows
head(data)

# View variable names
names(data)

# Quick summary
summary(data)
Patient_DEID       Decatur_Transfer   Age             Female
Min.   : 17205428  Length:288         Min.   :19.00   Length:288
1st Qu.: 19933699  Class :character   1st Qu.:55.00   Class :character
Median : 75429122  Mode  :character   Median :65.00   Mode  :character
Mean   : 64011213                     Mean   :63.47
3rd Qu.:101302250                     3rd Qu.:74.25
Max.   :104729011                     Max.   :89.00
Race               Died             30D_Mortality    60D_Mortality
Length:288         Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
Mode  :character   Median :0.0000   Median :0.0000   Median :0.0000
                   Mean   :0.3403   Mean   :0.3264   Mean   :0.3333
                   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000
                   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
Death date_deid    ICU_LOS          BMI                Died in ICU
Length:288         Min.   : 0.000   Length:288         Min.   :0.0000
Class :character   1st Qu.: 3.428   Class :character   1st Qu.:0.0000
Mode  :character   Median : 9.000   Mode  :character   Median :0.0000
                   Mean   :10.527                      Mean   :0.2674
                   3rd Qu.:15.000                      3rd Qu.:1.0000
                   Max.   :56.000                      Max.   :1.0000
Code status        Epoprostenol     Remdesivir_or_placebo   Intubated
Length:288         Min.   :0.0000   Min.   :0.0000          Min.   :0.0000
Class :character   1st Qu.:0.0000   1st Qu.:0.0000          1st Qu.:0.0000
Mode  :character   Median :0.0000   Median :0.0000          Median :1.0000
                   Mean   :0.1007   Mean   :0.1701          Mean   :0.7361
                   3rd Qu.:0.0000   3rd Qu.:0.0000          3rd Qu.:1.0000
                   Max.   :1.0000   Max.   :1.0000          Max.   :1.0000
ICU_ADMIT_DEID                ICU_DC_DEID
Min.   :2020-02-21 17:27:16   Min.   :2020-02-22 21:34:23
1st Qu.:2020-03-29 06:25:12   1st Qu.:2020-04-04 16:43:58
Median :2020-04-05 21:36:22   Median :2020-04-16 00:44:00
Mean   :2020-04-08 21:22:42   Mean   :2020-04-17 20:43:44
3rd Qu.:2020-04-18 00:48:11   3rd Qu.:2020-04-28 18:41:36
Max.   :2020-05-11 06:55:41   Max.   :2020-06-09 17:29:00
HOSPITAL_ADMIT_DEID           HOSPITAL_DC_DEID              Vent LOS
Min.   :2020-02-21 04:48:00   Min.   :2020-03-13 04:48:00   Length:288
1st Qu.:2020-03-27 04:48:00   1st Qu.:2020-04-10 04:48:00   Class :character
Median :2020-04-03 04:48:00   Median :2020-04-24 04:48:00   Mode  :character
Mean   :2020-04-06 18:58:00   Mean   :2020-04-24 21:43:00
3rd Qu.:2020-04-16 04:48:00   3rd Qu.:2020-05-06 04:48:00
Max.   :2020-05-10 04:48:00   Max.   :2020-06-09 04:48:00
CRRT             CRRT days          HD               HD days
Min.   :0.0000   Length:288         Min.   :0.0000   Length:288
1st Qu.:0.0000   Class :character   1st Qu.:0.0000   Class :character
Median :0.0000   Mode  :character   Median :0.0000   Mode  :character
Mean   :0.2292                      Mean   :0.1354
3rd Qu.:0.0000                      3rd Qu.:0.0000
Max.   :1.0000                      Max.   :1.0000
Pressor >2 hours   Pressor days       Absolute lymphocyte   WBC
Min.   :0.0000     Length:288         Length:288            Length:288
1st Qu.:0.0000     Class :character   Class :character      Class :character
Median :1.0000     Mode  :character   Mode  :character      Mode  :character
Mean   :0.6354
3rd Qu.:1.0000
Max.   :1.0000
D dimer            CRP                Peak troponin
Length:288         Length:288         Length:288
Class :character   Class :character   Class :character
Mode  :character   Mode  :character   Mode  :character
P:F ratio at ICU admission   P:F ratio at first intubation   Calculated_Cstat
Length:288                   Length:288                      Length:288
Class :character             Class :character                Class :character
Mode  :character             Mode  :character                Mode  :character
First_Intubation_deid   Last_Intubation_deid
Length:288              Length:288
Class :character        Class :character
Mode  :character        Mode  :character
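The summary shows that several measurement columns, including BMI and Vent LOS, were read in as character rather than numeric, so they need to be converted before they can be used in a test or model. Here is a minimal sketch of one way to do that, using a small made-up data frame. The values and the "Unknown" placeholder are assumptions for illustration; the lab does not say what the non-numeric entries in the real file look like.

```r
# Toy data standing in for a column that read_excel() imported as text.
# The values and the "Unknown" placeholder are invented for illustration.
toy <- data.frame(
  BMI = c("27.4", "31.0", "Unknown", "22.8"),
  stringsAsFactors = FALSE
)

# as.numeric() turns text that looks like a number into a number and
# anything else into NA (with a warning, silenced here on purpose).
toy$BMI_num <- suppressWarnings(as.numeric(toy$BMI))

# Count how many entries could not be converted
n_missing <- sum(is.na(toy$BMI_num))

print(toy)
print(n_missing)
```

After converting, it is worth checking how many NA values appeared, because rows with NA are dropped by t.test(), lm(), and glm() by default.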
Part 2 - Two-Sample t-test
Suppose we want to compare the mean Age of patients who were Intubated vs. those who were not.
Code
# Convert Intubated to a factor for clarity
data$Intubated <- factor(data$Intubated, labels = c("No", "Yes"))

# Check group means
tapply(data$Age, data$Intubated, mean, na.rm = TRUE)
No Yes
65.07895 62.89151
Code
# Run t-test
t.test(Age ~ Intubated, data = data)
Welch Two Sample t-test
data: Age by Intubated
t = 0.95244, df = 111.54, p-value = 0.3429
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
-2.363342 6.738218
sample estimates:
mean in group No mean in group Yes
65.07895 62.89151
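In the output above, the p-value of 0.3429 and a confidence interval that includes 0 give no evidence that mean age differs between the two groups. The sketch below shows how the same pieces can be pulled out of a stored t.test() result, and how the Welch statistic is built from the group means and variances. It uses simulated ages for two hypothetical groups, not the QCRC data, and the names `sim`, `Group`, and `t_by_hand` are made up for the example.

```r
# Simulated ages for two hypothetical groups (not the QCRC data)
set.seed(42)
sim <- data.frame(
  Age   = c(rnorm(40, mean = 65, sd = 14), rnorm(60, mean = 62, sd = 14)),
  Group = factor(rep(c("No", "Yes"), times = c(40, 60)))
)

# t.test() returns a list-like object; store it to reuse its parts
res <- t.test(Age ~ Group, data = sim)

res$estimate   # the two group means
res$conf.int   # 95% confidence interval for the difference in means
res$p.value    # p-value for the null hypothesis of equal means

# The Welch statistic is the difference in means divided by its standard error
grp <- split(sim$Age, sim$Group)
se  <- sqrt(var(grp$No) / length(grp$No) + var(grp$Yes) / length(grp$Yes))
t_by_hand <- (mean(grp$No) - mean(grp$Yes)) / se
t_by_hand
```

The hand-computed value should match `res$statistic`, since t.test() uses the Welch (unequal variance) version by default.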
Exercise
Run the above code.
Now insert a new code chunk below and write the code to test whether BMI differs between patients who were Intubated vs. those who were not.
Part 3 - Chi-square Test
Let’s check if CRRT use is associated with mortality (Died).
Code
# Convert to factors
data$CRRT <- factor(data$CRRT, labels = c("No", "Yes"))
data$Died <- factor(data$Died, labels = c("No", "Yes"))

# Create a contingency table (rows: CRRT, columns: Died)
tab <- table(data$CRRT, data$Died)
tab
       No Yes
  No  165  57
  Yes  25  41
Code
# Perform chi-square test
chisq.test(tab)
Pearson's Chi-squared test with Yates' continuity correction
data: tab
X-squared = 28.501, df = 1, p-value = 9.367e-08
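To see where the test statistic comes from, it can help to look at the expected counts and the row proportions behind it. The sketch below rebuilds the 2x2 table from the counts printed above (rows are CRRT, columns are Died) so that it runs without the Excel file. The dimension names and the `odds_ratio` variable are added here for readability.

```r
# Rebuild the 2x2 table from the counts printed in the lab output
tab <- matrix(
  c(165, 57,
     25, 41),
  nrow = 2, byrow = TRUE,
  dimnames = list(CRRT = c("No", "Yes"), Died = c("No", "Yes"))
)

# Yates' continuity correction is applied by default for 2x2 tables
res <- chisq.test(tab)

res$expected                  # counts expected if CRRT and death were unrelated
prop.table(tab, margin = 1)   # row proportions: share who died within each CRRT group

# Sample odds ratio of death for CRRT "Yes" versus "No"
odds_ratio <- (tab["Yes", "Yes"] / tab["Yes", "No"]) /
              (tab["No", "Yes"]  / tab["No", "No"])
odds_ratio
```

With these counts, about 26% of patients without CRRT died compared with about 62% of those with CRRT, which is the difference the small p-value reflects.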
Exercise
Run the test above.
Now insert a new code chunk below and write the code to repeat the chi-square test for Intubated vs. Died.
Part 4 - Linear Regression
We’ll model ICU_LOS as a function of Age.
Code
# Fit linear model
lm_model <- lm(ICU_LOS ~ Age, data = data)

# View results
summary(lm_model)
Call:
lm(formula = ICU_LOS ~ Age, data = data)
Residuals:
Min 1Q Median 3Q Max
-10.747 -6.982 -1.579 4.437 45.348
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.979425 2.183423 4.571 7.25e-06 ***
Age 0.008621 0.033422 0.258 0.797
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.778 on 286 degrees of freedom
Multiple R-squared: 0.0002326, Adjusted R-squared: -0.003263
F-statistic: 0.06653 on 1 and 286 DF, p-value: 0.7966
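In the output above, the Age slope of 0.008621 means each additional year of age is associated with about 0.009 more ICU days, and the p-value of 0.797 and R-squared near zero indicate that Age explains essentially none of the variation in ICU_LOS. The sketch below shows the general pattern for a model with more than one predictor, plus how to get confidence intervals and a prediction. It uses simulated data with the hypothetical variables `x1`, `x2`, and `y`, not the QCRC columns.

```r
# Simulated data (not the QCRC file): y depends on x1 and, more weakly, on x2
set.seed(1)
n   <- 200
sim <- data.frame(x1 = rnorm(n, 28, 6), x2 = rnorm(n, 63, 15))
sim$y <- 5 + 0.3 * sim$x1 + 0.05 * sim$x2 + rnorm(n, sd = 4)

# Covariates are added to the right-hand side of the formula with "+"
fit <- lm(y ~ x1 + x2, data = sim)

coef(fit)      # each slope: change in y per one-unit change, holding the other fixed
confint(fit)   # 95% confidence intervals for the coefficients

# Predicted outcome for one new observation
new_obs <- data.frame(x1 = 30, x2 = 70)
pred <- predict(fit, newdata = new_obs)
pred
```

The prediction is just the intercept plus each coefficient times the corresponding value in `new_obs`.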
Exercise
Run the code above.
Now add a chunk and write the code to fit a model predicting Vent LOS using BMI, then adjust the analysis for Age by adding it as an additional covariate.
Part 5 - Logistic Regression
Let’s predict the odds of dying (Died) based on Age and ICU_LOS.
Code
# Fit logistic regression (notice the family)
log_model <- glm(Died ~ Age + ICU_LOS, data = data, family = binomial)

# View summary
summary(log_model)
Call:
glm(formula = Died ~ Age + ICU_LOS, family = binomial, data = data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.930819 0.695340 -5.653 1.58e-08 ***
Age 0.049605 0.009818 5.052 4.36e-07 ***
ICU_LOS 0.002475 0.014708 0.168 0.866
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 369.34 on 287 degrees of freedom
Residual deviance: 338.84 on 285 degrees of freedom
AIC: 344.84
Number of Fisher Scoring iterations: 4
Code
# View odds ratios
exp(coef(log_model))
(Intercept) Age ICU_LOS
0.01962759 1.05085602 1.00247765
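The Age odds ratio of about 1.05 means the odds of dying rise by roughly 5% for each additional year of age, holding ICU_LOS fixed. Because Died was converted to a factor with levels "No" and "Yes" in Part 3, the model describes the probability of "Yes". The sketch below works directly from the coefficients printed above to get an odds ratio for a 10-year age difference and a predicted probability. The example patient (age 65, 10 ICU days) is hypothetical.

```r
# Coefficients copied from the summary(log_model) output above
b <- c(intercept = -3.930819, Age = 0.049605, ICU_LOS = 0.002475)

# Odds ratio for a 10-year increase in age: scale the log-odds slope, then exponentiate
or_age_10 <- exp(10 * b[["Age"]])

# Predicted probability for a hypothetical patient (age 65, 10 ICU days)
log_odds <- b[["intercept"]] + b[["Age"]] * 65 + b[["ICU_LOS"]] * 10
prob     <- plogis(log_odds)   # inverse logit: exp(x) / (1 + exp(x))

or_age_10
prob
```

This gives an odds ratio of about 1.64 per decade of age and a predicted probability of death of about 0.34 for the example patient. With the fitted model available, `predict(log_model, newdata = ..., type = "response")` returns the same kind of probability.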
Exercise
Run the model above.
Now, insert a new chunk and try predicting Intubated using BMI and Age.
Part 6 - If Statements
if lets you run code when a condition is true.
Example:
Code
# Example using the first patient
patient_age <- data$Age[1]

if (patient_age > 45) {
  print("Patient is older than 45.")
} else {
  print("Patient is 45 or younger.")
}
[1] "Patient is older than 45."
You can also use else if:
Code
# Example with more categories
if (patient_age > 65) {
  print("Patient is elderly.")
} else if (patient_age > 45) {
  print("Patient is middle-aged.")
} else {
  print("Patient is young.")
}
[1] "Patient is elderly."
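An if statement checks a single value, so it cannot be applied to a whole column such as data$Age in one step. One way to handle several values is to wrap the condition in a function and then iterate, either with a for loop or with sapply. The sketch below uses three made-up ages, and the helper name `age_group` is hypothetical.

```r
# Hypothetical ages, invented for illustration
ages <- c(38, 52, 71)

# A helper that wraps the if / else if / else chain from the lab
age_group <- function(age) {
  if (age > 65) {
    "elderly"
  } else if (age > 45) {
    "middle-aged"
  } else {
    "young"
  }
}

# for loop: handle one age at a time
for (a in ages) {
  print(paste("Age", a, "is", age_group(a)))
}

# sapply: apply the same function to every element and collect the results
groups <- sapply(ages, age_group)
groups
```

The for loop is useful when each step has a side effect such as printing, while sapply is more convenient when the goal is a vector of results.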
Exercise
Write an if-else statement that tests if a patient’s age is greater than 45 and prints:
“Patient is older than 45.” if true
Otherwise prints “Patient is 45 or younger.”
Code
age_value <- 72

if (age_value > 65) {
  print("Patient is elderly")
} else {
  print("Patient is not elderly")
}
Great work!
You have practiced:
Hypothesis testing
Regression models
Conditional logic