Intermediate R Lab using QCRC Data

Introduction

In this lab, we’ll analyze the QCRC_FINAL_Deidentified.xlsx dataset to practice:

  • Two-sample t-test
  • Chi-square test
  • Linear regression
  • Logistic regression
  • Writing if statements
  • Iteration using for loops and apply

Let’s get started!


Part 1 - Loading Data

First, load the required packages and read the Excel file.

Code
# Install if not already installed:
# install.packages(c("readxl", "here"))

library(readxl)

# Adjust the path if needed

data <- read_excel(here::here("data/QCRC_FINAL_Deidentified.xlsx"))

# View first few rows

head(data)
# A tibble: 6 × 37
  Patient_DEID Decatur_Transfer   Age Female Race           Died `30D_Mortality`
         <dbl> <chr>            <dbl> <chr>  <chr>         <dbl>           <dbl>
1     17205428 0                   89 Female Caucasian or…     1               1
2     17234405 .                   73 Female African Amer…     0               0
3     17239956 0                   53 Male   Caucasian or…     0               0
4     17350926 .                   55 Male   African Amer…     1               1
5     17377951 0                   66 Female African Amer…     1               1
6     17397040 0                   78 Male   African Amer…     1               1
# ℹ 30 more variables: `60D_Mortality` <dbl>, `Death date_deid` <chr>,
#   ICU_LOS <dbl>, BMI <chr>, `Died in ICU` <dbl>, `Code status` <chr>,
#   Epoprostenol <dbl>, Remdesivir_or_placebo <dbl>, Intubated <dbl>,
#   ICU_ADMIT_DEID <dttm>, ICU_DC_DEID <dttm>, HOSPITAL_ADMIT_DEID <dttm>,
#   HOSPITAL_DC_DEID <dttm>, `Vent LOS` <chr>, CRRT <dbl>, `CRRT days` <chr>,
#   HD <dbl>, `HD days` <chr>, `Pressor >2 hours` <dbl>, `Pressor days` <chr>,
#   `Absolute lymphocyte` <chr>, WBC <chr>, `D dimer` <chr>, CRP <chr>, …
Code
# View variable names

names(data)
 [1] "Patient_DEID"                  "Decatur_Transfer"             
 [3] "Age"                           "Female"                       
 [5] "Race"                          "Died"                         
 [7] "30D_Mortality"                 "60D_Mortality"                
 [9] "Death date_deid"               "ICU_LOS"                      
[11] "BMI"                           "Died in ICU"                  
[13] "Code status"                   "Epoprostenol"                 
[15] "Remdesivir_or_placebo"         "Intubated"                    
[17] "ICU_ADMIT_DEID"                "ICU_DC_DEID"                  
[19] "HOSPITAL_ADMIT_DEID"           "HOSPITAL_DC_DEID"             
[21] "Vent LOS"                      "CRRT"                         
[23] "CRRT days"                     "HD"                           
[25] "HD days"                       "Pressor >2 hours"             
[27] "Pressor days"                  "Absolute lymphocyte"          
[29] "WBC"                           "D dimer"                      
[31] "CRP"                           "Peak troponin"                
[33] "P:F ratio at ICU admission"    "P:F ratio at first intubation"
[35] "Calculated_Cstat"              "First_Intubation_deid"        
[37] "Last_Intubation_deid"         
Code
# Quick summary

summary(data)
  Patient_DEID       Decatur_Transfer        Age           Female         
 Min.   : 17205428   Length:288         Min.   :19.00   Length:288        
 1st Qu.: 19933699   Class :character   1st Qu.:55.00   Class :character  
 Median : 75429122   Mode  :character   Median :65.00   Mode  :character  
 Mean   : 64011213                      Mean   :63.47                     
 3rd Qu.:101302250                      3rd Qu.:74.25                     
 Max.   :104729011                      Max.   :89.00                     
     Race                Died        30D_Mortality    60D_Mortality   
 Length:288         Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Mode  :character   Median :0.0000   Median :0.0000   Median :0.0000  
                    Mean   :0.3403   Mean   :0.3264   Mean   :0.3333  
                    3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
                    Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
 Death date_deid       ICU_LOS           BMI             Died in ICU    
 Length:288         Min.   : 0.000   Length:288         Min.   :0.0000  
 Class :character   1st Qu.: 3.428   Class :character   1st Qu.:0.0000  
 Mode  :character   Median : 9.000   Mode  :character   Median :0.0000  
                    Mean   :10.527                      Mean   :0.2674  
                    3rd Qu.:15.000                      3rd Qu.:1.0000  
                    Max.   :56.000                      Max.   :1.0000  
 Code status         Epoprostenol    Remdesivir_or_placebo   Intubated     
 Length:288         Min.   :0.0000   Min.   :0.0000        Min.   :0.0000  
 Class :character   1st Qu.:0.0000   1st Qu.:0.0000        1st Qu.:0.0000  
 Mode  :character   Median :0.0000   Median :0.0000        Median :1.0000  
                    Mean   :0.1007   Mean   :0.1701        Mean   :0.7361  
                    3rd Qu.:0.0000   3rd Qu.:0.0000        3rd Qu.:1.0000  
                    Max.   :1.0000   Max.   :1.0000        Max.   :1.0000  
 ICU_ADMIT_DEID                 ICU_DC_DEID                 
 Min.   :2020-02-21 17:27:16   Min.   :2020-02-22 21:34:23  
 1st Qu.:2020-03-29 06:25:12   1st Qu.:2020-04-04 16:43:58  
 Median :2020-04-05 21:36:22   Median :2020-04-16 00:44:00  
 Mean   :2020-04-08 21:22:42   Mean   :2020-04-17 20:43:44  
 3rd Qu.:2020-04-18 00:48:11   3rd Qu.:2020-04-28 18:41:36  
 Max.   :2020-05-11 06:55:41   Max.   :2020-06-09 17:29:00  
 HOSPITAL_ADMIT_DEID           HOSPITAL_DC_DEID                Vent LOS        
 Min.   :2020-02-21 04:48:00   Min.   :2020-03-13 04:48:00   Length:288        
 1st Qu.:2020-03-27 04:48:00   1st Qu.:2020-04-10 04:48:00   Class :character  
 Median :2020-04-03 04:48:00   Median :2020-04-24 04:48:00   Mode  :character  
 Mean   :2020-04-06 18:58:00   Mean   :2020-04-24 21:43:00                     
 3rd Qu.:2020-04-16 04:48:00   3rd Qu.:2020-05-06 04:48:00                     
 Max.   :2020-05-10 04:48:00   Max.   :2020-06-09 04:48:00                     
      CRRT         CRRT days               HD           HD days         
 Min.   :0.0000   Length:288         Min.   :0.0000   Length:288        
 1st Qu.:0.0000   Class :character   1st Qu.:0.0000   Class :character  
 Median :0.0000   Mode  :character   Median :0.0000   Mode  :character  
 Mean   :0.2292                      Mean   :0.1354                     
 3rd Qu.:0.0000                      3rd Qu.:0.0000                     
 Max.   :1.0000                      Max.   :1.0000                     
 Pressor >2 hours Pressor days       Absolute lymphocyte     WBC           
 Min.   :0.0000   Length:288         Length:288          Length:288        
 1st Qu.:0.0000   Class :character   Class :character    Class :character  
 Median :1.0000   Mode  :character   Mode  :character    Mode  :character  
 Mean   :0.6354                                                            
 3rd Qu.:1.0000                                                            
 Max.   :1.0000                                                            
   D dimer              CRP            Peak troponin     
 Length:288         Length:288         Length:288        
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
 P:F ratio at ICU admission P:F ratio at first intubation Calculated_Cstat  
 Length:288                 Length:288                    Length:288        
 Class :character           Class :character              Class :character  
 Mode  :character           Mode  :character              Mode  :character  
                                                                            
                                                                            
                                                                            
 First_Intubation_deid Last_Intubation_deid
 Length:288            Length:288          
 Class :character      Class :character    
 Mode  :character      Mode  :character    
                                           
                                           
                                           

Part 2 - Two-Sample t-test

Suppose we want to compare the mean Age of patients who were Intubated vs. those who were not.

Code
# Convert Intubated to a factor for clarity
data$Intubated <- factor(data$Intubated, labels = c("No", "Yes"))

# Check group means
tapply(data$Age, data$Intubated, mean, na.rm = TRUE)
      No      Yes 
65.07895 62.89151 
Code
# Run t-test
t.test(Age ~ Intubated, data = data)

    Welch Two Sample t-test

data:  Age by Intubated
t = 0.95244, df = 111.54, p-value = 0.3429
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
 -2.363342  6.738218
sample estimates:
 mean in group No mean in group Yes 
         65.07895          62.89151 
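
The fractional degrees of freedom in the output (df = 111.54) show that R ran Welch's version of the test, which does not assume equal variances in the two groups. The sketch below uses simulated ages, not the QCRC data, so the names toy, age and group are purely illustrative. It shows how to request the equal-variance version and how to pull individual numbers out of the result instead of reading them off the printout.

```r
# Simulated ages for two hypothetical groups (not the QCRC data)
set.seed(1)
toy <- data.frame(
  age   = c(rnorm(30, mean = 65, sd = 12), rnorm(50, mean = 62, sd = 15)),
  group = factor(rep(c("No", "Yes"), times = c(30, 50)))
)

# Welch test (R's default): does not assume equal variances
welch <- t.test(age ~ group, data = toy)

# Student's version: assumes equal variances in both groups
pooled <- t.test(age ~ group, data = toy, var.equal = TRUE)

# Pull individual pieces out of the result object
welch$p.value     # p-value
welch$conf.int    # 95% CI for the difference in means (No minus Yes)
welch$estimate    # the two group means
pooled$parameter  # degrees of freedom: n1 + n2 - 2 = 78
```

Storing the result in an object is handy when you want to report the p-value or confidence interval in text or a table.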

Exercise

  • Run the above code.

  • Now insert a new code chunk below and write the code to test whether BMI differs between patients who were Intubated vs. those who were not.
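
A hint before you start: the summary() output in Part 1 lists BMI as a character column, so t.test() cannot use it until it is converted to numbers. The sketch below works on a small made-up vector. It assumes missing values are recorded as "." (as they appear in Decatur_Transfer), which may not be how BMI is coded, so check the actual column first.

```r
# Toy values standing in for a column that was imported as text
bmi_text <- c("27.4", "31.0", ".", "22.8", NA)

# as.numeric() turns anything that is not a number into NA (with a warning)
bmi_num <- suppressWarnings(as.numeric(bmi_text))

bmi_num
is.na(bmi_num)               # which entries could not be converted
mean(bmi_num, na.rm = TRUE)  # summaries need na.rm = TRUE once NAs exist
```

The same idea applies to the other lab and length-of-stay columns that were read in as character.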

Part 3 - Chi-square Test

Let’s check if CRRT use is associated with mortality (Died).

Code
# Convert to factors
data$CRRT <- factor(data$CRRT, labels = c("No", "Yes"))
data$Died <- factor(data$Died, labels = c("No", "Yes"))

# Create a contingency table
tab <- table(data$CRRT, data$Died)
tab
     
       No Yes
  No  165  57
  Yes  25  41
Code
# Perform chi-square test
chisq.test(tab)

    Pearson's Chi-squared test with Yates' continuity correction

data:  tab
X-squared = 28.501, df = 1, p-value = 9.367e-08
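
A small p-value says the two variables are associated, but not in which direction. One way to look closer, sketched below, is to rebuild the table printed above by hand (tab_demo is just an illustrative name) and compare the observed counts with the counts expected under independence, then look at row percentages.

```r
# Rebuild the 2 x 2 table printed above (rows: CRRT, columns: Died)
tab_demo <- matrix(c(165, 25, 57, 41), nrow = 2,
                   dimnames = list(CRRT = c("No", "Yes"),
                                   Died = c("No", "Yes")))

res <- chisq.test(tab_demo)

res$expected                 # counts expected if CRRT and Died were independent
res$observed - res$expected  # where the observed counts depart from that

# Row percentages: proportion who died within each CRRT group
prop.table(tab_demo, margin = 1)
```

From the counts above, about 26% of patients without CRRT died (57 of 222) compared with about 62% of patients on CRRT (41 of 66).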

Exercise

  • Run the test above.

  • Now insert a new code chunk below and write the code to repeat the chi-square test for Intubated vs. Died.

Part 4 - Linear Regression

We’ll model ICU_LOS as a function of Age.

Code
# Fit linear model
lm_model <- lm(ICU_LOS ~ Age, data = data)

# View results
summary(lm_model)

Call:
lm(formula = ICU_LOS ~ Age, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.747  -6.982  -1.579   4.437  45.348 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 9.979425   2.183423   4.571 7.25e-06 ***
Age         0.008621   0.033422   0.258    0.797    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.778 on 286 degrees of freedom
Multiple R-squared:  0.0002326, Adjusted R-squared:  -0.003263 
F-statistic: 0.06653 on 1 and 286 DF,  p-value: 0.7966

Exercise

  • Run the code above.

  • Now, add a chunk and write the code to fit a model predicting Vent LOS from BMI. Then adjust the analysis for Age by adding it as an additional covariate.
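
Two things can trip you up here. Column names that contain spaces must be wrapped in backticks inside a formula, and covariates are added with +. The sketch below uses a made-up data frame with hypothetical column names (Stay days, Weight), not the lab's variables. Remember too that Vent LOS and BMI were imported as character, so they would need converting to numeric before modelling.

```r
# Toy data with a hypothetical column name containing a space
toy <- data.frame(
  `Stay days` = c(3, 8, 5, 12, 7, 10),
  Weight      = c(60, 82, 71, 95, 77, 88),
  Age         = c(45, 67, 52, 80, 59, 73),
  check.names = FALSE  # keep the space in the name as-is
)

# Backticks let a non-syntactic name appear in a formula;
# additional covariates are joined with +
fit <- lm(`Stay days` ~ Weight + Age, data = toy)

coef(fit)
```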

Part 5 - Logistic Regression

Let’s predict the odds of dying (Died) based on Age and ICU_LOS.

Code
# Fit logistic regression (notice the family)
log_model <- glm(Died ~ Age + ICU_LOS, data = data, family = binomial)

# View summary
summary(log_model)

Call:
glm(formula = Died ~ Age + ICU_LOS, family = binomial, data = data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.930819   0.695340  -5.653 1.58e-08 ***
Age          0.049605   0.009818   5.052 4.36e-07 ***
ICU_LOS      0.002475   0.014708   0.168    0.866    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 369.34  on 287  degrees of freedom
Residual deviance: 338.84  on 285  degrees of freedom
AIC: 344.84

Number of Fisher Scoring iterations: 4
Code
# View odds ratios
exp(coef(log_model))
(Intercept)         Age     ICU_LOS 
 0.01962759  1.05085602  1.00247765 
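
An odds ratio of 1.05 for Age is the change in odds per one additional year, which can look small. The arithmetic below, using the Age estimate and standard error copied from the summary above, rescales it to a 10-year difference and adds an approximate (Wald) 95% confidence interval. Treat it as a sketch of the calculation rather than a full analysis.

```r
# Age coefficient and standard error copied from the summary above
b_age  <- 0.049605
se_age <- 0.009818

# Odds ratio for a 1-year increase in age
exp(b_age)

# Odds ratio for a 10-year increase: multiply on the log-odds scale first
exp(10 * b_age)

# Approximate (Wald) 95% confidence interval for the 1-year odds ratio
exp(b_age + c(-1, 1) * qnorm(0.975) * se_age)
```

With a fitted model in hand, exp(confint.default(log_model)) should give the same kind of Wald interval for every coefficient at once.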

Exercise

  • Run the model above.

  • Now, insert a new chunk and try predicting Intubated using BMI and Age.

Part 6 - If Statements

An if statement runs a block of code only when its condition is TRUE; an optional else block runs when the condition is FALSE.

Example:

Code
# Example using the first patient
patient_age <- data$Age[1]

if (patient_age > 45) {
  print("Patient is older than 45.")
} else {
  print("Patient is 45 or younger.")
}
[1] "Patient is older than 45."

You can also use else if:

Code
# Example with more categories
if (patient_age > 65) {
  print("Patient is elderly.")
} else if (patient_age > 45) {
  print("Patient is middle-aged.")
} else {
  print("Patient is young.")
}
[1] "Patient is elderly."
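
Note that if expects a single TRUE or FALSE, which is why the examples above pick out one patient. To label a whole column at once, a vectorised function such as ifelse() or cut() is one option. The sketch below uses made-up ages, and the names ages, age_group and age_cut are illustrative.

```r
# Hypothetical ages, not taken from the dataset
ages <- c(89, 73, 53, 44, 66)

# ifelse() checks every element and returns one label per patient
age_group <- ifelse(ages > 65, "elderly",
                    ifelse(ages > 45, "middle-aged", "young"))
age_group

# cut() does the same with explicit break points;
# intervals are (lower, upper] by default, so 65 falls in "middle-aged"
age_cut <- cut(ages, breaks = c(-Inf, 45, 65, Inf),
               labels = c("young", "middle-aged", "elderly"))
age_cut
```

Both approaches use the same cut-offs as the else if example, so they should agree with it for any single age.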

Exercise

  • Write an if-else statement that tests if a patient’s age is greater than 45 and prints:

    • “Patient is older than 45.” if true

    • Otherwise prints “Patient is 45 or younger.”

Code
age_value <- 72

if (age_value > 45) {
  print("Patient is older than 45.")
} else {
  print("Patient is 45 or younger.")
}
[1] "Patient is older than 45."

Great work! You have practiced:

  • Hypothesis testing

  • Regression models

  • Conditional logic