Module 4: Functions

Learning Objectives

After module XX, you should be able to:

  • Name the parts of a function
  • Write a function
  • Use the R/RStudio debugging tools for functions

Writing your own functions

why create your own function?

  1. to cut down on repetitive coding
  2. to organize code into manageable chunks
  3. to avoid running code unintentionally
  4. to use names that make sense to you

Writing your own functions

Here we will write a function that multiplies some number (x) by 2:

times_2 <- function(x) x*2

When you run the line of code above, you make it ready to use (no output yet!) Let’s test it!

times_2(x=10)
[1] 20

Writing your own functions:

Adding the curly brackets - { } - allows you to use functions spanning multiple lines:

times_3 <- function(x) {
  x*3
}
times_3(x=10)
[1] 30

Writing your own functions: return

If we want something specific for the function’s output, we use return(). Note, if you want to return more than one object, you need to put it into a list using the list() function.

times_4 <- function(x) {
  output <- x * 4
  return(list(output, x))
}
times_4(x = 10)
[[1]]
[1] 40

[[2]]
[1] 10
  • R will “implicitly” return the last evaluated expression by default.
  • You should get in the habit of using “explicit” return statements so that you don’t get unexpected output.

Function Syntax

This is a brief introduction. The syntax is:

functionName = function(inputs) {
< function body >
return(list(value1, value2))
}

Note to create the function for use you need to

  1. Code/type the function
  2. Execute/run the lines of code

Only then will the function be available in the Environment pane and ready to use.

Writing your own functions: multiple arguments

Functions can take multiple arguments / inputs. Here the function has two arguments x and y

times_2_plus_y <- function(x, y) {
  out <- x * 2 + y
  return(out)
}
times_2_plus_y(x = 10, y = 3)
[1] 23

Writing your own functions: arugment defaults

Functions can have default arguments. This lets us use the function without specifying the arguments

times_2_plus_y <- function(x = 10, y = 3) {
  out <- x * 2 + y
  return(out)
}
times_2_plus_y()
[1] 23

We got an answer b/c we put defaults into the function arguments.

Writing a simple function

Let’s write a function, sqdif, that:

  1. takes two numbers x and y with default values of 2 and 3.
  2. takes the difference
  3. squares this difference
  4. then returns the final value
functionName = function(inputs) {
< function body >
return(list(value1, value2))
}
sqdif <- function(x=2,y=3){
     output <- (x-y)^2
     return(output)
}

sqdif()
[1] 1
sqdif(x=10,y=5)
[1] 25
sqdif(10,5)
[1] 25
  • Note that this function is implicitly vectorized even if we didn’t mean for it to be.
sqdif(c(2, 3, 4), c(7, 3.14, 98))
[1]   25.0000    0.0196 8836.0000

Writing your own functions: characters

Functions can have any kind of data type input. For example, here is a function with characters:

loud <- function(word) {
  output <- rep(toupper(word), 5)
  return(output)
}
loud(word = "hooray!")
[1] "HOORAY!" "HOORAY!" "HOORAY!" "HOORAY!" "HOORAY!"

The ... (elipsis) argument

  • What if we want our function to take in an arbitrary amount of numbers?
  • As an example, let’s consider a function that calculates the geometric mean.

\[ \text{geometric mean}(\mathbf{x}) = \sqrt[n]{x_1 \cdot x_2 \cdot \ldots \cdot x_n} = \exp\left( \frac{1}{n}\left(\log x_1 + \ldots + \log x_n \right) \right) \]

  • We can do this on a vector, of course.
geo_mean <- function(x) {
    output <- exp(mean(log(x)))
    return(output)
}

geo_mean(c(2, 3, 4, 5))
[1] 3.309751

The ... (elipsis) argument

  • But sometimes it’s easier to write this a different way.
geo_mean <- function(...) {
    x <- c(...)
    output <- exp(mean(log(x)))
    return(output)
}

geo_mean(2, 3, 4, 5)
[1] 3.309751
  • ... is a placeholder for as many arguments as you want.

... for pass-through

  • The most common use of ... is to “pass-through” arguments to underlying functions.

  • mean() has additional arguments, what if we want to use those in our geometric mean calculation?

geo_mean <- function(x, trim = 0, na.rm = FALSE) {
        output <- exp(mean(log(x), trim = trim, na.rm = na.rm))
    return(output)
}

Or we can do this an easier way.

Function debugging tools if you get stuck

Function instead of Loop

Now, let’s see how we can repeat our loop example from the measles dataset with a function instead.

Here we are going to set up a function that takes our data frame and outputs the median, first and third quartiles, and range of measles cases for a specified country.

get_country_stats <- function(df, iso3_code){
    
    country_data <- subset(df, iso3c == iso3_code)
    
    # Get the summary statistics for this country
    country_cases <- country_data$measles_cases
    country_quart <- quantile(
        country_cases, na.rm = TRUE, probs = c(0.25, 0.5, 0.75)
    )
    country_range <- range(country_cases, na.rm = TRUE)
    
    country_name <- unique(country_data$country)
    
    country_summary <- data.frame(
        country = country_name,
        min = country_range[[1]],
        Q1 = country_quart[[1]],
        median = country_quart[[2]],
        Q3 = country_quart[[3]],
        max = country_range[[2]]
    )
    
    return(country_summary)
}

Now we can use the function.

meas <- readRDS(here::here("data", "measles_final.Rds"))
get_country_stats(df=meas, iso3_code="IND")
  country  min      Q1  median       Q3    max
1   India 3305 31135.5 47109.5 80797.25 252940
get_country_stats(df=meas, iso3_code="PAK")
   country min     Q1 median    Q3   max
1 Pakistan 386 2065.5 4075.5 17422 55543

Summary

  • Simple functions take the form:
functionName = function(arguments) {
    < function body >
    return(list(outputs))
}
  • We can specify arguments defaults when you create the function

You try it!

Create your own function that calculates the incidence rate per \(K\) cases in a given country for all years where that country appeared (e.g. incidence per 10,000 people or another number \(K\)).

Step 1. Determine your arguments.

Step 2. Begin your function by subsetting the data to include only the country specified in the arguments (i.e, country_data).

Step 3. Get the years of interest (what errors might you run into here?).

Step 4. Calculate the incidence rate per \(K\) people in each year.

Step 5. Return the output neatly.

Bonus exercise: allow the country code argument to be optional, or add a new argument that allows the user to only calculate the incidence for certain years.

Mini Exercise Answer

get_incidence <- function(df, iso3_code, K = 10000){
    
    country_data <- subset(df, iso3c == iso3_code)
    
    incidence <- country_data$measles_cases / country_data$population * K
    names(incidence) <- country_data$year
    
    return(incidence)
}

get_incidence(df=meas, iso3_code="IND")
      2023       2022       2021       2020       2019       2018       2017 
0.45603205 0.28907547 0.04049550 0.04013214 0.07540965 0.14224947 0.08884979 
      2016       2015       2014       2013       2012       2011       2010 
0.12886248 0.22805022 0.20294566 0.06416849 0.02593200 0.26744142 0.25356807 
      2009       2008       2007       2006       2005       2004       2003 
0.45918728 0.36675830 0.34583747 0.54747898 0.31794361 0.48794093 0.42192914 
      2002       2001       2000       1999       1998       1997       1996 
0.36459551 0.47990173 0.36649458 0.20195097 0.33276727 0.60861874 0.47872368 
      1995       1994       1993       1992       1991       1990       1989 
0.38882932 0.53443386 0.54397290 0.79109798 0.87586166 1.02948793 1.94564007 
      1988       1987       1986       1985       1984       1983       1982 
1.93562738 3.10083364 1.94360299 2.06623051 2.50206072 1.73819235 2.00496602 
      1981       1980 
2.83437652 1.63650050 
get_incidence(df=meas, iso3_code="PAK")
      2023       2022       2021       2020       2019       2018       2017 
0.73692544 0.35526364 0.44939088 0.12090842 0.09252406 1.50215163 0.42402323 
      2016       2015       2014       2013       2012       2011       2010 
0.12658949 0.01829650 0.06578580 0.42607889 0.39791131 0.22084288 0.22221137 
      2009       2008       2007       2006       2005       2004       2003 
0.04539161 0.06072114 0.15396495 0.42910095 0.17095625 0.24893257 0.28404208 
      2002       2001       2000       1999       1998       1997       1996 
0.23906241 0.24174444 0.13370480 0.19640005 0.16036998 0.13075755 0.07942591 
      1995       1994       1993       1992       1991       1990       1989 
0.12920918 0.10994611 0.15667487 0.24245113 0.05176020 1.88755151 0.21035120 
      1988       1987       1986       1985       1984       1983       1982 
5.14440235 4.41204007 4.20439485 2.74769085 1.84269015 2.30609510 2.26464854 
      1981       1980 
3.44249798 3.54397944 

Sidenote: combining functions and loops

  • Writing functions is a way to make complex code a lot easier to understand.
  • These two loops do exactly the same thing, but the one with the function is much easier to read for most people.
my_countries <- c("IND", "PAK", "BGD", "NPL", "BTN")
n <- length(my_countries)
res <- vector(mode = "list", length = n)
for (i in 1:n) {
    this_country <- my_countries[[i]]
    country_data <- subset(meas, iso3c == this_country)
    incidence <- country_data$measles_cases / country_data$population * 10000
    names(incidence) <- country_data$year
    res[[i]] <- incidence
}
str(res)
List of 5
 $ : Named num [1:44] 0.456 0.2891 0.0405 0.0401 0.0754 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.7369 0.3553 0.4494 0.1209 0.0925 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.0162 0.0182 0.012 0.1439 0.3521 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.3117 0.0426 0.0476 0.1322 0.1491 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.127 0.0895 NA NA 0.0261 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
for (i in 1:n) {
    res[[i]] <- get_incidence(meas, my_countries[[i]], 10000)
}
str(res)
List of 5
 $ : Named num [1:44] 0.456 0.2891 0.0405 0.0401 0.0754 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.7369 0.3553 0.4494 0.1209 0.0925 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.0162 0.0182 0.012 0.1439 0.3521 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.3117 0.0426 0.0476 0.1322 0.1491 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...
 $ : Named num [1:44] 0.127 0.0895 NA NA 0.0261 ...
  ..- attr(*, "names")= chr [1:44] "2023" "2022" "2021" "2020" ...