Column-wise and row-wise operations in dplyr and tidyverse

With dplyr (part of the tidyverse umbrella package), it has become quite straightforward to perform operations over columns or rows in R. These column-wise and row-wise methods integrate directly with other dplyr verbs such as select, mutate, filter and summarise, which makes them a convenient alternative to the apply and map families of functions. In this post, I briefly cover some useful manipulations over rows and columns.

1. Column-wise operation

Example 1: select the string columns with no more than 5 levels (distinct values) in the starwars dataset.

library(dplyr)  # provides the starwars data, the verbs and %>%

# anonymous function (formula shorthand)
starwars %>% 
  select_if(~ is.character(.x) && length(unique(.x)) <= 5)

The convenience function select_if picks the columns that satisfy a predicate, which here combines two criteria. The formula shorthand is equivalent to the following expanded form.

# expanded form
starwars %>%
  select_if(function(x) is.character(x) & length(unique(x)) <= 5)
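
Since dplyr 1.0.0 the scoped verbs such as select_if are marked as superseded, and the same selection can be written with select() plus the where() helper. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the name chr_few_levels is only illustrative.

```r
library(dplyr)

# Keep character columns with at most 5 distinct values (NA counts as a value)
chr_few_levels <- starwars %>%
  select(where(~ is.character(.x) && length(unique(.x)) <= 5))

# Names of the columns that passed both tests
names(chr_few_levels)
```

Both spellings apply the predicate to one column at a time, so the predicate must return a single TRUE or FALSE per column.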

It is worth noting that we are using a tilde ($\sim$) to define an anonymous function, so we should use $.x$ to refer to each selected column. See this link for a detailed illustration of tilde ($\sim$), dot ($.$) and dot x ($.x$) in dplyr. If the anonymous function takes a single argument (the most common case), you can use dot (.) or the positional placeholder (..1) as alternatives. In other words, dot (.), the placeholder (..1) and dot x (.x) are equivalent in a single-argument anonymous function. For details, inspect the underlying function specification via as_mapper.

library(purrr)  # as_mapper() is exported by purrr

as_mapper(~ length(unique(.x)))
# <lambda>
# function (..., .x = ..1, .y = ..2, . = ..1) 
# length(unique(.x))
# attr(,"class")
# [1] "rlang_lambda_function" "function" 
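
As a quick check of that equivalence, the sketch below builds the same one-argument lambda with each of the three placeholders and calls it on a small vector. The vector x and the function names are made up for illustration.

```r
library(purrr)

x <- c("a", "b", "a", "c")

# Three spellings of the same single-argument lambda
f_dotx <- as_mapper(~ length(unique(.x)))
f_dot  <- as_mapper(~ length(unique(.)))
f_dd1  <- as_mapper(~ length(unique(..1)))

# Each call counts the distinct values in x, so all three give 3
results <- c(f_dotx(x), f_dot(x), f_dd1(x))
results
```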

If you want to count the levels of the selected columns, you can use the across function inside summarise to compute the number of levels for each column.

# solution 1
starwars %>% 
  summarise(across(where(is.character), ~ length(unique(.x))))

# solution 2 (scoped verb; superseded by across() since dplyr 1.0.0)
starwars %>% 
  summarise_if(is.character, ~ length(unique(.x)))

Alternatively, you can use the map or map_dbl function in purrr, as in the following command. Note that a data.frame is a list of columns, so a map function applied to it iterates over columns.

# solution 3
starwars %>%
  select_if(~ is.character(.x)) %>%
  map_dbl(~length(unique(.x)))
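
To see why map works column by column, it helps to remember that a data.frame is stored as a list with one element per column. A small sketch on a made-up data frame (the names df and n_levels are hypothetical):

```r
library(purrr)

# Toy data frame with two character columns
df <- data.frame(a = c("x", "y", "x"),
                 b = c("p", "p", "p"),
                 stringsAsFactors = FALSE)

# A data frame is a list, so map_dbl() visits one column per iteration
is.list(df)

# Named numeric vector: one entry per column
n_levels <- map_dbl(df, ~ length(unique(.x)))
n_levels
```

The result keeps the column names, which is why solution 3 returns a named vector.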

Example 2: select the numeric columns and calculate their means and standard deviations in the starwars dataset.

# solution 1
starwars %>% 
  summarise(across(where(~ is.numeric(.x)), 
                   list(Mean = ~ mean(.x, na.rm = TRUE), 
                        Sd = ~ sd(.x, na.rm = TRUE))))

This example is a good illustration of dot x (.x) in dplyr-style syntax. Some columns contain missing values (NAs), so we need to pass na.rm = TRUE inside each function, and the lambda form with .x lets us do that.
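
The effect of the missing values is easy to reproduce on a small table. The sketch below uses a made-up tibble (toy) rather than starwars, and compares passing the bare function with passing a lambda that sets na.rm.

```r
library(dplyr)

# Toy data (hypothetical): column a has a missing value, column b does not
toy <- tibble(a = c(1, 2, NA), b = c(10, 20, 30))

# Bare function: the NA propagates, so the mean of a is NA
with_na <- toy %>%
  summarise(across(everything(), mean))

# Lambda with .x: na.rm = TRUE is applied for every column
without_na <- toy %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

with_na
without_na
```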

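A more compact way of writing this is the scoped verb summarise_if, which selects columns with a predicate and applies the functions to each of them; extra arguments such as na.rm are passed on to every function. Note that the scoped verbs are marked as superseded by across() since dplyr 1.0.0, so solution 1 is the safer choice for new code.

# solution 2 (compact scoped verb)
starwars %>%
  summarise_if(is.numeric,
               list(Sum = sum, Mean = mean, Sd = sd),
               na.rm = TRUE)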

If you are more comfortable with the map function in purrr, you can solve the same problem in a different way. Note: we apply three functions to each column of the data.frame and combine the results into a named vector, so the call returns a list of named vectors, one per column.

# solution 3
starwars %>% 
  select_if(is.numeric) %>% 
  map(~ c(Sum = sum(.x, na.rm = TRUE), 
          Mean = mean(.x, na.rm = TRUE),
          Sd = sd(.x, na.rm = TRUE)))
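
A list of named vectors is awkward to read, so one option is to stack the vectors into a matrix with base R. This is only a sketch of one possible post-processing step, assuming dplyr 1.0.0 or later for where(); stats_list and stats_mat are illustrative names.

```r
library(dplyr)
library(purrr)

# One named vector (Sum, Mean, Sd) per numeric column
stats_list <- starwars %>%
  select(where(is.numeric)) %>%
  map(~ c(Sum = sum(.x, na.rm = TRUE),
          Mean = mean(.x, na.rm = TRUE),
          Sd = sd(.x, na.rm = TRUE)))

# Stack the vectors: one row per column, one column per statistic
stats_mat <- do.call(rbind, stats_list)
stats_mat
```

The row names of the matrix come from the list names (the original column names), and the column names come from the names inside each vector.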

2. Row-wise operation

Example 3: calculate the sum, mean and standard deviation of each row of the iris dataset.

iris %>% 
  rowwise() %>%
  mutate(
    Rowsum = sum(c_across(Sepal.Length:Petal.Width)),
    Rowsd = sd(c_across(Sepal.Length:Petal.Width)), 
    Rowmean = mean(c_across(Sepal.Length:Petal.Width))
  ) %>% 
  ungroup()

Here the function c_across is specifically designed to work with rowwise operations. Note: rowwise groups your data by row (class: rowwise_df), so it is best to ungroup as soon as the row-wise computation is done. Of course, if you are more comfortable with the apply function, you can also use the following command.

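iris %>% 
  select(Sepal.Length:Petal.Width) %>% 
  # apply() over rows returns a 3 x 150 matrix, so transpose before converting
  apply(1, function(x) c(Rowsum = sum(x), Rowsd = sd(x), Rowmean = mean(x))) %>% 
  t() %>% 
  as_tibble()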
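
For sums and means there is also a vectorised route that avoids rowwise altogether: base R's rowSums and rowMeans accept the data frame returned by across. A minimal sketch, assuming dplyr 1.0.0 or later; iris_rows is an illustrative name, and base R has no equivalent built-in for row-wise standard deviations.

```r
library(dplyr)

# rowSums()/rowMeans() operate on the four measurement columns at once
iris_rows <- iris %>%
  mutate(
    Rowsum = rowSums(across(Sepal.Length:Petal.Width)),
    Rowmean = rowMeans(across(Sepal.Length:Petal.Width))
  )

head(iris_rows, 3)
```

Because no row grouping is created, there is nothing to ungroup afterwards.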

Useful links