Exploring dplyr

I have been continuing my trial-and-improvement work with the tidyverse, “an opinionated collection of R packages designed for data science” (link).

Today, I have been working through a dplyr vignette (link). I have been mindful for some time that this part of my R use needed significant improvement.

The vignette is very helpful. It guided me through some fundamental procedures for working with data frames and tibbles (link) that I should have learned much earlier in my use of the tidyverse.

The vignette points out that, when working with data, you must:

  • Figure out what you want to do.
  • Describe those tasks in the form of a computer program.
  • Execute the program.

The dplyr package:

  • constrains your options, which helps you think about data manipulation challenges;
  • provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code;
  • uses efficient backends.

I have created a GitHub repository (link) to share this example, and I have included the CSV file I used for the exercise. The file contains data from the 2019 FIFA Women’s World Cup in France (link).

I enjoyed working through each of the basic verbs of data manipulation:

  • filter(): select cases based on their values.
  • arrange(): reorder the cases.
  • select() and rename(): select and rename variables based on their names.
  • mutate() and transmute(): add new variables that are functions of existing variables.
  • summarise(): condense multiple values to a single value.
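To see the five verbs side by side, here is a minimal sketch in R. It assumes dplyr is installed, and it uses a small invented data frame: the `matches` table, its team names, columns and numbers are hypothetical and are not the contents of the World Cup CSV.

```r
library(dplyr)

# Hypothetical, invented match data for illustration only
# (NOT taken from the 2019 Women's World Cup CSV).
matches <- data.frame(
  home       = c("Team A", "Team B", "Team C", "Team D"),
  away       = c("Team B", "Team C", "Team D", "Team A"),
  home_goals = c(2, 0, 1, 3),
  away_goals = c(1, 0, 4, 3),
  attendance = c(45000, 20000, 18000, 30000),
  stringsAsFactors = FALSE
)

# filter(): keep matches in which at least one side scored three or more
high_scoring <- filter(matches, home_goals >= 3 | away_goals >= 3)

# arrange(): reorder the cases, largest attendance first
by_crowd <- arrange(matches, desc(attendance))

# select() and rename(): pick columns by name, then rename them
# (rename() uses new_name = old_name)
teams <- select(matches, home, away)
teams <- rename(teams, home_team = home, away_team = away)

# mutate(): add a column computed from existing ones (all columns kept)
with_total <- mutate(matches, total_goals = home_goals + away_goals)

# transmute(): like mutate(), but keeps only the new column(s)
totals_only <- transmute(matches, total_goals = home_goals + away_goals)

# summarise(): condense many rows into a single summary row
overall <- summarise(
  matches,
  n_matches  = n(),
  mean_goals = mean(home_goals + away_goals)
)
```

With these invented numbers, `high_scoring` keeps two of the four rows and `overall$mean_goals` is 3.5. transmute() still works, although recent dplyr releases describe it as superseded by mutate(.keep = "none").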

All these dplyr verbs share a very similar syntax and behaviour:

  • The first argument is a data frame.
  • The subsequent arguments describe what to do with the data frame. You can refer to columns in the data frame directly without using $.
  • The result is a new data frame.
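Those three rules are easiest to see next to base R. This is a minimal sketch assuming dplyr is installed; the `scores` data frame is hypothetical, with invented values. It contrasts `$` indexing with referring to a column directly, shows that the original data frame is left untouched, and shows how the “data frame in, data frame out” design lets verbs be chained with the `%>%` pipe that dplyr re-exports.

```r
library(dplyr)

# Hypothetical toy data (invented values, not the World Cup file)
scores <- data.frame(
  team  = c("Team A", "Team B", "Team C"),
  goals = c(4, 1, 7),
  stringsAsFactors = FALSE
)

# Base R: the column has to be reached with scores$goals
base_way <- scores[scores$goals > 2, ]

# dplyr: the data frame is the first argument, and `goals`
# is referred to directly, without scores$goals
dplyr_way <- filter(scores, goals > 2)

# The verb returns a new data frame; `scores` itself is unchanged
nrow(scores)     # still 3
nrow(dplyr_way)  # 2

# Because every verb takes a data frame first and returns one,
# calls can be chained with the pipe
top <- scores %>%
  filter(goals > 2) %>%
  arrange(desc(goals)) %>%
  select(team)
```

Here `top$team` comes out as "Team C" followed by "Team A": the rows with more than two goals, highest first.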

Photo Credit

Opening Game (FIFA)

Final (FIFA)
