Introduction to the Tidyverse

Meeting Times:

  • Wednesday, July 17, 12:00 PM – 5:00 PM
  • Thursday, July 18, 8:30 AM – 5:00 PM
  • Friday, July 19, 8:30 AM – 5:00 PM

Classroom: TBA

Module Summary:
This module will cover the basics of data cleaning and wrangling in the Tidyverse, which is a collection of R packages used for data science that share “an underlying design philosophy, grammar, and data structures.” Infectious disease modeling uses more than just line list or time series data, and modelers are often working with multiple data sources of varying structure, quality, and data type.

This module will allow participants to gain hands-on coding experience in reshaping, summarizing, and manipulating data to allow for visualization, analysis, and modeling. Participants will require their own laptop with R and RStudio installed, as most of the module content is coding-based and interactive.

Prerequisites:

  • Familiar with base R; can read in data files, explore data, and make basic plots
  • Basic understanding of different data types in R
  • Aware of RStudio + RMarkdown

Module Content:

  • Using RStudio IDE and coding with RMarkdown
  • Tidyverse overview and code-along
  • Import and explore data
  • `tidyr` and `dplyr` verbs
  • Summary functions
  • Strings and factors
  • Date/time and joins
  • Manipulating across a dataset
  • Data viz with ggplot2
  • Applications to real-world data
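
As a rough illustration of the reshaping and summarizing topics listed above, here is a minimal sketch of a `tidyr` + `dplyr` workflow. The dataset, its column names (`region`, `week_1`, ...), and the object names are hypothetical and invented for this example; they are not taken from the course materials.

```r
library(tidyverse)

# Hypothetical weekly case counts in "wide" format (illustrative data only)
cases_wide <- tibble(
  region = c("North", "South"),
  week_1 = c(12, 7),
  week_2 = c(15, 9),
  week_3 = c(9, 14)
)

# tidyr: reshape to one row per region-week combination
cases_long <- cases_wide %>%
  pivot_longer(
    cols = starts_with("week_"),
    names_to = "week",
    names_prefix = "week_",
    values_to = "cases"
  ) %>%
  mutate(week = as.integer(week))

# dplyr: total cases and the week with the highest count, per region
region_totals <- cases_long %>%
  group_by(region) %>%
  summarise(
    total_cases = sum(cases),
    peak_week = week[which.max(cases)],
    .groups = "drop"
  )

print(region_totals)
```

The long format produced by `pivot_longer()` is also the shape that `ggplot2` generally expects, so the same `cases_long` table could be passed on to a plot.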

Instructors:

Sarah Bowden, PhD

Dr. Sarah Bowden is a Data Scientist in the Division of Global Migration Health at CDC. She has been coding in R since 2007 and has enjoyed seeing the Tidyverse develop and grow over time. Dr. Bowden uses Tidyverse tools and best practices in her day-to-day coding activities and has trained and mentored 20+ undergraduate, graduate, and postdoctoral fellows in data science and public health analytics over the past 7 years.

Reni Kaul, PhD

Professor, Department of Biostatistics & Bioinformatics, Emory University

Dr. Reni Kaul is a Prevention Effectiveness Fellow on the Analytics and Modeling Track in the Immunization Services Division at the CDC. She is a certified Carpentries Instructor and is committed to creating an inclusive learning environment. She has previously designed and taught coding courses in R for undergraduate and graduate students.

Check out GitHub for course examples.

Robbie Richards, PhD

Dr. Robbie Richards is Teaching Faculty in the School of Biological Sciences at Georgia Tech. A quantitative ecologist by training, he now specializes in statistical and data science education for biologists. He has designed and led a variety of coding courses in R and the tidyverse for undergraduate and graduate students, including core Biostatistics courses in the BS Biology and MS Bioinformatics curricula at Tech.

Required Software:

  • R + RStudio (to be downloaded and installed prior to the start of the course)
  • `tidyverse` package
  • `rmarkdown` package
  • `magrittr` package
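
One way to install the packages listed above is from an R console, once R and RStudio are set up. This is a minimal sketch that assumes an internet connection and the default CRAN mirror; the course may provide its own setup instructions.

```r
# Install the required packages from CRAN (only needs to be done once)
install.packages(c("tidyverse", "rmarkdown", "magrittr"))

# Load each package to confirm that the installation succeeded
library(tidyverse)
library(rmarkdown)
library(magrittr)
```

If each `library()` call runs without an error, the packages are ready to use.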

Recommended Reading:

  • Tidyverse.org
  • R for Data Science (R4DS)