Introduction to the Tidyverse
Meeting Times:
- Wednesday, July 17, 1:30 PM – 5:00 PM
- Thursday, July 18, 8:30 AM – 5:00 PM
- Friday, July 19, 8:30 AM – 5:00 PM
Classroom: Randall Rollins Building (RR 205)
Module Summary:
This module will cover the basics of data cleaning and wrangling in the Tidyverse, which is a collection of R packages used for data science that share “an underlying design philosophy, grammar, and data structures.” Infectious disease modeling uses more than just line list or time series data, and modelers are often working with multiple data sources of varying structure, quality, and data type.
This module gives participants hands-on coding experience in reshaping, summarizing, and manipulating data for visualization, analysis, and modeling. Participants need their own laptop with R and RStudio installed, as most of the module content is coding-based and interactive.
Prerequisites:
- Familiarity with base R: able to read in data files, explore data, and make basic plots
- Basic understanding of the different data types in R
- Awareness of RStudio and RMarkdown
Module Content:
- Using RStudio IDE and coding with RMarkdown
- Tidyverse overview and code-along
- Import and explore data
- `tidyr` and `dplyr` verbs
- Summary functions
- Strings and factors
- Date/time and joins
- Manipulating across a dataset
- Data viz with ggplot2
- Applications to real-world data
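To give a flavor of the reshaping and summarizing topics listed above, here is a minimal sketch, not taken from the course materials. The data frame `cases_wide` and its values are made up for illustration; the functions (`pivot_longer()`, `mutate()`, `group_by()`, `summarize()`) are standard `tidyr` and `dplyr` verbs.

```r
library(dplyr)
library(tidyr)

# Toy, made-up weekly case counts in wide format (hypothetical, not real data)
cases_wide <- tibble(
  region = c("North", "South"),
  week_1 = c(10, 4),
  week_2 = c(15, 6),
  week_3 = c(12, 9)
)

# Reshape to long format: one row per region-week combination
cases_long <- cases_wide %>%
  pivot_longer(
    cols = starts_with("week_"),
    names_to = "week",
    names_prefix = "week_",
    values_to = "cases"
  ) %>%
  mutate(week = as.integer(week))

# Summarize: total and peak weekly cases per region
cases_summary <- cases_long %>%
  group_by(region) %>%
  summarize(
    total_cases = sum(cases),
    peak_cases = max(cases),
    .groups = "drop"
  )

print(cases_summary)
```

The long format produced by `pivot_longer()` is typically what `ggplot2` and grouped summaries expect, which is why reshaping tends to come before visualization in a workflow like this.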
Instructors:
Sarah Bowden, PhD
Dr. Sarah Bowden is a Data Scientist in the Division of Global Migration Health at CDC. She has been coding in R since 2007 and has enjoyed seeing the Tidyverse develop and grow over time. Dr. Bowden uses Tidyverse tools and best practices in her day-to-day coding activities and has trained and mentored 20+ undergraduate, graduate, and postdoctoral fellows in data science and public health analytics over the past 7 years.
Reni Kaul, PhD
Professor, Department of Biostatistics & Bioinformatics, Emory University
Dr. Reni Kaul is a Prevention Effectiveness Fellow on the Analytics and Modeling Track in the Immunization Services Division at the CDC. She is a certified Carpentries Instructor and is committed to creating an inclusive learning environment. She has previously designed and taught coding courses in R for undergraduate and graduate students.
Robbie Richards, PhD
Dr. Robbie Richards is Teaching Faculty in the School of Biological Sciences at Georgia Tech. A quantitative ecologist by training, he now specializes in statistical and data science education for biologists. He has designed and led a variety of coding courses in R and the Tidyverse for undergraduate and graduate students, including core Biostatistics courses in the BS Biology and MS Bioinformatics curricula at Georgia Tech.
Required Software:
- R + RStudio (to be downloaded and installed prior to the start of the course)
- `tidyverse` package
- `rmarkdown` package
- `magrittr` package
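One possible way to install the packages above from the R console is sketched here. It assumes a working internet connection and a configured CRAN mirror (RStudio sets one by default); the variable names `required` and `to_install` are illustrative only.

```r
# Packages named under Required Software
required <- c("tidyverse", "rmarkdown", "magrittr")

# Install only the packages that are not already present
to_install <- setdiff(required, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Check that each package can be loaded (TRUE means it is available)
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
print(available)
```

If any entry prints `FALSE`, rerunning `install.packages()` for that package and reading the error message is usually the quickest way to diagnose the problem before the course starts.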
Recommended Reading:
- Tidyverse.org
- R for Data Science (R4DS)