The Hitchhiker’s Guide to Reproducible Research

Meeting Times:

  • Monday, July 22, 8:30 AM – 5:00 PM
  • Tuesday, July 23, 8:30 AM – 5:00 PM
  • Wednesday, July 24, 8:30 AM – 12:00 PM

Classroom: TBA

Module Summary:
Are you tired of spending time copying and pasting analytic results into Microsoft Word? Dreading emails asking for updated analyses in unrealistic time frames? Concerned about the accuracy of your soon-to-be-published work? This course will help you discover a simpler, cleaner approach to achieving computational reproducibility and recovering some peace of mind.

Through a combination of lectures, demonstrations, and hands-on coding-based exercises, you will learn to create shareable, reliable analytic codebases that deliver consistent results, saving you time and ensuring the scientific integrity of your work.

Prerequisites:

  • Basic skills in R are required (e.g., reading data, simple manipulations of data, producing simple summary statistics).
  • Basic skills in R Markdown and familiarity with RStudio are suggested, but not required. Pre-reading material for students with no experience in R Markdown and/or RStudio will be provided as needed.
  • Some experience in statistics (e.g., very basic understanding of regression analysis and hypothesis testing) is helpful, but not required.

Module Content:

  • Motivations for reproducible computing
  • Basic usage of command line interfaces
  • Writing and running R scripts from the command line
  • Project organization for reproducible computing
  • Version control with git and collaboration with GitHub
  • Virtual environments and managing R package libraries
  • Literate programming and automated reporting
  • Containerization for deployment (time permitting)

Instructors:

David Benkeser, PhD

Associate Professor, Department of Biostatistics and Bioinformatics, Emory University

David’s research focuses on the theory and applications of machine learning in causal inference, where advanced statistical computing often takes a central role in the work. He directs the Data Science Certificate at the Rollins School of Public Health, where he also teaches courses on causal inference and reproducible programming.

Julia Wrobel, PhD

Assistant Professor, Department of Biostatistics and Bioinformatics, Emory University

Julia’s work focuses on methods for analyzing and visualizing functional data, with applications in wearable devices, neurobiology, and cannabis-impaired driving. She is also a fellow of the Emory University Center for AI Learning and an Associate Editor for Reproducibility at the Journal of the American Statistical Association, where she encourages reproducibility in statistical methods research.

Required Software:

Students will be provided with demonstration videos for installing all the required software and packages ahead of the course.

  • Software
    • RStudio
    • R (version >4.1.0)
    • bash/zsh
    • Make
    • Quarto
    • git
  • R packages
    • knitr
    • rmarkdown
    • here
    • renv

Recommended Reading:
Additional readings will be provided to augment the material seen in the course.

For a general idea of the content, see https://benkeser.github.io/info550/, a previous course taught at Emory that covers similar content.