Certificates

Certificate in Data Science

Program Overview

Data science is a concept that unifies statistics, data analysis, informatics, and their related methods to understand and analyze actual phenomena with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science.

The rise of data science has driven advances in technology across almost all areas of our life, including health. Modern computational tools give us the ability to manage, process, and analyze data on previously unthinkable scales. Recent advances in statistics and machine learning allow us to glean new insights from these data. These new advances demand an innovative approach to training public health practitioners of the future.

Trainees should be equipped with a skill set that allows them to address challenges raised by modern approaches to data collection and analysis. Trainees must also be equipped with an understanding of the challenges, limitations, and ethical implications of these novel approaches. Students in the Certificate in Data Science at Rollins will be trained to meet the needs of a rapidly advancing health research field.

Pursuing data science training within a top public health school will allow students to see how modern data science can be used towards advancing the public good, rather than increasing corporate profits. The construction of the data science certificate program aims to set students up to succeed in a highly competitive job market.

Certificate Competencies

The certificate in data science has five specific competencies that students who complete the certificate are expected to master.

Use open-source software to analyze data.
Apply modern software tools to construct reproducible data science workflows.
Identify settings where machine learning can be used to inform public health and clinical decision making and apply common machine learning frameworks to data.
Develop data science products that increase accessibility and interpretability of analytic findings.
Communicate effectively with public health stakeholders.

Curriculum

Certificate Courses

This certificate program has four required courses (8-9 credit hours) and also requires 3-4 hours of elective credits.

Students must take one class in each of the following categories:

R programming
Data science toolkit
Machine learning
Current topics

Students must also ensure that their applied practice experience (APE) and integrative learning experience (ILE) can be related to data science, as described below. In extenuating circumstances, students may replace the APE and/or ILE requirement with additional elective courses.

R Programming

For non-BIOS Students Only. The goal of the course is to provide an introduction to using R for organizing, analyzing, and visualizing data. Once you've completed this course, you'll be able to enter, save, retrieve, summarize, display, and analyze data.

Department of Biostatistics and Bioinformatics

Online, In Person

Fall, Spring

2 credit hours

For BIOS Students Only. This course covers the basics of R programming with applications to statistical data analysis. Topics include data types, language syntax, graphics packages, debugging, the tidyverse, efficient programming, and package creation.

Department of Biostatistics and Bioinformatics

In Person

Fall, Spring

2 credit hours

Data Science Toolkit

Prerequisites: BIOS 544 or BIOS 545, R programming experience needed or permission of the instructor. This course is an elective for master's and PhD students interested in learning some fundamental tools used in modern data science. Together, the tools covered in the course will provide the ability to develop fully reproducible pipelines for data analysis, from data processing and cleaning to analysis to result tables and summaries. By the end of the course, students will have learned the tools necessary to: develop reproducible workflows collaboratively (using version control based on Git/GitHub), execute these workflows on a local computer (using command line operations, RMarkdown, and GNU Makefiles), execute the workflows in a containerized environment allowing end-to-end reproducibility (using Docker), and execute the workflow in a cloud environment (using Amazon Web Services EC2 and S3 services). Along the way, we will cover a few other tools for data science including best coding practices, basic Python, software unit testing, and continuous integration services.

In Person

Fall, Spring

2 credit hours

Machine Learning

Prerequisites: Multivariate Calculus (Calculus III), Linear Algebra, and Python programming. This course covers fundamental machine learning theory and techniques. The topics include basic theory, classification methods, model generalization, clustering, and dimension reduction. The material will be conveyed through a series of lectures, homework assignments, and projects.

Department of Biostatistics and Bioinformatics

In Person

Fall, Spring

3 credit hours

Prerequisites: BIOS 500, 506, or 508 and (BIOS 544 or BIOS 545 or EPI 534) or permission of instructor. This elective course gives an introduction to machine learning techniques and theory, with a focus on their use in practical applications. The Applied Machine Learning course teaches a wide-ranging set of supervised and unsupervised machine learning techniques using R as the programming language.

Online, In Person

Fall, Spring

2 credit hours

Current Topics

This course is the culminating experience of the data science certificate program and is to be taken in the spring semester of the second year. The course must be taken by certificate-enrolled students in addition to any degree-required integrative learning experience (ILE) requirements. The course provides a review of current topics of interest in data science, helps prepare students for the data science job market, and involves a culminating data science project that relates to students' degree-required ILE. The first several meetings of this course focus on helping students identify suitable data science products and plan for the skills and tools that are needed to complete the ILE-related requirements for the data science certificate. Subsequent classes will cover modern topics in data science (e.g., R Shiny, communicating with diverse audiences, software unit testing, data sharing and privacy) and include lectures on preparing to apply for data science-related jobs.

In Person

Spring

2 credit hours

Electives

Prerequisites: BIOS 501 or permission of instructor. This is the overview course for the Bioinformatics, Imaging and Genetics (BIG) concentration in the PhD program of the Department of Biostatistics and Bioinformatics. It aims to introduce students to modern high-dimensional biomedical data, including data in bioinformatics and computational biology, biomedical imaging, and statistical genetics. This course will be co-taught by all BIG core faculty members, with each faculty member giving one or two lectures. The focus of the course will be on the data characteristics, opportunities and challenges for statisticians, as well as current developments and hot areas of the research fields of bioinformatics, biomedical imaging and statistical genetics.

Department of Biostatistics and Bioinformatics

In Person

Fall

1 credit hour

Prerequisites: BIOS 500, 506, or 508. This class is designed to cover the concepts and implementations of up-to-date analytic methodologies and strategies in observational studies, and to equip students with the mindset and essential tools to handle data from observational research, either for prediction (statistical learning) or causal inference. Propensity score methods, establishing and validating prediction models, risk stratification, the guidance of Good Research Practice, and related topics will be illustrated with real-life projects and backed up by the recent literature.

Department of Biostatistics and Bioinformatics

In Person

Spring

2 credit hours

Prerequisites: BIOS 500 & BIOS 501, BIOS 506 or BIOS 508 (concurrent) or permission of instructor. This class is designed to help students master statistical programming in SAS. Students in this class will develop programming style and skills for data manipulation, report generation, simulation and graphing. This class does not directly satisfy any competencies as defined by the Department of Biostatistics and Bioinformatics, the Rollins School of Public Health or the Council on Education for Public Health (CEPH). That being said, SAS is a primary data analysis and data management software system in use worldwide, particularly in public health settings. Students who master the skills offered in this course will have a much easier time completing the work for their thesis and will find themselves more ready for a public health career with a more analytical bent.

Department of Biostatistics and Bioinformatics

In Person

Fall

2 credit hours

Prerequisites: BIOS 501 or equivalent and basic programming in R or permission of the instructor. This elective course introduces statistical and machine learning methods for the analysis of single-cell genomic data, with an emphasis on conceptual understanding and practical applications. The course begins with classical statistical methods for bulk transcriptomics and then covers recent statistical and machine learning approaches for single-cell genomics (including transcriptomics and epigenomics) as well as spatial data. Weekly lab sessions are dedicated to practicing the methods introduced in class. Students will learn both the methodological foundations and hands-on workflows for data analysis. By the end of the course, students will be able to perform a complete analysis workflow for single-cell genomics data.

Department of Biostatistics and Bioinformatics

In Person

Fall

2 credit hours

This course will provide a pragmatic and hands-on introduction to the Python programming language, with a focus on practical applications and projects, rather than theoretical topics. We cover data types, control flow, object-oriented programming, and graphical user interface-driven applications. Students will learn to work with packages, data structures, and tools for data science and cybersecurity. The examples and problems used in this course are drawn from diverse areas such as text processing, simple graphics creation and image manipulation, HTML and web programming, and genomics.

Department of Biostatistics and Bioinformatics

Online, In Person

Fall, Spring

3 credit hours

Department of Biostatistics and Bioinformatics

Online, In Person

Fall

3 credit hours

Prerequisites: BIOS 500, BIOS 506, BIOS 508 or permission of instructor. This course provides a comprehensive introduction to Structured Query Language (SQL), the language of relational databases. You'll learn about the basic structure of relational databases, how to read and write simple and complex SQL statements, and advanced data manipulation techniques. By the end of this course, you'll have a solid working knowledge of SQL and feel confident in your ability to write queries to create tables; retrieve data from single or multiple tables; delete, insert, and update data in a database; and gather significant statistics from data stored in a database. Topics covered include entity-relationship modeling, the relational model, SQL data retrieval statements, and data manipulation.

Online, In Person

Fall, Spring

3 credit hours

The course introduces the use of geographic information systems (GIS) in the analysis of public health data. We develop GIS skills through homework, quizzes, and a case study. Specific skills include map layouts, visualization, and basic GIS operations such as buffering, layering, summarizing, geocoding, digitizing and spatial queries.

Online

Fall, Spring

2 credit hours

Prerequisites: INFO DATA 530 or permission of the instructor. The course continues the use of geographic information systems (GIS) in the analysis of public health data and adds more advanced features. We develop GIS skills through homework, quizzes, and a final project, and particularly build upon the skills learned in INFO 530 such as map layouts, visualization, basic spatial statistics, and basic GIS operations such as buffering, layering, summarizing, geocoding, digitizing, and spatial queries. We add new topics such as raster analysis, open-source GIS (QGIS), geodatabases, story maps, and making maps in R.

Online, In Person

Fall, Spring

2 credit hours

Prerequisites: BIOS 544 or BIOS 545. This course will teach students to use data visualizations to analyze public health, medical, and biological sciences data and communicate information derived from these data to various audiences. Students will learn key concepts and methods in creating data visualizations and put them into practice with hands-on assignments creating data visualizations and critiquing public health visualizations. Multidisciplinary review and feedback on student designs can help to improve the quality and effectiveness of student visualizations; therefore, students will often work in pairs or groups.

Online, In Person

Fall, Spring

2 credit hours

Prerequisites: BIOS 544 or 545 and DATA 521 or equivalent with the instructor's permission. This course introduces exposomics in environmental health, emphasizing integration of exposure data with molecular and biological omics. The exposome shifts focus from single exposures to cumulative environmental effects. Students learn how high-resolution mass spectrometry generates, processes, and quality-controls exposomic data, and how to integrate these with other omics to identify environmental drivers, biological targets, and pathways linking exposures to health outcomes. Cheminformatics modules cover chemical annotation, database querying, and molecular similarity. Using R, large language models, and advanced tools, students analyze real datasets, apply statistical and machine learning methods, and interpret high-dimensional data to build multi-omics models of environment-health relationships.

Gangarosa Department of Environmental Health

Blended/Hybrid, In Person

Fall, Spring

2 credit hours

Prerequisites: BIOS 544 or BIOS 545 or instructor permission; prior coursework in genetics, cellular and molecular biology, and epidemiology is highly encouraged. This elective course provides students with an overview of systems biology, genetics, epigenomics, and transcriptomics within the context of environmental health. We will cover policy and translational implications and teach the underlying biological principles driving these analyses, the laboratory methods involved, analytic approaches, and epidemiologic considerations. Upon completion of this course, students should be better equipped to read and interpret the scientific literature utilizing these methods and begin to consider how these approaches could be included in their own research.

Gangarosa Department of Environmental Health

In Person

Fall

2 credit hours

Prerequisites: BIOS 500 and EPI 530. It is preferred that students also take BIOS 501 or BIOS 591P. Students should be comfortable using R. While not required, it is preferable that students take BIOS 544 concurrently or prior to taking this course. In the Methods for Environmental Mixtures course, students will learn the importance of evaluating environmental exposures as mixtures, as well as an overview of selected environmental mixture methods and data analysis techniques commonly used in public health research. This course focuses on developing an understanding of when to use a specific method, the pros and cons of different approaches, and hands-on applications of environmental mixture methods in R. The course is an elective that is open to second year MPH students and PhD students. It is required that students bring their laptops to class.

Gangarosa Department of Environmental Health

In Person

Fall

2 credit hours

Various topics by GDEH faculty. Check OPUS/Atlas for current topics and descriptions.

Gangarosa Department of Environmental Health

Online, In Person

Fall, Spring

1 credit hour

Provides an introduction to the SAS programming environment and instructs students in the techniques needed to create, organize, and edit data into a final dataset that is ready for epidemiologic analysis.

Department of Epidemiology

Online, In Person

Spring

1 credit hour

Prerequisites: EPI 530, BIOS 500, EPI 534, and BIOS 591P concurrent. MSPH and PhD students only.

This course builds on the fundamental epidemiologic concepts introduced in EPI 530: Epidemiologic Methods I. Specifically, causality, bias (including confounding, information bias, and selection bias), and concepts of mediation and interaction will be revisited in greater depth. By the end of the course, students will be able to do the following: formulate research questions to evaluate causality; evaluate the strengths and limitations of epidemiologic studies; assess how the strengths and limitations of a study affect interpretation of study results; utilize epidemiologic methods to address confounding; identify epidemiologic methods to address selection bias and information bias; and calculate measures to assess interaction.

Department of Epidemiology

In Person

Spring

4 credit hours

Prerequisites: BIOS 500 and EPI 552 or instructor permission. Knowledge of R is recommended. Genomic epidemiology is an increasingly important approach to studying disease risks in populations. This course will introduce the basic genetic principles as they apply to the identification of genetic variations associated with disease; illustrate the population and quantitative genetic concepts that are necessary to study the relationship between genetic variation and disease variation in populations; and provide hands-on experience to address the analytical needs for conducting genomic epidemiologic research. Students will gain experience with R and PLINK using high-dimensional genetic data.

Department of Epidemiology

In Person

Fall

2 credit hours

Prerequisites: EPI 530, 545, and 550 and/or instructor permission.

This course covers epidemiologic concepts in further depth than previous methods courses and provides an overview of advanced topics in the analysis of epidemiologic data. The course reviews basic concepts behind cohort studies and introduces students to fundamental survival analysis concepts, including risk and survival, hazards, competing risks, cause-specific and sub-distribution risk, risk difference, and risk ratio estimators. It also covers generalized linear models for conditionally and marginally adjusted risk differences and ratios, methods for correct variance estimation, concepts of time-dependent confounding, and methods that can be used to analyze complex longitudinal data (IP weighting, marginal standardization). This is a required course for students in the MSPH and PhD Epidemiology programs.

Department of Epidemiology

In Person

Spring

4 credit hours

All certificate students should enroll in DATA 555 in the spring semester of their second year. This course will facilitate the integration of the development of an approved data science product into the students’ existing ILE requirements.

All students should make a good faith effort to complete a data science component as a part of their ILE and enroll in DATA 555. However, if extenuating circumstances preclude a student from identifying an appropriate data science component for their ILE, then an additional 4 credit hours of electives may be completed in lieu of DATA 555.

Additional Requirements

To satisfy the certificate APE requirement, either:

A data science-related APE should be completed
3 additional credit hours from the list of electives above should be completed

We will offer a 2-credit Current Topics in Data Science course (DATA 555) that students will complete in the spring semester of the second year. This course must be taken in addition to each degree program’s specific ILE requirements. If a project cannot be identified, then the student must complete an additional 4 credit hours of electives from the list of acceptable elective courses.

Admissions

Applicants can indicate interest in certificate programs on their SOPHAS application to receive more information.

Current students can declare intent to enroll in certificate programs after matriculation.

Learn more about enrolling in certificate programs (login required)

Contact

Contact your ADAP with certificate questions!

Find your ADAP

Get in Touch:

Certificate Director

Contact Name

David Benkesser, PhD

Contact Email

benkesser@emory.edu


