Certificate in Data Science
Program Overview
Data science is a concept that unifies statistics, data analysis, informatics, and their related methods to understand and analyze actual phenomena with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science.
The rise of data science has driven advances in technology across almost all areas of our life, including health. Modern computational tools give us the ability to manage, process, and analyze data on previously unthinkable scales. Recent advances in statistics and machine learning allow us to glean new insights from these data. These new advances demand an innovative approach to training public health practitioners of the future.
Trainees should be equipped with a skill set that allows them to address challenges raised by modern approaches to data collection and analysis. Trainees must also be equipped with an understanding of the challenges, limitations, and ethical implications of these novel approaches. Students in the Certificate in Data Science at Rollins will be trained to meet the needs of a rapidly advancing health research field.
Pursuing data science training within a top public health school will allow students to see how modern data science can be used to advance the public good, rather than to increase corporate profits. The certificate program is designed to prepare students to succeed in a highly competitive job market.
Certificate Competencies
The certificate in data science has five specific competencies that students who complete the certificate are expected to master.
- Use open-source software to analyze data.
- Apply modern software tools to construct reproducible data science workflows.
- Identify settings where machine learning can be used to inform public health and clinical decision making and apply common machine learning frameworks to data.
- Develop data science products that increase accessibility and interpretability of analytic findings.
- Communicate effectively with public health stakeholders.
Curriculum
Certificate Courses
This certificate program has four required courses (8-9 credit hours) and also requires 3-4 hours of elective credits.
Students must take one class in each of the following categories:
- R programming
- Data science toolkit
- Machine learning
- Current topics
Students must also ensure that their applied practice experience (APE) and integrative learning experience (ILE) can be related to data science, as described below. In extenuating circumstances, students may replace the APE and/or ILE requirement with additional elective courses.
R Programming
For non-BIOS Students Only. This course provides an introduction to using R to organize, analyze, and visualize data. Once you've completed this course, you'll be able to enter, save, retrieve, summarize, display, and analyze data.
For BIOS Students Only. This course covers the basics of R programming with applications to statistical data analysis. Topics include data types, language syntax, graphics packages, debugging, the tidyverse, efficient programming, and package creation.
Department of Biostatistics and Bioinformatics
Data Science Toolkit
Prerequisites: BIOS 544 or BIOS 545, R programming experience needed or permission of the instructor. This course is an elective for master's and PhD students interested in learning some fundamental tools used in modern data science. Together, the tools covered in the course will provide the ability to develop fully reproducible pipelines for data analysis, from data processing and cleaning to analysis to result tables and summaries. By the end of the course, students will have learned the tools necessary to: develop reproducible workflows collaboratively (using version control based on Git/GitHub), execute these workflows on a local computer (using command line operations, R Markdown, and GNU Makefiles), execute the workflows in a containerized environment allowing end-to-end reproducibility (using Docker), and execute the workflows in a cloud environment (using Amazon Web Services EC2 and S3 services). Along the way, we will cover a few other topics in data science, including best coding practices, basic Python, software unit testing, and continuous integration services.
Machine Learning
Prerequisites: Multivariate Calculus (Calculus III), Linear Algebra, and Python programming. This course covers fundamental machine learning theory and techniques. The topics include basic theory, classification methods, model generalization, clustering, and dimension reduction. The material will be conveyed through a series of lectures, homework assignments, and projects.
Department of Biostatistics and Bioinformatics
Prerequisites: BIOS 500, 506, or 508 and (BIOS 544 or BIOS 545 or EPI 534) or permission of instructor. This elective course gives an introduction to machine learning techniques and theory, with a focus on their use in practical applications. The Applied Machine Learning course teaches a wide range of supervised and unsupervised machine learning techniques using R as the programming language.
Current Topics
This course is the culminating experience of the data science certificate program and is to be taken in the spring semester of the second year. The course must be taken by certificate-enrolled students in addition to any degree-required integrative learning experience (ILE) requirements. The course provides a review of current topics of interest in data science, helps prepare students for the data science job market, and involves a culminating data science project that relates to students' degree-required ILE. The first several meetings of this course focus on helping students identify suitable data science products and plan for the skills and tools needed to complete the ILE-related requirements for the data science certificate. Subsequent classes will cover modern topics in data science (e.g., R Shiny, communicating with diverse audiences, software unit testing, data sharing and privacy) and include lectures on preparing to apply for data science-related jobs.
Electives
Prerequisites: BIOS 501 or permission of instructor. This is the overview course for the Bioinformatics, Imaging and Genetics (BIG) concentration in the PhD program of the Department of Biostatistics and Bioinformatics. It aims to introduce students to modern high-dimensional biomedical data, including data in bioinformatics and computational biology, biomedical imaging, and statistical genetics. This course will be co-taught by all BIG core faculty members, with each faculty member giving one or two lectures. The focus of the course will be on the data characteristics, opportunities and challenges for statisticians, as well as current developments and hot areas of the research fields of bioinformatics, biomedical imaging and statistical genetics.
Department of Biostatistics and Bioinformatics
Prerequisites: BIOS 500, 506, or 508. This class is designed to cover the concepts and implementations of up-to-date analytic methodologies and strategies in observational studies, and to equip students with the mindset and essential tools to handle data from observational research for either prediction (statistical learning) or causal inference. Propensity score methods, establishing and validating prediction models, risk stratification, the guidance of Good Research Practice, and related topics will be illustrated with real-life projects and supported by recent literature.
Department of Biostatistics and Bioinformatics
Prerequisites: BIOS 500 & BIOS 501, BIOS 506 or BIOS 508 (concurrent) or permission of instructor. This class is designed to help students master statistical programming in SAS. Students in this class will develop programming style and skills for data manipulation, report generation, simulation and graphing. This class does not directly satisfy any competencies as defined by the Department of Biostatistics and Bioinformatics, the Rollins School of Public Health or the Council on Education for Public Health (CEPH). That being said, SAS is a primary data analysis and data management software system in use worldwide, particularly in public health settings. Students who master the skills offered in this course will have a much easier time completing the work for their thesis and will find themselves more ready for a public health career with a more analytical bent.
Department of Biostatistics and Bioinformatics
Prerequisites: BIOS 501 or equivalent and basic programming in R, or permission of the instructor. This course covers the basics of microarray and second-generation sequencing data analysis using R/Bioconductor and other open-source software. Topics include gene expression microarray, RNA-seq, ChIP-seq, and general DNA sequence analyses. We will introduce technologies, data characteristics, statistical challenges, existing methods, and potential research topics. Students will also learn to use appropriate Bioconductor packages and other open-source software to analyze different types of data and deliver biologically interpretable results.
Department of Biostatistics and Bioinformatics
This course will provide a pragmatic and hands-on introduction to the Python programming language, with a focus on practical applications and projects, rather than theoretical topics. We cover data types, control flow, object-oriented programming, and graphical user interface-driven applications. Students will learn to work with packages, data structures, and tools for data science and cybersecurity. The examples and problems used in this course are drawn from diverse areas such as text processing, simple graphics creation and image manipulation, HTML and web programming, and genomics.
Prerequisites: BIOS 500, BIOS 506, BIOS 508 or permission of instructor. This course provides a comprehensive introduction to Structured Query Language (SQL), the language of relational databases. You'll learn about the basic structure of relational databases, how to read and write simple and complex SQL statements, and advanced data manipulation techniques. By the end of this course, you'll have a solid working knowledge of SQL and feel confident in your ability to write queries to create tables; retrieve data from single or multiple tables; delete, insert, and update data in a database; and gather significant statistics from data stored in a database. Topics covered include entity-relationship modeling, the relational model, and the SQL language, including data retrieval and data manipulation statements.
The course introduces the use of geographic information systems (GIS) in the analysis of public health data. We develop GIS skills through homework, quizzes, and a case study. Specific skills include map layouts, visualization, and basic GIS operations such as buffering, layering, summarizing, geocoding, digitizing and spatial queries.
Prerequisites: INFO DATA 530 or permission of the instructor. The course continues the use of geographic information systems (GIS) in the analysis of public health data and adds more advanced features. We develop GIS skills through homework, quizzes, and a final project, building in particular on the skills learned in INFO 530, such as map layouts, visualization, basic spatial statistics, and basic GIS operations such as buffering, layering, summarizing, geocoding, digitizing, and spatial queries. We add new topics such as raster analysis, open-source GIS (QGIS), geodatabases, story maps, and making maps in R.
Prerequisites: BIOS 544 or BIOS 545. This course will teach students to use data visualizations to analyze public health, medical, and biological sciences data and to communicate information derived from these data to various audiences. Students will learn key concepts and methods in creating data visualizations and put them into practice with hands-on assignments creating data visualizations and critiquing public health visualizations. Because multidisciplinary review and feedback on student designs can improve the quality and effectiveness of student visualizations, students will often work in pairs or groups.
Prerequisites: BIOS 544 or 545 and DATA 521 or equivalent with the instructor's permission. This course introduces exposomics in environmental health, emphasizing integration of exposure data with molecular and biological omics. The exposome shifts focus from single exposures to cumulative environmental effects. Students learn how high-resolution mass spectrometry generates, processes, and quality-controls exposomic data, and how to integrate these with other omics to identify environmental drivers, biological targets, and pathways linking exposures to health outcomes. Cheminformatics modules cover chemical annotation, database querying, and molecular similarity. Using R, large language models, and advanced tools, students analyze real datasets, apply statistical and machine learning methods, and interpret high-dimensional data to build multi-omics models of environment-health relationships.
Gangarosa Department of Environmental Health
This elective course provides students with an overview of systems biology, genetics, epigenomics, and transcriptomics, within the context of environmental health. We will cover policy and translational implications and teach the underlying biological principles driving these analyses, laboratory methods involved, analytic approaches, and epidemiologic considerations. Upon completion of this course, students should be better equipped to read and interpret the scientific literature utilizing these methods and begin to consider how these approaches could be included in their own research.
Gangarosa Department of Environmental Health
Prerequisites: Students should have taken BIOS 500 and EPI 530. It is preferred that students also take BIOS 501 or BIOS 591P. Students should be comfortable using R. While not required, it is preferable that students take BIOS 544 concurrently or prior to taking this course. In the Methods for Environmental Mixtures course, students will learn the importance of evaluating environmental exposures as mixtures, as well as an overview of selected environmental mixture methods and data analysis techniques commonly used in public health research. This course focuses on developing an understanding of when to use a specific method, the pros and cons of different approaches, and hands-on applications of environmental mixture methods in R. The course is an elective that is open to second year MPH students and PhD students. It is required that students bring their laptops to class.
Gangarosa Department of Environmental Health
Prerequisites: EPI 530, BIOS 500, EPI 534, and BIOS 591P concurrent. MSPH and PhD students only.
This course builds on the fundamental epidemiologic concepts introduced in EPI 530: Epidemiologic Methods I. Specifically, causality, bias (including confounding, information bias, and selection bias), and concepts of mediation and interaction will be revisited in greater depth. By the end of the course, students will be able to do the following: formulate research questions to evaluate causality; evaluate the strengths and limitations of epidemiologic studies; assess how the strengths and limitations of a study affect interpretation of study results; utilize epidemiologic methods to address confounding; identify epidemiologic methods to address selection bias and information bias; and calculate measures to assess interaction.
Department of Epidemiology
Prerequisites: BIOS 500 and EPI 552 or instructor permission. Knowledge of R is recommended. Genomic epidemiology is an increasingly important approach to studying disease risks in populations. This course will introduce the basic genetic principles as they apply to the identification of genetic variations associated with disease; illustrate the population and quantitative genetic concepts that are necessary to study the relationship between genetic variation and disease variation in populations; and provide hands-on experience to address the analytical needs for conducting genomic epidemiologic research. Students will gain experience with R and PLINK using high-dimensional genetic data.
Department of Epidemiology
Prerequisites: EPI 530, 545, and 550 and/or instructor permission.
This course covers epidemiologic concepts in further depth than previous methods courses and provides an overview of advanced topics in the analysis of epidemiologic data. The course reviews basic concepts behind cohort studies and introduces students to fundamental survival analysis concepts, including risk and survival, hazards, competing risks, cause-specific and sub-distribution risk, risk difference, and risk ratio estimators. It also covers generalized linear models for conditionally and marginally adjusted risk differences and ratios, as well as methods for correct variance estimation. Finally, it introduces concepts of time-dependent confounding and methods that can be used to analyze complex longitudinal data (IP weighting, marginal standardization). This is a required course for students in the MSPH and PhD Epidemiology program.
Department of Epidemiology
All certificate students should enroll in DATA 555 in the spring semester of their second year. This course will facilitate the integration of the development of an approved data science product into the students’ existing ILE requirements.
All students should make a good faith effort to complete a data science component as a part of their ILE and enroll in DATA 555. However, if extenuating circumstances preclude a student from identifying an appropriate data science component for their ILE, then an additional 4 credit hours of electives may be completed in lieu of DATA 555.
Additional Requirements
To satisfy the certificate APE requirement, either:
- A data science-related APE should be completed
- 3 additional credit hours from the list of electives above should be completed
We will offer a 2-credit Current Topics in Data Science course (DATA 555) that students will complete in the spring semester of the second year. This course must be taken in addition to each degree program’s specific ILE requirements. If a project cannot be identified, then the student must complete an additional 4 credit hours of electives from the list of acceptable elective courses.
Admissions
Applicants can indicate interest in certificate programs on their SOPHAS application to receive more information.
Current students can declare intent to enroll in certificate programs after matriculation.
Learn more about enrolling in certificate programs (login required)
Contact
Contact your ADAP with certificate questions!