
Certificate in Data Science
Program Overview
Data science is a concept that unifies statistics, data analysis, informatics, and their related methods to understand and analyze actual phenomena with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science.
The rise of data science has driven advances in technology across almost all areas of our lives, including health. Modern computational tools give us the ability to manage, process, and analyze data on previously unthinkable scales. Recent advances in statistics and machine learning allow us to glean new insights from these data. These advances demand an innovative approach to training the public health practitioners of the future.
Trainees should be equipped with a skill set that allows them to address challenges raised by modern approaches to data collection and analysis. Trainees must also be equipped with an understanding of the challenges, limitations, and ethical implications of these novel approaches. Students in the Certificate in Data Science at Rollins will be trained to meet the needs of a rapidly advancing health research field.
Pursuing data science training within a top public health school will allow students to see how modern data science can be used to advance the public good, rather than to increase corporate profits. The data science certificate program is designed to prepare students to succeed in a highly competitive job market.
Certificate Competencies
The certificate in data science has five specific competencies that students who complete the certificate are expected to master.
- Use open-source software to analyze data.
- Apply modern software tools to construct reproducible data science workflows.
- Identify settings where machine learning can be used to inform public health and clinical decision making and apply common machine learning frameworks to data.
- Develop data science products that increase accessibility and interpretability of analytic findings.
- Communicate effectively with public health stakeholders.
Curriculum
Certificate Courses
This certificate program has four required courses (8-9 credit hours) and also requires 3-4 elective credit hours.
Students must take one class in each of the following categories:
- R programming
- Data science toolkit
- Machine learning
- Current topics
Students must also ensure that their applied practice experience (APE) and integrative learning experience (ILE) are related to data science, as described below. In extenuating circumstances, students may replace the APE and/or ILE requirement with additional elective courses.
R Programming
For non-BIOS Students Only. The goal of the course is to provide an introduction to using R for organizing, analyzing, and visualizing data. Once you've completed this course, you'll be able to enter, save, retrieve, summarize, display, and analyze data.
Department of Biostatistics and Bioinformatics
For BIOS Students Only. This course covers the basics of R programming with applications to statistical data analysis. Topics include data types, language syntax, graphics packages, debugging, the tidyverse, efficient programming, and package creation.
Department of Biostatistics and Bioinformatics
Data Science Toolkit
Prerequisites: BIOS 544 or BIOS 545, R programming experience needed or permission of the instructor. This course is an elective for master's and PhD students interested in learning some fundamental tools used in modern data science. Together, the tools covered in the course will provide the ability to develop fully reproducible pipelines for data analysis, from data processing and cleaning to analysis to result tables and summaries. By the end of the course, students will have learned the tools necessary to: develop reproducible workflows collaboratively (using version control based on Git/GitHub), execute these workflows on a local computer (using command line operations, R Markdown, and GNU Make), execute the workflows in a containerized environment allowing end-to-end reproducibility (using Docker), and execute the workflows in a cloud environment (using Amazon Web Services EC2 and S3 services). Along the way, we will cover a few other data science tools and practices, including best coding practices, basic Python, software unit testing, and continuous integration services.
Machine Learning
Prerequisites: Multivariate Calculus (Calculus III), Linear Algebra, and Python programming. This course covers fundamental machine learning theory and techniques. Topics include basic theory, classification methods, model generalization, clustering, and dimension reduction. The material will be conveyed through a series of lectures, homework assignments, and projects.
Department of Biostatistics and Bioinformatics
Prerequisites: BIOS 500 and (BIOS 544 or BIOS 545 or EPI 534) or permission of instructor. This elective course gives an introduction to machine learning techniques and theory, with a focus on their use in practical applications. The Applied Machine Learning course teaches a wide range of supervised and unsupervised machine learning techniques, using R as the programming language.
Current Topics
This course is the culminating experience of the data science certificate program and is to be taken in the spring semester of the second year. The course must be taken by certificate-enrolled students in addition to any degree-required integrative learning experience (ILE) requirements. The course provides a review of current topics of interest in data science, helps prepare students for the data science job market, and involves a culminating data science project that relates to students' degree-required ILE. The first several meetings of this course focus on helping students identify suitable data science products and plan for the skills and tools needed to complete the ILE-related requirements for the data science certificate. Subsequent classes will cover modern topics in data science (e.g., R Shiny, communicating with diverse audiences, software unit testing, data sharing and privacy) and include lectures on preparing to apply for data science-related jobs.
All certificate students should enroll in DATA 555 in the spring semester of their second year. This course will facilitate the integration of the development of an approved data science product into the students’ existing ILE requirements.
All students should make a good faith effort to complete a data science component as a part of their ILE and enroll in DATA 555. However, if extenuating circumstances preclude a student from identifying an appropriate data science component for their ILE, then an additional 4 credit hours of electives may be completed in lieu of DATA 555.
Additional Requirements
To satisfy the certificate APE requirement, students must do one of the following:
- Complete a data science-related APE
- Complete 3 additional credit hours from the list of electives
We will offer a 2-credit Current Topics in Data Science course (DATA 555) that students will complete in the spring semester of the second year. This course must be taken in addition to each degree program’s specific ILE requirements. If a project cannot be identified, then the student must complete an additional 4 credit hours of electives from the list of acceptable elective courses.
Admissions
Declaring Intent
Each fall semester, students will declare their interest in the certificate by submitting a formal declaration of intent. The declaration will ask students to answer specific questions to gauge their interest in and desire to complete the Data Science Certificate.
Please note: the declaration of intent form requires that you sign in using your emory.edu email. Students are responsible for enrolling in required and elective courses each semester before the add/drop/swap period ends. Additionally, the department is unable to increase enrollment capacity.