I’ve seen countless articles, blog posts, and tweets arguing fervently for or against the idea of pursuing an advanced degree in order to enter the data science field.
I think this debate is far too context-dependent to have a winning side, but pursuing an advanced degree after college was the right path for me. Typically, the only degrees I’ve seen included in these conversations are master’s degrees and PhDs in data science, computer science, and statistics. While all of those degrees have obvious value and relevance to data science, I would like to introduce degrees in public health to the conversation.
The field of public health is all about protecting and improving the health of the public, and it has a wide array of sub-fields. Two of these, epidemiology and biostatistics, are highly quantitative and have much in common with data science. Epidemiology is the study of the distribution and determinants of disease and other health-related states and events, while biostatistics is, unsurprisingly, largely focused on statistical methods relevant to health and medicine. I completed a two-year Master of Public Health (MPH) in Epidemiology and gained invaluable skills and conceptual knowledge over the course of the degree.
So what makes this degree so well-suited to a career in data science? Most importantly, much of the available coursework applies directly to data science work. There are courses in data science, machine learning, SQL, geospatial analysis, regression, longitudinal analysis, and an array of advanced statistical methods and study design approaches*. While the available coursework likely won’t be sufficient if you’re interested in the most computer science-heavy branches of data science (e.g., data engineering), there’s a lot to learn for someone aiming for more analytic positions.
*Note: Schools vary significantly in how much they prioritize R versus SAS (the latter is used extensively in government and non-profit settings), which is important to consider when comparing programs. Some schools have largely transitioned to R, while others remain firmly committed to SAS. In many programs, like mine, you may learn both but rely heavily on one or the other.
It’s worth noting that “data science” has only been widely discussed as a distinct field since around 2001. Epidemiologists and biostatisticians, however, have been using many of the techniques that now fall under the data science umbrella for decades. This longevity is important to consider when shopping for graduate programs. While data science degree programs have been popping up left and right over the past few years, most schools of public health have been around far longer. This means they’ve had time to curate their faculties, refine their curricula over many iterations, and build strong alumni networks that can be an essential resource later in your career. You won’t be in one of the first few cohorts while professors work out the kinks in their courses; instead, you will likely be part of a well-oiled machine where professors have worked together for years and developed courses that complement each other well.
Perhaps the greatest benefit of epidemiology coursework is the rich context it provides for thinking about data. The ultimate goal of epidemiology is to move beyond correlation to establish causation, and there is a strong focus on study design, potential sources of bias, and the proper use of complex statistical methods to achieve this extraordinarily difficult goal. The stakes are also high: epidemiological work is used to make clinical, policy, and funding decisions with very real, very human consequences, so questions about data quality and statistical validity are taken seriously. After two years in this program, I absolutely look at data differently. My gut instinct is to think critically about what population we’ve truly captured in our data (as opposed to the population we intended to capture), as well as which questions we can validly answer given how the data were collected.

It’s fairly easy to look up an image classification tutorial and train a model to detect malignancies in images of skin. Epidemiologists, however, are trained to question the data behind such a model: was it trained on images spanning the full range of skin tones (such models usually aren’t), and what might that gap mean for its performance in populations that are already marginalized? These questions have important consequences outside of healthcare too (think of the facial recognition software used by police departments), and training in epidemiology makes it easier to recognize and confront these damaging pitfalls. Epidemiological thinking is an asset in far more benign contexts as well, such as asking whether your sales data accurately represent all of your customers, or whether you can truly attribute a change in sales to a particular ad campaign.
If you are interested in working in data science in the healthcare sector, a degree in public health is particularly relevant. It is incredibly valuable to learn statistical methods and algorithms using real healthcare data in the context of real-world problems, and there are many elective courses that can give you a better clinical understanding of the diseases you’ll be working with. These courses, such as “Cancer Epidemiology” or “Psychiatric Epidemiology”, also cover the study designs and analytic techniques most frequently used to study these diseases, material you probably wouldn’t encounter in a standard statistics or data science degree. Many data scientists who move into healthcare later in their careers lack this industry-specific knowledge; having it can set you apart when applying for jobs and make it easier to do the job well.
Finally, most reputable public health programs ensure that you graduate with highly marketable experience. Again, because these programs have been around for decades, they’ve had time to develop relationships with a variety of companies and organizations and can help you get your foot in the door. My school required a “practicum” (essentially an internship) between the first and second years of the program, and provided assistance in finding and securing these valuable opportunities. This requirement and support make it far easier to gain data science experience at a healthcare company or related organization, which can also lead to a job offer down the road. Additionally, most programs require a thesis to graduate. While typically far less daunting than a PhD dissertation, an epidemiology or biostatistics master’s thesis is essentially a publishable, year-long data science project that provides excellent experience to feature on your résumé and discuss in job interviews.
While my MPH experience was overwhelmingly positive and the right choice for me, I firmly believe that the decision to pursue any advanced degree is highly personal and depends on your unique circumstances. Still, if you are looking for a degree program, especially if your long-term goal is to work in health, quantitative degrees in public health are more than worth your consideration.
Emily is a data analyst working in psychiatric epidemiology in New York City. She is a suicide-prevention professional who is enthusiastic about taking a data-driven approach to the mental health field. Emily holds a Master of Public Health from Columbia University.