6 BIG DATA CHALLENGES IN PUBLIC HEALTH ANALYSIS
Challenge 1: Spatial Data is No Longer Optional
Until recently, geospatial data was often low in quality, hard to obtain, and expensive, and it took a specialized GIS data scientist to work with it. As a result, understanding the underlying patterns of infectious disease and other public health issues involved a great deal of guesswork. Even issues such as alcohol abuse and obesity have new angles that can be examined using GIS data.
Both issues are among the CDC's top 10 public health concerns, so finding new ways to study and understand them is a national priority. Proximity to bars and liquor stores can affect alcohol abuse, and the quantity, quality, and public accessibility of green spaces can help lower obesity. Both kinds of question are well suited to GIS data and analysis.
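As a rough illustration of the proximity question, the sketch below counts how many liquor outlets fall within a fixed radius of a residence, using the haversine great-circle distance. The coordinates, the 1 km radius, and the function names (haversine_km, outlets_within) are assumptions made up for this example, not part of any dataset or GIS library.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def outlets_within(home, outlets, radius_km):
    """Count outlets whose distance from `home` is at most `radius_km`."""
    return sum(1 for o in outlets if haversine_km(*home, *o) <= radius_km)


# Hypothetical coordinates, invented for illustration only.
home = (42.3601, -71.0589)
outlets = [(42.3611, -71.0570), (42.3655, -71.0540), (42.4000, -71.1200)]

nearby = outlets_within(home, outlets, radius_km=1.0)
print(f"{nearby} outlets within 1 km")
```

A real study would more likely use street-network distance and a dedicated GIS toolkit; straight-line distance is only a first approximation.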
Challenge 2: Spatial Data is Still Limited
Soon, it will be easy for public health officials to predict and track disease transmission more accurately at the patient scale, but for now there is a gap in the available detail. Many public health measures are reported only at the household, zip code, or even county scale. This leaves analysts and data scientists guessing when they try to understand underlying behavioral motivations. For example, if city health officials know only that the city as a whole has a high incidence of lung cancer, they might miss the critical finding that those cases are clustered around an old factory. Finer spatial resolution is essential, but it is not always available or accessible to today's researchers.
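The lung cancer example can be sketched in a few lines. The snippet below compares what a single city-wide count reveals with what the same cases show once they are binned into a 1 km grid. The case locations, the grid size, and the helper name cell_of are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical case locations as (x_km, y_km) inside a 4 km x 4 km city.
# Invented for illustration; the old factory is imagined near (3.5, 3.5).
cases = [
    (0.5, 0.7), (1.2, 3.1), (2.6, 0.4),                           # scattered
    (3.2, 3.4), (3.6, 3.1), (3.8, 3.9), (3.1, 3.7), (3.5, 3.5),   # cluster
]


def cell_of(point, cell_km=1.0):
    """Map a point to the (column, row) index of the grid cell containing it."""
    x, y = point
    return (int(x // cell_km), int(y // cell_km))


city_total = len(cases)                        # all a city-wide report shows
by_cell = Counter(cell_of(p) for p in cases)   # what a 1 km grid shows
hotspot, hotspot_count = by_cell.most_common(1)[0]

print(f"City-wide: {city_total} cases")
print(f"Busiest grid cell {hotspot}: {hotspot_count} cases")
```

The city-wide total says nothing about where the cases are, while the gridded counts put most of them in one cell. Counting alone does not establish a statistically meaningful cluster; that would need population denominators and a proper spatial test.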
Challenge 3: Access to Data is not Equal Around the Globe
In the US and much of Europe, we take for granted the quantity and quality of data that is freely or readily available. In many countries, however, this isn't the case. Further, in some countries the local government may try to hide or alter the data that is reported. This creates additional challenges when working in other parts of the globe.
Challenge 4: Multi-Disciplinary Skill Set is Needed
With all the difficulties mentioned so far, it's clear that a wide range of skills is needed. Typically, a team working on a public health challenge may include (1) a medical doctor or a social scientist with experience working in health, (2) a social economist, (3) a statistician or data scientist, and (4) a software engineer or analyst with skills in big data and geospatial analysis.
This isn't to say that a team of one couldn't accomplish a complex public health analysis, only that the skills needed are varied and problem-dependent.
Challenge 5: Data Privacy
Personally Identifiable Information (PII) is any information (data point) that directly identifies someone, such as a name or social security number. PII can also be an unusual trait recorded in the data, such as a rare disease or unusual height. It can even be a combination of factors, such as age, sex, occupation, and zip code, which in some cases is enough to identify someone. Data privacy concerns are compounded when working with health data (HIPAA) or if you are a researcher who intends to publish your data openly.
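One common way to check whether a combination of factors is identifying is to count how many records share each combination, an idea usually called k-anonymity. The sketch below is a minimal version of that check. The records, the field names, and the helpers smallest_group and unique_rows are hypothetical and written only for this example.

```python
from collections import Counter

# Fields treated as quasi-identifiers; the names are assumptions.
QUASI_IDENTIFIERS = ("age", "sex", "occupation", "zip_code")

# Hypothetical records, invented for illustration only.
records = [
    {"age": 34, "sex": "F", "occupation": "nurse", "zip_code": "02139"},
    {"age": 34, "sex": "F", "occupation": "nurse", "zip_code": "02139"},
    {"age": 61, "sex": "M", "occupation": "glassblower", "zip_code": "02139"},
]


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group(rows, keys=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group of indistinguishable rows."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys=QUASI_IDENTIFIERS):
    """Return rows whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


k = smallest_group(records)
at_risk = unique_rows(records)
print(f"k = {k}; {len(at_risk)} record(s) are unique on the quasi-identifiers")
```

A result of k = 1 means at least one person is unique on these fields alone, even with names removed. Passing this check does not by itself make a dataset safe to publish or HIPAA-compliant.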
Challenge 6: Big Data
Along with the availability of new and diverse data types comes the problem of how to handle all the data. In addition to the skill sets mentioned in Challenge 4, big data requires new approaches. These include data linking, data warehousing, and, as the data grows, scalable software for data analysis and access.
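To make "data linking" concrete, here is a minimal sketch that joins two small tables on a shared county name after normalizing it, then derives a rate that neither table holds alone. The tables, their values, and the helpers normalize and link are assumptions invented for this example. Real record linkage often has to deal with misspellings and missing keys, which this does not attempt.

```python
def normalize(name):
    """Lowercase a name and collapse whitespace so trivial differences match."""
    return " ".join(name.lower().split())


# Hypothetical tables, invented for illustration only.
clinic_visits = [
    {"county": "Suffolk ", "visits": 120},
    {"county": "middlesex", "visits": 95},
]
census = [
    {"county": "Suffolk", "population": 800000},
    {"county": "Middlesex", "population": 1600000},
    {"county": "Norfolk", "population": 700000},
]


def link(left, right, key):
    """Join rows of `left` to rows of `right` that share a normalized key."""
    index = {normalize(row[key]): row for row in right}
    linked = []
    for row in left:
        match = index.get(normalize(row[key]))
        if match is not None:
            # Merge both rows and store the normalized key value.
            linked.append({**match, **row, key: normalize(row[key])})
    return linked


linked = link(clinic_visits, census, "county")

# Visits per 1,000 residents, available only once the tables are linked.
rates = {r["county"]: 1000 * r["visits"] / r["population"] for r in linked}
print(rates)
```

At larger scale the same join would typically run in a database or data warehouse instead of in memory.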
View More: The challenges are discussed in the academic lecture at Harvard University: