Mining Gold from Public Health Data: The Unrealized Opportunity in Healthcare’s Open Datasets
Disclaimer: The views and opinions expressed in this essay are my own and do not reflect the views of my employer, Datavant, or any other organization with which I am affiliated.
Table of Contents
- The Economics of Healthcare Data and Why Most Companies Get It Wrong
- Public Imaging Datasets and the Community Hospital Opportunity
- Public Health Surveillance Data and the Government Sales Opportunity
- Provider Network Analytics and Insurance Marketplace Data
- Environmental Health Data and the Emerging Multimodal Opportunity
- Clinical Notes and Social Determinants Extraction
- Critical Care Research Databases and Academic Validation Opportunities
- Commercial EHR Data Platforms and the Real-World Evidence Gold Rush
- The Incumbent Commercial Data Vendors and Their Stranglehold on Pharma
- Synthesis: Building Defensible Businesses on Healthcare Data
Abstract
This essay examines the emerging landscape of public and private healthcare datasets, exploring viable business models for health tech entrepreneurs and angel investors. We analyze ten major data resources spanning both freely accessible and commercial datasets: Google Cloud Healthcare API, CDC Open Data, HHS/HealthData.gov portals, SatHealth multimodal environmental health data, SDOH-NLI clinical notes corpus, MIMIC-IV and eICU critical care databases, and commercial offerings from Truveta, IQVIA, Komodo Health, and Symphony Health. For each dataset, we detail technical implementation approaches, go-to-market strategies, and specific business opportunities ranging from clinical decision support to real-world evidence generation. Key themes include the convergence of multimodal data sources, the critical importance of linking public datasets with proprietary clinical data, and the strategic advantages of vertical-specific solutions over horizontal platforms. We argue that sustainable businesses require not just data access but differentiated domain expertise, proprietary data layering, and strategic go-to-market execution.
The Economics of Healthcare Data and Why Most Companies Get It Wrong

