Quick Links: Knowledge Base, Podcast, and Social
Knowledge Base — search and filter every article and podcast episode by topic, section, and keyword: kb.onhealthcare.tech
Listen to the Podcast — every article is also available as an audio episode. Free subscribers get the public episodes; paid subscribers get the full archive including subscriber-only episodes. Listen on Apple Podcasts, Spotify, or browse all episodes on the Substack Podcast page.
For paid subscribers — your subscription unlocks the entire research archive (538+ deep-dives), every paid podcast episode, and full search inside the Knowledge Base. To listen to paid episodes in Apple or Spotify, link your Substack subscription via the show settings on those platforms (instructions inside the Substack app under Subscriptions → Podcast).
For free subscribers — free posts and free podcast episodes are always public on Apple/Spotify and Substack. Upgrade any time at onhealthcare.tech/subscribe to access the paid archive and paid episodes.
Follow on Social — X · YouTube · TikTok · Instagram
A Harvard team just published a study in Science showing OpenAI's o1 outperformed ER physicians at diagnosis. The popular take is “AI beat doctors.” That take misses the most important finding in the paper. Thread.
Setup: 76 real ED cases from a Boston academic medical center. o1 and physicians both given triage info first, then progressively more workup data. Each stage: produce a differential, rank the candidates. Physicians blinded to model outputs in the comparison arm.
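To make the staged protocol concrete, here is a minimal Python sketch of how ranked differentials could be scored stage by stage. It assumes a simple top-k hit metric, with exact string matching against a single reference diagnosis. The stage names, the `Case` structure, and the two toy cases are hypothetical. The study's actual grading (done by clinicians, not by string comparison) is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical stage labels; the study's exact stage names are not given in this thread.
STAGES = ("triage", "initial_workup", "full_workup")


@dataclass
class Case:
    final_diagnosis: str  # reference diagnosis used for scoring
    differentials: dict[str, list[str]]  # stage -> ranked differential, most likely first


def hit_at_k(differential: list[str], truth: str, k: int) -> bool:
    # Assumption: exact string match stands in for clinician grading of "correct".
    return truth in differential[:k]


def stage_accuracy(cases: list[Case], stage: str, k: int = 1) -> float:
    # Fraction of cases whose reference diagnosis appears in the top k at this stage.
    hits = sum(hit_at_k(c.differentials[stage], c.final_diagnosis, k) for c in cases)
    return hits / len(cases)


# Two made-up toy cases, not data from the study.
cases = [
    Case("pulmonary embolism", {
        "triage": ["acute coronary syndrome", "pulmonary embolism"],
        "initial_workup": ["pulmonary embolism", "pneumonia"],
        "full_workup": ["pulmonary embolism"],
    }),
    Case("appendicitis", {
        "triage": ["gastroenteritis", "ovarian torsion"],
        "initial_workup": ["appendicitis", "gastroenteritis"],
        "full_workup": ["appendicitis"],
    }),
]

for stage in STAGES:
    print(stage, "top-1:", stage_accuracy(cases, stage, k=1),
          "top-5:", stage_accuracy(cases, stage, k=5))
```

Comparing k=1 with a larger k separates two things: whether the right answer was ranked first, and whether it made the list at all. The second is the "wide net" that matters most when information is sparse.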
The headline number: o1 hit around 67% accuracy at triage. Physicians were at 50-55%. By full workup, both converged above 80%. So yes, the gap is real. But the gap is biggest exactly where information is sparsest, and that detail matters a lot.
Why it matters: at triage, the cognitive task is generating a wide net of plausible diagnoses. Humans systematically cast too narrow a net. The clinical literature calls this premature closure. One widely cited estimate puts diagnostic errors at roughly 12 million US adults per year in outpatient care alone. This study is a controlled demonstration of one reason why.
Important caveat that got buried: this is text in, text out. No imaging interpretation. No real-time lab handling. No conversational diagnosis. The inputs are chart vignettes, closer to chart review than clinical care. Anyone calling physicians obsolete based on this has not read the methods.
Now the finding almost nobody covered. The human-plus-AI condition did not outperform AI alone. Every health AI pitch deck assumes physician plus machine beats either one alone. That is the entire copilot premise. This paper puts real empirical pressure on that assumption.
The mechanism: automation bias plus anchoring. Physicians used the model’s ranked list as a starting point. They accepted incorrect rankings too often. They discounted correct AI suggestions that conflicted with their own prior. Net result: the gains and losses roughly cancelled, leaving the combined team no better than AI alone. Radiology has seen this same pattern for a decade.
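One way to see why those two failure modes produce a wash is a toy accuracy decomposition. This is a hypothetical model, not the paper's analysis: it assumes the physician starts from the AI's top answer, keeps a correct one with probability `p_keep_when_right`, and rescues an incorrect one with probability `p_fix_when_wrong`. Only 0.67 echoes the triage figure above; the other two inputs are made up for illustration.

```python
def team_accuracy(p_ai: float, p_keep_when_right: float, p_fix_when_wrong: float) -> float:
    """Toy model of physician-plus-AI accuracy (hypothetical, not from the paper).

    p_ai: probability the AI's top-ranked diagnosis is correct.
    p_keep_when_right: probability the physician keeps a correct AI answer
        (1 minus the rate of discounting correct suggestions).
    p_fix_when_wrong: probability the physician overrides an incorrect AI
        answer and lands on the correct diagnosis (low under anchoring).
    """
    for p in (p_ai, p_keep_when_right, p_fix_when_wrong):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_ai * p_keep_when_right + (1.0 - p_ai) * p_fix_when_wrong


def team_beats_ai(p_ai: float, p_keep_when_right: float, p_fix_when_wrong: float) -> bool:
    # Equivalent to: (1 - p_ai) * p_fix_when_wrong > p_ai * (1 - p_keep_when_right),
    # i.e. corrections of wrong AI answers must outweigh discarded right ones.
    return team_accuracy(p_ai, p_keep_when_right, p_fix_when_wrong) > p_ai


# Illustrative inputs only: 15% of correct AI answers discarded, 30% of wrong ones fixed.
acc = team_accuracy(0.67, 0.85, 0.30)
print(round(acc, 3), team_beats_ai(0.67, 0.85, 0.30))
```

Under these made-up inputs the team lands almost exactly on the AI's own accuracy: the correct answers physicians talk themselves out of cost about as much as the wrong answers they catch. Raising `p_fix_when_wrong` or pushing `p_keep_when_right` toward 1 is what a copilot design would have to achieve for the combination to pay off.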
Subscribe to www.onhealthcare.tech for free and paid articles, podcasts, and more. For a deeper dive on the topic, see the full article.