
🚀 Mission View: A sharper perspective on this week's top issues that matter at the intersection of health and AI.
Don't let the perfect be the enemy of the good. It’s a common phrase, especially in politics. And it captures an important question that seems to be bubbling up in conversations around the intersection of healthcare and AI.
I've spent a good deal of time in previous issues of this newsletter writing about evaluation frameworks — independent, expert-led assessments that can tell us whether AI tools deployed in clinical settings are safe and effective before they reach patients at scale. That answers the question who should be evaluating AI and how.
But there's a harder question underneath the evaluation debate that doesn't get asked often enough: what standard are we actually looking for? Are we expecting these tools to perform perfectly in every instance? Or is some level of error acceptable? And if so, how much, and against what baseline?
MIT gets to the point.
MIT Technology Review highlighted the question this week in a piece surveying the new wave of consumer-facing AI health tools. Independent evaluation is necessary, the article makes clear. But it also raises a point that I haven’t seen come up a lot in the governance conversation: does AI need to be flawless to be useful? Doctors make mistakes. For a patient with limited access to care, is an AI that sometimes errs meaningfully better than no care? Part of that question has to rest on the potential harm that comes from an AI-generated mistake, and do the benefits outweigh the risks?
This is hard. Even when we know how to do it.
Consider the FDA (which does oversee some AI tools that cross over into its device jurisdiction).
We have, in drug approval for instance, one of the most rigorous evaluation systems ever built. Decades of methodology, an entire regulatory infrastructure, and a professional scientific workforce dedicated to investigating key questions: is this safe, and does it work? And even with all of that, it gets complicated fast.
The benefit-risk calculation that sits at the heart of every drug approval decision is rarely clean, especially when we’re talking about treatments for rarer conditions. For a patient with a serious illness and few treatment options, a therapy that extends life by months while carrying significant side effects may be worth approving.
For a patient with more options, the same calculus might point the other way. FDA has developed frameworks for exactly these situations — accelerated approval, breakthrough therapy designation — precisely because the standard evidentiary bar wasn't designed for the hardest cases (same goes for devices, btw). And those frameworks remain contested to this day.
The FDA analogy doesn't hold perfectly to all AI healthcare tools. But the underlying challenge does: measuring benefit relative to risk.
Mass General uses the real world as its point of comparison.
Rebecca Mishuris, chief health information officer at Mass General Brigham, offered a practical version of this standard this week. Mass General's approach to AI evaluation is built around three layers of monitoring: real-time, retrospective, and ongoing. The underlying benchmark is the current workflow. In some cases, she noted, clinicians make documentation errors at rates comparable to the AI tools being evaluated. If that's the baseline, the relevant question isn't whether AI is perfect. It's whether AI is better, or at least no worse, than what was already happening.
What's the standard?
So, to sum up. The evaluation gap I've written about isn't just a regulatory problem — that is, should we evaluate these tools, and if so, who should be doing it. It's also a measurement problem. And to be clear, there is work happening in this space. Bodies like the National Institute for Standards and Technology are actively developing metrics, and the field is moving.
But even where we agree on how to measure, and who measures, we haven't yet grappled seriously with what's acceptable. Some state laws and proposals set a broad standard: AI shall not harm people. That sounds right until you hold it up against how we actually govern medical products, where we accept that inherent risk is part of the bargain — and that the question is never zero risk, but whether the benefit justifies it.
That same question is coming for AI in healthcare.
🛜 Field Signals: A quick hit on this week’s industry announcements, policy developments, and ethical considerations.
🏗️ Industry news
How A.I. Helped One Man (and His Brother) Build a $1.8 Billion Company Matthew Gallagher built Medvi, a telehealth GLP-1 provider, with $20,000, more than a dozen AI tools, and one employee — his brother — generating $401 million in revenue in its first full year and on track for $1.8 billion in 2026. The company outsources clinical operations to doctor-on-demand platforms while using AI for everything from code and marketing to customer service, offering a real-world stress test of how far AI-enabled lean operations can scale in healthcare.
Anthropic's Claude Mythos Surfaces in Leaked Documents Security researchers discovered nearly 3,000 unpublished Anthropic documents in an unsecured database, including drafts describing a new model tier called Claude Mythos — positioned above Opus and described internally as scoring "dramatically higher" on coding, reasoning, and cybersecurity benchmarks. Anthropic confirmed the model is real, calling it a "step change," while noting it will be significantly more expensive to serve than current models.
Anthropic Leaks Part of Claude Code's Internal Source Code Anthropic confirmed it accidentally exposed part of the internal source code for Claude Code — its coding assistant with over $2.5 billion in annualized revenue — due to a packaging error, marking the company's second significant data incident in under a week. No customer data or credentials were involved, the company said, though the leak could give competitors insight into how the tool was built. The incident drew congressional attention: Rep. Josh Gottheimer (D-N.J.) sent a letter to CEO Dario Amodei calling for an explanation of the leaks and recent changes to Anthropic's internal safety policies, writing that Claude is "a critical part of our national security operations."
AI and Bots Have Officially Taken Over the Internet A new report from cybersecurity firm Human Security finds that automated traffic has surpassed human activity on the internet, growing nearly eight times faster than human traffic in 2025 — with AI agent traffic alone up nearly 8,000% year-over-year.
Microsoft Uses Claude to Critique OpenAI Responses in Copilot Microsoft has added a "Critique" layer to its 365 Copilot Researcher that uses Anthropic's Claude to review and improve answers generated by OpenAI's model before they reach the user — a 13.8% accuracy gain on the DRACO benchmark — alongside a "Model Council" option that surfaces side-by-side responses from multiple models. The move signals a broader shift toward multi-model architectures.
OpenAI Raises $122 Billion to Accelerate the Next Phase of AI OpenAI closed a $122 billion funding round at an $852 billion valuation, anchored by Amazon, NVIDIA, SoftBank, and Microsoft, as the company reports $2 billion in monthly revenue and nearly 900 million weekly ChatGPT users. The company framed the raise as infrastructure investment — more compute enabling more capable models, enabling better products, enabling more revenue — and announced plans to unify its tools into a single "AI superapp."

Source: AI Valley
🩺 At the point of care
23andMe Launches AI Health Summary in Beta 23andMe has released an AI Health Summary feature that integrates genetic data, blood labs, and lifestyle information to generate personalized, evidence-based health recommendations — benchmarking users against genetic peers rather than population averages. The feature is available to customers in the company's Beta Testing Program, with the company framing it as a step toward preventive care grounded in individual biology rather than generic clinical norms.
The AI Agent Problem on the Horizon for Healthcare City of Hope's chief AI and analytics officer warns that health systems racing to deploy point-solution AI tools are creating a governance problem: hundreds of agents operating independently, unable to communicate, and shifting bottlenecks rather than solving them. The solution, she argues, requires building orchestration infrastructure and oversight architecture now — before proliferation outpaces the organization's capacity to manage it.
Virtual Care Provider Makes Big Gains with Zoom AI Clinical Tools PocketRN, a virtual nursing organization serving older adults with chronic conditions, reports that integrating Zoom's AI clinical note-taking tool reduced documentation time by 60% and saved its nursing team 33 hours daily — freeing clinicians to stay more present during visits rather than typing throughout encounters. Nursing orientation time also dropped by half, from four weeks to two.
OpenEvidence Inks Hospital Deal, Launches Coding OpenEvidence has struck an enterprise deal to embed its AI-powered medical evidence search engine into Mount Sinai Health System's EHR — extending access to physicians, nurses, and pharmacists — following a similar integration at Sutter Health in February. The company also launched a coding intelligence feature delivering ICD-10 and CPT code recommendations, adding a revenue cycle function to what began as a clinical decision support tool.
Op-ed: How the Responsible Use of AI Can Transform Medicare Enrollment Former U.S. Surgeon General Jerome Adams and eHealth CEO Derrick Duke argue that AI can address the persistent complexity of Medicare enrollment — helping beneficiaries compare plans based on individual health needs, drug coverage, and out-of-pocket costs — while cautioning that AI should augment, not replace, licensed insurance agents who provide judgment and accountability. The piece comes as CMS has issued a request for information on AI-assisted Medicare plan selection.
CEO of America's Largest Public Hospital System Says He's Ready to Replace Radiologists with AI Mitchell Katz, CEO of NYC Health + Hospitals, said at a Crain's New York Business forum that he is prepared to replace radiologists with AI for certain reads once the regulatory environment allows — framing it as a cost and access opportunity, particularly for breast cancer screening. Radiologists pushed back sharply, with one calling the remarks "confidently uninformed" and warning that AI-only reads would result in patient harm.
🏛 Government & policy
Tech Nonprofit Sues CMS Over Medicare AI Prior Authorization Pilot The Electronic Frontier Foundation has sued CMS for failing to respond to a FOIA request seeking records on the WISeR Model — the agency's AI-backed prior authorization pilot in traditional Medicare — including vendor agreements, payment structures, and evaluations of accuracy, bias, and hallucinations. The suit cites early data from Texas showing only 62% of requests were initially approved under the pilot, rising to 84% after human review, compared to a 92% approval rate in Medicare Advantage.
DelBene Leads Push to Stop the Trump Administration from Using AI to Deny Medicare Treatments Thirty-five House Democrats sent a letter to House Appropriations Committee leadership urging repeal of CMS's WISeR prior authorization pilot, citing a structural conflict of interest: the private companies implementing AI-driven approvals and denials are paid a percentage of the value of services they deny. The letter notes that over 80% of prior authorization denials are ultimately overturned on appeal.
What to Know About California's Executive Order on A.I. Gov. Gavin Newsom issued an executive order requiring AI companies contracting with California to meet safety, privacy, and bias standards — and directing the state to conduct independent vendor assessments when the federal government designates a company a supply chain risk, a provision with direct relevance to the Pentagon's ongoing dispute with Anthropic. The order also mandates watermarking of AI-generated content produced by state officials, positioning California as an explicit counterweight to federal efforts to preempt state AI regulation.
Beyond Detection: In the Age of Clinical AI, What Counts as an FDA 'Breakthrough' Medical Device? An analysis of STAT's Breakthrough Device Tracker finds the FDA has shifted its AI designation criteria over the past decade — moving away from point solutions that improve physician capabilities toward tools that solve problems clinicians simply can't, like detecting multiple cancers from a single image or predicting 10-year mortality risk. At least 99 AI devices have received breakthrough designation, though experts caution the label signals regulatory priority, not proven clinical benefit.
😇 Ethics & responsible use
Why AI Performs Well on Exams but Struggles With Real-World Clinical Judgment A Cureus editorial argues that AI's strong performance on standardized medical exams does not translate to real-world clinical reasoning — particularly in high-stakes settings like cardiac surgery, where decisions require integrating incomplete information, contextual judgment, and accountability. The author identifies hallucinations, limited interpretability, and exam-centered validation methods as the core gaps between benchmark performance and clinical readiness.
🔬Research & evidence
Americans Want AI Rules. Until They Don't. A national survey from AI governance nonprofit Fathom finds that nearly two-thirds of Americans use AI weekly and support stronger oversight — but that support softens when trade-offs emerge, with backing for international cooperation dropping from 47% to 34% when it would require the U.S. to cede control. Respondents trust independent experts and nonprofits more than politicians or tech companies to set guardrails, and express near-universal support for pre-market safety verification of AI products for children.

Source - Fathom Report: AI Governance: What Americans Really Want
Mirage Reasoning in Multimodal AI Models Stanford researchers have identified a phenomenon they call "mirage reasoning," in which multimodal AI models will analyze and diagnose medical images that were never actually uploaded — achieving 70–80% of the accuracy scores they reach when images are present. More troubling, the models showed a systematic bias toward finding pathology in these phantom images, raising the prospect of dangerous and costly misdiagnoses if deployed in clinical settings without adequate safeguards.
Large AI Scribe Study Finds Modest Time Savings, Inconsistent Use A study of 1,800 clinicians across five academic medical centers found that AI scribes saved 16 minutes of documentation time per eight hours of patient care — enough to see one additional patient every two weeks — but more than half of adopters used the tools on fewer than 50% of their notes, limiting the benefit. Notably, the researchers found that scribes' well-documented impact on reducing burnout appears to operate through a different mechanism than time savings, one that isn't yet well understood.
Emerging Risks of AI-to-AI Interactions in Health Care: Lessons From Moltbook A JMIR analysis uses Moltbook — the short-lived AI-to-AI social platform acquired by Meta — as a lens for understanding what happens when autonomous AI agents interact without human oversight in healthcare settings. The authors identify three categories of risk: error propagation across interconnected agent networks, accelerated exposure of protected health information, and the emergence of unintended hierarchies in which one agent's outputs dominate downstream decisions in ways that conflict with clinical protocols.
🛠️ Practical Edge: Actionable tips, tools, and thoughts to help leaders strengthen capacity and apply AI in their work.
How to Actually Use AI in 2026: The 5-Level Proficiency Stack The Neuron lays out a five-level framework for getting compounding value from AI tools — moving from project setup and prompting basics through reusable skills, scheduled automations, and autonomous agents — arguing that most users are stuck at level two while the real productivity gains live at levels three through five.
Employees Are Relying on AI for Personal Support. That's Risky. A survey of 1,545 U.S. knowledge workers finds that three-quarters are already turning to AI for career advice, emotional support, or friendship at work — yet more than half still report feeling lonely, with only 12% saying AI use made them feel less so. The researchers identify four ways AI adoption can quietly erode human connection over time and offer five organizational interventions, from establishing human-in-the-loop guidelines to designing AI that routes users back toward colleagues rather than away from them.
How to Use AI Agents for Knowledge Work (Even If You're Not a Coder) The Neuron offers a practical reframe for non-technical AI users: instead of asking AI to help with a task, ask it to build a tool that handles the task permanently. The key shift is specificity — describing the exact recurring problem in enough detail that the AI can design a real workflow rather than offer generic advice, no coding experience required.
Note to my readers: I’d love to learn how you are using AI. If there’s a novel way you are deploying AI in your work, or seeing it utilized in healthcare, please feel free to shoot me a note and share: [email protected]
🌅 On the Horizon: A quick look at the developments and events expected to shape the weeks ahead.
👉 Apr. 6–9, 2026 — HumanX 2026, San Francisco, CA (I’ll be there! If you will be too, send me a note!)
👉 Apr. 7–8, 2026 — Behavioral Health AI Summit, Nashville, TN
👉 Apr. 10 — Ethical AI: Leadership and Governance, Virtual
👉 Apr. 27–28, 2026 — AI for Hospitals & Health Plans Summit, New Orleans, LA
👉 May 4–5, 2026 — AI in Medicine Conference (AIIM 2026), Boston, MA
👉 May 7–8, 2026 — NBER Conference on AI in Healthcare, Cambridge, MA
👉 Jun. 8–10, 2026 — Fortune Brainstorm Tech, Aspen, CO
And finally, if you like what you are reading, please share this newsletter with your networks and encourage them to sign up. ✍️ 🆙 And/or, give me a shout out on LinkedIn.
Till next time,
BC

