🚀 Mission View: A sharper perspective on this week's top issues that matter at the intersection of health and AI.

AI is moving at a dizzying pace, and healthcare is no exception. The technology is pushing beyond scribes and documentation into clinical decision-making: cardiology risk assessment, neurological diagnosis, sepsis detection, and surgical navigation. OpenAI and Anthropic are moving aggressively into healthcare, including direct-to-consumer offerings that promise to put medical expertise in everyone's pocket.

But two stories this week stuck out to me and suggest we may want to pump the brakes.

When "Passing the Test" Doesn't Mean "Helping Patients"

A randomized trial published in Nature Medicine this week reveals a gap between AI's performance on paper and its usefulness to actual humans. Researchers at Oxford gave 1,298 people medical scenarios and asked them to identify conditions and decide on treatment, with or without help from leading LLMs.

The scenarios, developed by physicians, asked participants to identify health conditions and choose an appropriate course of action, such as seeing a doctor or going to the hospital, based on the information provided.

The key finding: participants using LLMs made no better decisions than those who relied on traditional methods, such as online searches (e.g., Google) or their own judgment.

The study also revealed a two-way communication breakdown. Participants often didn’t know what information the LLMs needed to offer accurate advice, and the responses they received frequently mixed good and poor recommendations, making it difficult to identify the best course of action. The images below, from the New York Times’ reporting on the study, help illustrate how the way a user prompts and interacts with an AI shapes the output.

The researchers recommended "that developers, as well as policymakers and regulators, consider human user testing as a foundation for better evaluating interactive capabilities before any future deployments."

Better testing makes sense as part of the solution, but it alone won't solve the problem, IMO. If only some patients can effectively query these systems while others struggle to articulate symptoms or interpret responses, we risk creating (or doubling down on) a two-tiered healthcare system, where AI widens existing disparities in who can successfully navigate care. Improving health literacy (a long-standing, underfunded goal) and AI literacy also have to be part of the equation.

When "FDA-Approved" Doesn't Mean "Safe in Practice"

Meanwhile, a Reuters investigation published this week documents what happens when AI medical devices move from controlled trials into operating rooms.

The case study: Johnson & Johnson's TruDi Navigation System, which added AI in 2021 to help surgeons navigate inside patients' heads during sinus procedures. Before AI was added, the FDA had received reports of seven device malfunctions over roughly three years. After AI was integrated, the agency received reports of more than 100 malfunctions and adverse events, with at least 10 people injured between late 2021 and November 2025.

One patient suffered a stroke after the AI system allegedly "misled and misdirected" her surgeon, who inadvertently damaged her carotid artery. Part of her skull had to be removed. She's still in therapy. Lawsuits allege similar injuries to other patients.

The Reuters investigation further found:

  • 1,357 AI-enabled medical devices are now FDA-authorized—double the number from 2022

  • A Johns Hopkins/Georgetown/Yale study found AI medical devices had twice the recall rate of traditional devices, with 43% recalled within the first year

And just as AI medical devices are proliferating, the FDA's capacity to evaluate them may be shrinking. The Trump administration's DOGE cuts eliminated 15 of the 40 scientists in the FDA's AI expertise unit, and the workload for remaining reviewers has nearly doubled. Former FDA scientists told Reuters that "some senior regulators have no idea how these technologies work."

Innovation ≠ Implementation. Both stories point to the same fundamental concern: AI systems may perform well in controlled environments yet fail when real humans try to use them under real-world conditions. That's a much harder, and more expensive, problem to solve. It's also a question a well-regulated healthcare system has always required us to answer: before a new drug reaches patients, it goes through years of trials proving it works for real people under real conditions. The same goes for the most complex and risk-prone medical devices.

And even when AI systems are proven safe and effective, deployment without addressing health literacy creates winners and losers (as noted above). If some patients can skillfully navigate AI health tools while others can't, we risk amplifying existing inequities in who can successfully access care.

The AI healthcare revolution may be inevitable. But this week's evidence suggests we may be moving faster than our understanding of how to deploy these systems safely.

🛜 Field Signals: A quick hit on this week’s industry announcements, policy developments, and ethical considerations.

🏗️ Industry news

OpenAI Retires GPT-4o, GPT-4.1, and o4-mini from ChatGPT
OpenAI announced it is sunsetting GPT-4o, GPT-4.1, and o4-mini inside ChatGPT, despite vocal user pushback asking to preserve 4o. The company is consolidating around newer model families, reinforcing how quickly frontier models now move from flagship to legacy.

OpenAI Launches Frontier Enterprise Agent Platform
OpenAI introduced Frontier, a new enterprise layer designed to manage AI agents across teams and tools. The platform treats agents as AI coworkers that can share context, coordinate tasks, and operate within company security rules. It signals a shift from standalone copilots to managed AI ecosystems.

Something Big Is Happening in AI Development
Matt Shumer, the Co-founder and CEO of OthersideAI, penned an essay (that has gotten a lot of attention) arguing that we are at a real inflection point: AI is shifting from a tool that assists developers to systems that can increasingly execute broader, less technical tasks. You can debate the pace and scope of that shift, or whether the tone of the essay was a bit over the top (I vote yes). But the underlying point is harder to dismiss: pay attention. AI’s capability curve is steepening again, and the cost of ignoring it is rising.

The New Yorker Asks: What Is Claude? Anthropic Doesn’t Know, Either
Anthropic is doing something unusually honest for a frontier lab: admitting it can’t fully explain what its own model is “thinking,” then treating that as a serious research agenda rather than a PR problem. Gideon Lewis-Kraus walks through how Anthropic is studying Claude with behavioral “stress tests,” plus a few darkly funny examples (like the vending-machine experiment) that make clear why “we don’t know” is not a comforting answer. Net: the models are getting more capable faster than our ability to confidently predict their behavior.

AI’s “loud quitting” week: safety people head for the exits
The team at TechBrew reports on a cluster of high-profile departures across OpenAI and Anthropic, which are feeding a familiar worry: the companies moving the fastest may be shedding the very people most focused on brakes, guardrails, and long-term risk. One notable Anthropic safety researcher posted a public resignation letter framing AI as part of a broader set of “interconnected crises,” while an OpenAI researcher warned that advertising incentives could collide with user trust as chatbots become an “archive of human candor.”

🩺 At the point of care

AI Expands in Clinical Care
Health systems are embedding AI more deeply into neurology, cardiology, and sepsis detection. What began as documentation support is now moving into core diagnostic and workflow functions. The shift from pilot projects to clinical integration is accelerating.

🏛 Government & policy

Federal AI Use Surges to Nearly 3,000 Projects
A Washington Post analysis found 2,987 active AI use cases across 29 federal agencies in 2025, up from 1,684 the year before. NASA leads with 420 projects, while HHS follows closely with 398.

Source: The Washington Post, “Trump set off a surge of AI in the federal government. See what happened.” February 9, 2026

AI Transparency Questions Around Federal Health Website
A new public-facing health site has raised questions about how AI tools are being deployed in government communications. The episode underscores ongoing transparency and procurement challenges as federal AI use expands.

😇 Ethics & responsible use

AI in Healthcare: More of the Same or Structural Change?
Andy Slavitt, the Co-Founder and General Partner at Town Hall Ventures (and former CMS administrator), and Toyin Ajayi, M.D., the Co-Founder and CEO at Cityblock Health, offer their take on how AI should be deployed across populations, including the most vulnerable.

🔬Research & evidence

RCT: LLMs Improve Complex Cardiology Assessments Under Supervision
A randomized trial found that general cardiologists using an LLM assistant made fewer clinically significant errors and improved management quality. Importantly, performance improved under physician oversight, not autonomous use.

AI Doesn’t Reduce Work, It Intensifies It
An eight-month field study published in Harvard Business Review found that generative AI accelerated tasks but expanded scope and workload. Employees felt productive, but busier and more cognitively strained. The authors recommend intentional norms and pacing strategies around AI use.

🛠️ Practical Edge: Actionable tips, tools, and thoughts to help leaders strengthen capacity and apply AI in their work.

Claude Desktop Comes to Windows
Anthropic has brought full Claude desktop functionality to Windows, including multi-step task execution and local file access. For Windows-heavy enterprise environments, this meaningfully expands operational use.

Claude Opus 4.6 Now Available to Nonprofits
Nonprofits on Team and Enterprise plans now have access to Anthropic’s most capable model at no extra cost. For resource-constrained organizations, this lowers the barrier to using frontier AI tools.

Copilot Can Now Send Phone Reminders
Microsoft Copilot can now push reminders directly to mobile devices. It’s a small feature, but one that nudges AI from suggestion engine toward task execution.

PDF to Video AI
This tool converts static PDFs into narrated explainer videos with AI-generated visuals. Useful for training, education, or repurposing internal decks for broader audiences.

Note to my readers: I’d love to learn how you are using AI. If there’s a novel way you are deploying AI in your work, or seeing it utilized in healthcare, please feel free to shoot me a note and share: [email protected] 

🌅 On the Horizon: A quick look at the developments and events expected to shape the weeks ahead.

👉 Mar. 12–18, 2026 — SXSW 2026, Austin, TX

👉 Mar. 30–31, 2026 — IAPP Global Privacy Summit, Washington DC

👉 Apr. 6–9, 2026 — HumanX 2026, San Francisco, CA

And finally, if you like what you are reading, please share this newsletter with your networks and encourage them to sign up. ✍️ 🆙 And/or, give me a shout-out on LinkedIn.

Till next time,

BC
