LLMs exceed physicians on complex text-based differential diagnosis

aurareturn · 2026-02-13T13:54:07

This was using o3. GPT 5.2/5.3 should be much improved.

Just as in software engineering, it may be best to let the AI do the work while a human guides it and checks the result.

techblueberry · 2026-02-13T13:57:30

I wonder if we’ll have to develop strategies for battling confirmation bias. Human review only works if the review is independent.