AI Scientists Get Results but Don't Actually Reason Like Scientists
What if AI can ace the test but doesn't understand the subject?
A sweeping new study tested AI scientific agents across 25,000+ runs in eight domains — and found a troubling gap between performance and understanding.
The headline numbers are stark: AI agents ignored evidence in 68% of reasoning traces. Only 26% showed refutation-driven belief revision, the cornerstone of the scientific method. When data contradicted their hypotheses, most AI systems simply plowed ahead.
Perhaps most concerning: the base model accounts for 41.4% of explained variance in outcomes, while the entire agent scaffold contributes just 1.5%. In other words, it barely matters how you wrap the AI — the underlying model's limitations dominate.
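If "explained variance" is an unfamiliar measure, the sketch below shows one common way to compute a per-factor share: the variance of group means divided by the total variance (an eta-squared-style measure) over run-level results. The data, column names, and method here are illustrative assumptions, not the paper's actual analysis.

```python
# Minimal sketch of a per-factor explained-variance share (eta-squared style).
# All data, column names, and the method are illustrative assumptions,
# not the paper's actual analysis pipeline.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical run-level results: the outcome depends mostly on the base model,
# barely on the scaffold, plus noise -- mirroring the article's claim.
model_effect = {"model_a": 0.0, "model_b": 0.5, "model_c": 1.0}
base_model = rng.choice(list(model_effect), size=n)
scaffold = rng.choice(["scaffold_x", "scaffold_y"], size=n)
outcome = np.vectorize(model_effect.get)(base_model) + rng.normal(scale=0.5, size=n)

runs = pd.DataFrame({"base_model": base_model, "scaffold": scaffold, "outcome": outcome})

def explained_variance(df: pd.DataFrame, factor: str, target: str = "outcome") -> float:
    """Between-group variance of `target` across `factor` levels, as a share of total variance."""
    group_means = df.groupby(factor)[target].transform("mean")
    return group_means.var(ddof=0) / df[target].var(ddof=0)

for factor in ["base_model", "scaffold"]:
    print(f"{factor}: {explained_variance(runs, factor):.1%} of outcome variance")
```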
Even when researchers fed the agents near-complete successful reasoning trajectories as examples, the epistemic failures persisted. The AI could follow scientific workflows step by step, but consistently failed to exhibit the self-correcting reasoning patterns that make science reliable.
The researchers put it bluntly: "Current LLM-based agents execute scientific workflows but do not exhibit the epistemic patterns that characterize scientific reasoning."
Their conclusion carries a clear warning: until reasoning itself becomes a training objective, the scientific knowledge AI produces cannot be justified by the process that generated it.
This doesn't mean AI is useless for research — but it means human oversight remains essential, especially in high-stakes domains like drug discovery and materials science where wrong answers carry real consequences.
📄 Source: Hugging Face Papers