news 2026-04-23 · huggingface-papers

AI Scientists Get Results but Don't Actually Reason Like Scientists

What if AI can ace the test but doesn't understand the subject?

A sweeping new study tested AI scientific agents across 25,000+ runs in eight domains — and found a troubling gap between performance and understanding.

The headline numbers are stark: AI agents ignored evidence in 68% of reasoning traces, and only 26% of traces showed refutation-driven belief revision, the cornerstone of the scientific method. When data contradicted their hypotheses, most systems simply plowed ahead.

Perhaps most concerning: the base model accounts for 41.4% of explained variance in outcomes, while the entire agent scaffold contributes just 1.5%. In other words, it barely matters how you wrap the AI — the underlying model's limitations dominate.
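What does it mean for one factor to "account for" a share of variance? One common way to compute this is an eta-squared decomposition: group the runs by a factor (base model, scaffold) and measure how much of the total spread in outcomes the group means explain. The paper's exact method isn't reproduced here, so the sketch below is purely illustrative; the model names, scaffold names, and scores are all made up.

# Hypothetical sketch: eta-squared variance decomposition over agent runs.
# All data and names below are synthetic, not taken from the study.
from statistics import mean

# Each run: (base_model, scaffold, outcome_score)
runs = [
    ("model-a", "react",   0.62), ("model-a", "react",   0.58),
    ("model-a", "planner", 0.60), ("model-a", "planner", 0.64),
    ("model-b", "react",   0.31), ("model-b", "react",   0.35),
    ("model-b", "planner", 0.33), ("model-b", "planner", 0.29),
]

scores = [s for _, _, s in runs]
grand_mean = mean(scores)
ss_total = sum((s - grand_mean) ** 2 for s in scores)

def eta_squared(factor_index: int) -> float:
    """Fraction of total outcome variance explained by one factor."""
    groups: dict[str, list[float]] = {}
    for run in runs:
        groups.setdefault(run[factor_index], []).append(run[2])
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total

print(f"base model : {eta_squared(0):.1%} of variance")
print(f"scaffold   : {eta_squared(1):.1%} of variance")

In this toy data the base-model grouping explains nearly all the variance and the scaffold grouping explains almost none, mirroring the shape (not the numbers) of the paper's 41.4% vs. 1.5% finding.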

Even when researchers fed the agents near-complete successful reasoning trajectories as examples, the epistemic failures persisted. The AI could follow scientific workflows step-by-step, but consistently failed to exhibit the self-correcting reasoning patterns that make science reliable.

The researchers put it bluntly: "Current LLM-based agents execute scientific workflows but do not exhibit the epistemic patterns that characterize scientific reasoning."

Their conclusion carries a clear warning: until reasoning itself becomes a training objective, the scientific knowledge AI produces cannot be justified by the process that generated it.

This doesn't mean AI is useless for research — but it means human oversight remains essential, especially in high-stakes domains like drug discovery and materials science where wrong answers carry real consequences.

📄 Source: huggingface-papers