🧩 Train Giant AI Models 3x Cheaper by Cloning Their Own Experts
What if you could triple your AI model's capacity without tripling the training cost?
A new paper from Amazon researchers introduces "Expert Upcycling": a method that grows Mixture-of-Experts (MoE) models by duplicating their existing experts and letting them specialize further, instead of training larger models from scratch.
The problem is clear: training frontier AI models costs millions in compute. Every time you want a bigger model, you start over, throwing away everything the smaller model already learned.
Expert Upcycling flips this entirely. Take a trained MoE model, clone its experts (8 become 16 or 32), and continue training. The clones inherit their parent's knowledge, then gradually develop unique specializations.
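Here is a minimal PyTorch sketch of that mechanic, assuming a toy MoE layer with a linear top-k router. All names are illustrative, not the paper's actual code, and the small router noise used to break symmetry between clones is our assumption:

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy MoE layer: a linear router sends each token to its top-k experts."""
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

@torch.no_grad()
def upcycle_from_parents(layer: MoELayer, parents: list[int]) -> MoELayer:
    """Grow a trained layer: keep every old expert, append one clone per parent index."""
    d_model = layer.router.in_features
    grown = MoELayer(d_model, len(layer.experts) + len(parents), layer.top_k)
    sources = list(range(len(layer.experts))) + parents
    for i, p in enumerate(sources):
        grown.experts[i].load_state_dict(layer.experts[p].state_dict())
        # copy the parent's router row so each clone starts with identical routing;
        # the tiny noise (our assumption) breaks symmetry so clones can specialize
        grown.router.weight[i] = layer.router.weight[p] + 1e-3 * torch.randn(d_model)
        grown.router.bias[i] = layer.router.bias[p]
    return grown

def upcycle(layer: MoELayer, factor: int = 2) -> MoELayer:
    """Uniform upcycling: every expert gets the same number of clones."""
    n = len(layer.experts)
    return upcycle_from_parents(layer, [i % n for i in range(n * (factor - 1))])

# 8 experts -> 16; top_k is unchanged, so per-token compute stays the same
bigger = upcycle(MoELayer(d_model=512, n_experts=8), factor=2)
```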
🎯 The results are striking:
- Models matched from-scratch quality using only **32% of the GPU hours**, roughly a 3x savings
- Inference speed stays identical: the same number of experts is activated per token
- A smart "utility-based" selection clones the best experts more, boosting results further (see the sketch after this list)
- Validated across multiple model scales, architectures, and budgets
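The "utility-based" variant might look like the following, reusing `MoELayer` and `upcycle_from_parents` from the sketch above. Here utility is approximated by routing frequency on a sample batch, which is our stand-in for whatever metric the paper actually uses:

```python
@torch.no_grad()
def utility_upcycle(layer: MoELayer, n_new: int, batch: torch.Tensor) -> MoELayer:
    """Clone high-utility experts more often instead of cloning uniformly.
    Utility proxy: how often the router puts each expert in the top-k."""
    _, idx = layer.router(batch).topk(layer.top_k, dim=-1)
    counts = torch.bincount(idx.flatten(), minlength=len(layer.experts))
    rank = counts.argsort(descending=True)  # busiest experts first
    # cycle through experts by utility rank until all n_new clone slots are used
    parents = [rank[i % len(rank)].item() for i in range(n_new)]
    return upcycle_from_parents(layer, parents)

# grow 8 -> 20: the four busiest experts get two clones each, the rest one
tokens = torch.randn(1024, 512)
grown = utility_upcycle(MoELayer(d_model=512, n_experts=8), n_new=12, batch=tokens)
```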
Think of it like a restaurant: instead of hiring and training a 24-chef kitchen from scratch, your 8 best chefs each train two protégés who start at their level and develop their own styles. The kitchen triples in capacity, but each dish still takes the same time.
This could be a turning point in making large-scale AI accessible beyond just the biggest labs.
📄 Source
huggingface-papers