🧩 Train Giant AI Models 3x Cheaper by Cloning Their Own Experts
What if you could triple your AI model's capacity without tripling the training cost?
A new paper from Amazon researchers introduces "Expert Upcycling": a method that grows Mixture-of-Experts (MoE) models by duplicating their existing experts and letting them specialize further, instead of training larger models from scratch.
The problem is clear: training frontier AI models costs millions in compute. Every time you want a bigger model, you start over, throwing away everything the smaller model already learned.
Expert Upcycling flips this entirely. Take a trained MoE model, clone its experts (8 become 16 or 32), and continue training. The clones inherit their parent's knowledge, then gradually develop unique specializations.
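Here is a minimal PyTorch sketch of that mechanic, assuming a toy MoE layer with a linear top-k router. All names are illustrative, not the paper's actual code, and the small router noise used to break symmetry between clones is our assumption:

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy MoE layer: a linear router sends each token to its top-k experts."""
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

@torch.no_grad()
def upcycle_from_parents(layer: MoELayer, parents: list[int]) -> MoELayer:
    """Grow a trained layer: keep every old expert, append one clone per parent index."""
    d_model = layer.router.in_features
    grown = MoELayer(d_model, len(layer.experts) + len(parents), layer.top_k)
    sources = list(range(len(layer.experts))) + parents
    for i, p in enumerate(sources):
        grown.experts[i].load_state_dict(layer.experts[p].state_dict())
        # copy the parent's router row so each clone starts with identical routing;
        # the tiny noise (our assumption) breaks symmetry so clones can specialize
        grown.router.weight[i] = layer.router.weight[p] + 1e-3 * torch.randn(d_model)
        grown.router.bias[i] = layer.router.bias[p]
    return grown

def upcycle(layer: MoELayer, factor: int = 2) -> MoELayer:
    """Uniform upcycling: every expert gets the same number of clones."""
    n = len(layer.experts)
    return upcycle_from_parents(layer, [i % n for i in range(n * (factor - 1))])

# 8 experts -> 16; top_k is unchanged, so per-token compute stays the same
bigger = upcycle(MoELayer(d_model=512, n_experts=8), factor=2)
```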
🎯 The results are striking:
- Models matched from-scratch quality using only **32% of the GPU hours**, roughly a 3x savings
- Inference speed stays identical: the same number of experts is activated per token
- A smart "utility-based" selection clones the best experts more, boosting results further (see the sketch after this list)
- Validated across multiple model scales, architectures, and budgets
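The "utility-based" variant might look like the following, reusing `MoELayer` and `upcycle_from_parents` from the sketch above. Here utility is approximated by routing frequency on a sample batch, which is our stand-in for whatever metric the paper actually uses:

```python
@torch.no_grad()
def utility_upcycle(layer: MoELayer, n_new: int, batch: torch.Tensor) -> MoELayer:
    """Clone high-utility experts more often instead of cloning uniformly.
    Utility proxy: how often the router puts each expert in the top-k."""
    _, idx = layer.router(batch).topk(layer.top_k, dim=-1)
    counts = torch.bincount(idx.flatten(), minlength=len(layer.experts))
    rank = counts.argsort(descending=True)  # busiest experts first
    # cycle through experts by utility rank until all n_new clone slots are used
    parents = [rank[i % len(rank)].item() for i in range(n_new)]
    return upcycle_from_parents(layer, parents)

# grow 8 -> 20: the four busiest experts get two clones each, the rest one
tokens = torch.randn(1024, 512)
grown = utility_upcycle(MoELayer(d_model=512, n_experts=8), n_new=12, batch=tokens)
```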
Think of it like a restaurant: instead of hiring and training a 24-chef kitchen from scratch, your 8 best chefs each train two protégés who start at their level and develop their own styles. The kitchen triples in capacity, but each dish still takes the same time.
This could be a turning point in making large-scale AI accessible beyond just the biggest labs.
📄 Source
huggingface-papers