🤖 ClawGUI: The First End-to-End Framework for AI That Uses Your Phone for You
What if AI could pick up your phone, open an app, order food, and book a flight, all by itself?
The problem is that building such "GUI agents" has been a painfully fragmented process:
- Training environments crash and can't be reused
- Every team measures success differently
- Trained models can't actually run on real devices
Researchers have now built ClawGUI, the first unified framework to cover the entire lifecycle of GUI agents in one place.
🎯 What it does:
- **Train** AI to use screens on both emulators and real phones, with step-by-step reward signals
- **Evaluate** consistently across 6 benchmarks and 11+ models, achieving 95.8% reproduction accuracy
- **Deploy** to Android, iOS, and HarmonyOS through 12+ chat platforms with personalized memory
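To make "step-by-step reward signals" concrete, here is a minimal Python sketch contrasting a sparse outcome reward (credit only when the whole task succeeds) with a dense per-step reward. This is not ClawGUI's actual reward design: the `Step` type, the `made_progress` judgment, and the +0.1 / -0.1 / +1.0 values are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    action: str          # hypothetical action label, e.g. "tap_search_box"
    made_progress: bool  # hypothetical per-step verdict (e.g. from a judge model)


def outcome_rewards(steps: list[Step], task_succeeded: bool) -> list[float]:
    """Sparse signal: only the final step is rewarded, and only on success."""
    rewards = [0.0] * len(steps)
    if steps and task_succeeded:
        rewards[-1] = 1.0
    return rewards


def step_rewards(steps: list[Step], task_succeeded: bool, bonus: float = 1.0) -> list[float]:
    """Dense signal: every step earns or loses credit, plus a bonus on success."""
    # Illustrative values only: +0.1 for a helpful step, -0.1 for an unhelpful one.
    rewards = [0.1 if s.made_progress else -0.1 for s in steps]
    if steps and task_succeeded:
        rewards[-1] += bonus
    return rewards
```

In a failed episode the outcome signal is all zeros, while the per-step signal still tells the learner which individual taps helped and which did not, which is the usual motivation for denser rewards.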
Think of it as a driving school with a classroom, a test track, and real roads all on one campus.
Their compact 2B-parameter model outperformed same-sized competitors by 6 percentage points, suggesting that better training, not just bigger models, drives results.
The era of AI that truly operates your phone may arrive sooner than expected.
Source: huggingface-papers