TH
โ† Back
news 2026-04-21 ยท huggingface-papers

๐ŸŽฌ OmniScript โ€” A Small AI That Writes Full Scripts From Long Videos

๐ŸŽฌ OmniScript โ€” A Small AI That Writes Full Scripts From Long Videos

Imagine watching a two-hour movie and having to document every scene โ€” who said what, every facial expression, every background sound, down to the exact second.

That's the reality for editors, subtitle creators, and studio analysts every single day.

Now a research team has unveiled OmniScript, an 8-billion-parameter AI model that watches long cinematic videos and generates complete, scene-by-scene scripts โ€” with precise timestamps.

What makes it remarkable: OmniScript processes both audio and visuals simultaneously. It captures dialogue, character actions, facial expressions, and sound cues, then organizes everything into a hierarchical script structure.

The model uses a progressive training pipeline with chain-of-thought reasoning and reinforcement learning with temporally segmented rewards โ€” meaning it learns to pay attention to *when* things happen, not just *what* happens.

Despite being relatively small, OmniScript significantly outperforms larger open-source models and matches the performance of Gemini 3-Pro in temporal accuracy and script quality.

The implications are huge: automated scene indexing, instant script reconstruction, competitive film analysis, and accessible video understanding at scale โ€” all from a model small enough to run without massive infrastructure.

The era of AI that truly "watches" movies is here.

๐Ÿ“„ Source

huggingface-papers
Share: Facebook ๐•
โ† Previous
๐Ÿค– MultiWorld: The AI That Simulates Entire Worlds
Next โ†’
๐Ÿค– AI Can Use Your Computer โ€” But Can It Do It Twi