Open-Source AI Breakthrough Delivers Longer, More Coherent Video Generation

Researchers at the Swiss Federal Institute of Technology Lausanne (EPFL) have introduced a new open-source AI system designed to solve one of the biggest problems in generative video today: videos that fall apart after only a few seconds.

If you’ve tried modern AI video generators, you’ve likely noticed the same frustrating pattern. They can produce impressive results, but the clips are usually limited to roughly 5 to 20 seconds. Push for longer, and the output often starts to “drift”: characters subtly change identity, faces deform, details smear, and the scene gradually loses coherence frame by frame until it no longer makes sense.

To address this, EPFL’s Visual Intelligence for Transportation (VITA) lab developed a training technique called error recycling. Instead of treating visual glitches and deformations as failures to be discarded, the method deliberately feeds those errors back into the model during training. The idea is simple: if the model learns to recognize and recover from its own mistakes, it becomes far more stable when those mistakes inevitably emerge during longer generations.
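
To make the training idea concrete, below is a minimal, self-contained PyTorch sketch of error-recycling-style training. It illustrates the general principle only and is not EPFL’s implementation: SVI fine-tunes a large video diffusion model, whereas this toy example uses a simple next-frame predictor, and every name in it (ToyFramePredictor, train_step, recycle_prob) is hypothetical.

```python
# Illustrative sketch of error-recycling-style training (not EPFL's code).
# A toy autoregressive frame predictor stands in for a real video model.
import torch
import torch.nn as nn

class ToyFramePredictor(nn.Module):
    """Predicts the next frame from the previous one."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, prev_frame: torch.Tensor) -> torch.Tensor:
        return self.net(prev_frame)

def train_step(model, optimizer, clip, recycle_prob=0.5):
    """One training step on a clip of shape (T, C, H, W).

    With probability `recycle_prob`, condition on the model's own
    (imperfect) prediction instead of the ground-truth frame, so the
    model sees, and learns to correct, its own errors.
    """
    loss_fn = nn.MSELoss()
    total_loss = 0.0
    prev = clip[0:1]  # start from the real first frame
    for t in range(1, clip.shape[0]):
        pred = model(prev)
        total_loss = total_loss + loss_fn(pred, clip[t:t + 1])
        if torch.rand(()) < recycle_prob:
            prev = pred.detach()   # recycle the model's own error-bearing output
        else:
            prev = clip[t:t + 1]   # use the clean ground-truth frame
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()

# Usage on random data:
model = ToyFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clip = torch.rand(8, 3, 32, 32)  # an 8-frame toy "video"
print(train_step(model, opt, clip))
```

The key design choice is the recycling branch: by sometimes conditioning on its own detached prediction rather than on clean ground truth, the model trains on the kind of imperfect input it will actually produce during long generations, which is what lets it learn to correct drift instead of compounding it.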

Professor Alexandre Alahi described the concept in practical terms, comparing it to preparing a pilot for turbulent conditions rather than only training in perfect weather. The result is an AI that can steady itself instead of spiraling into randomness as time goes on.

This training approach powers Stable Video Infinity (SVI), EPFL’s new long-form video generation system. Where many existing models begin to break down around the 30-second mark, SVI is designed to maintain coherent scenes and consistent subjects for several minutes or longer without sacrificing quality.

The project is already attracting significant attention. Its code has been released openly on GitHub, where it has gained thousands of stars, and the research has been accepted for presentation at the 2026 International Conference on Learning Representations (ICLR).

Alongside SVI, the team is introducing LayerSync, a companion technique aimed at improving how generative systems maintain internal consistency across different types of output, including video, images, and sound. Together, Stable Video Infinity and LayerSync point toward a future where AI-generated media can move beyond short demos into truly long-form storytelling, and toward more dependable autonomous systems that need to remain stable over extended periods.