Alibaba’s Qwen Team Unveils AI Models That Can Control PCs and Smartphones

Chinese tech giant Alibaba has released Qwen2.5-VL, a family of AI models built to analyze both text and images. While much of the industry’s attention is fixed on DeepSeek, Alibaba is showing that it has not fallen behind in the AI race.

The Qwen2.5-VL models can parse documents, understand videos, and count objects within images. They can also operate a PC, a capability similar to OpenAI’s Operator. Alibaba’s Qwen team reports that the new model outperforms major competitors such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 2.0 Flash in several areas, including video comprehension, mathematical problem-solving, document analysis, and question answering.

In practical terms, Qwen2.5-VL can analyze charts, extract information from documents such as invoices and forms, and process lengthy video content. It can also identify characters and products from films and TV series, which suggests that its training data may include copyrighted material.

However, as with many AI systems developed in China, the use of Qwen2.5-VL comes with certain restrictions. The model is subject to oversight by China’s internet regulators, ensuring it adheres to “core socialist values.” This means it may sidestep discussions on politically sensitive matters, such as criticisms of Chinese leadership or topics like Taiwan’s sovereignty.

One standout feature of Qwen2.5-VL is its ability to interact with both desktop and mobile applications. In one demonstration, the model opened the Booking.com app on an Android device and booked a flight from Chongqing to Beijing.

These capabilities have limits. Qwen2.5-VL’s performance in realistic computer environments has been called into question: it posts lower scores on benchmarks like OSWorld, which tests how well a model handles desktop tasks beyond basic operations like tab switching.

The Qwen2.5-VL series includes two smaller models, Qwen2.5-VL-3B and Qwen2.5-VL-7B, which are available under a permissive license. The flagship model, Qwen2.5-VL-72B, is covered by Alibaba’s custom license, which requires companies or developers with more than 100 million monthly active users to obtain permission for commercial use.
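
For readers who prefer to see the rule spelled out, the licensing split described above can be sketched as a short Python check. This is only an illustration of the terms as summarized in this article, not a reading of the actual license text; the function name and constants are hypothetical, and the real license may carry conditions not mentioned here.

```python
# Illustrative sketch only: encodes the licensing split as summarized in
# this article. Names and constants are hypothetical, not from Alibaba.

# "over 100 million monthly active users"
MAU_THRESHOLD = 100_000_000

# Smaller models described as permissively licensed.
PERMISSIVE_MODELS = {"Qwen2.5-VL-3B", "Qwen2.5-VL-7B"}

# Flagship model described as covered by Alibaba's custom license.
CUSTOM_LICENSE_MODELS = {"Qwen2.5-VL-72B"}


def needs_commercial_permission(model: str, monthly_active_users: int) -> bool:
    """Return True if, per the article's summary, commercial use of
    `model` would require permission from Alibaba."""
    if model in PERMISSIVE_MODELS:
        return False
    if model in CUSTOM_LICENSE_MODELS:
        # Permission is needed only above the threshold ("over" 100 million).
        return monthly_active_users > MAU_THRESHOLD
    raise ValueError(f"Unknown model: {model}")
```

Under these assumptions, a startup with two million users could use the 72B model commercially without asking, while a platform with 150 million users would need Alibaba’s permission first.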

With Qwen2.5-VL, Alibaba positions itself as a serious contender in AI, offering models that pair advanced capabilities with practical applications and pushing both the company and the wider industry into new territory.