Alibaba Cloud Releases Qwen2.5-Omni-7B: An End-to-end Multimodal AI Model

Alibaba Cloud has launched Qwen2.5-Omni-7B, a unified end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it can process diverse inputs, including text, images, audio, and video, while generating text and natural speech responses in real time. This sets a new standard for deployable multimodal AI on edge devices such as mobile phones and laptops.

Despite its compact 7B-parameter design, Qwen2.5-Omni-7B delivers uncompromised performance and powerful multimodal capabilities. This combination makes it a strong foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications. For example, the model could help visually impaired users navigate their surroundings through real-time audio descriptions, offer step-by-step cooking guidance by analyzing the ingredients shown in a video, or power intelligent customer service dialogues that genuinely understand customer needs.

The model is now open-sourced on Hugging Face and GitHub, with additional access via Qwen Chat and Alibaba Cloud's open-source community ModelScope. In recent years, Alibaba Cloud has open-sourced over 200 generative AI models.

High Performance Driven by Innovative Architecture

Qwen2.5-Omni-7B delivers remarkable performance across all modalities, rivaling specialized single-modality models of comparable size. Notably, it sets a new benchmark in real-time voice interaction, natural and robust speech generation, and end-to-end speech instruction following.

Its efficiency and high performance stem from its innovative architecture, including Thinker-Talker Architecture, which separates text generation (through Thinker) and speech synthesis (through Talker) to minimize interference among different modalities for high-quality output; TMRoPE (Time-aligned Multimodal RoPE), a position embedding technique to better synchronize the video inputs with audio for coherent content generation; and Block-wise Streaming Processing, which enables low-latency audio responses for seamless voice interactions.
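To illustrate the idea behind time-aligned position embeddings, the following is a minimal sketch, not the model's actual implementation. It assumes that each audio or video token carries a timestamp in milliseconds and that timestamps are bucketed into fixed-length ticks; the function name `temporal_position_ids`, the 40 ms tick length, and the sampling rates are all hypothetical values chosen for illustration. The point is only that tokens from different modalities that occur at about the same moment receive the same temporal position.

```python
from typing import List


def temporal_position_ids(timestamps_ms: List[int], tick_ms: int = 40) -> List[int]:
    """Map each token's timestamp (in ms) to an integer temporal position.

    Hypothetical helper: tick_ms is an assumed granularity, not a value
    taken from the press release.
    """
    if tick_ms <= 0:
        raise ValueError("tick_ms must be positive")
    # Tokens that fall into the same tick share one temporal position,
    # regardless of which modality they come from.
    return [t // tick_ms for t in timestamps_ms]


# Assumed inputs: one video frame every 500 ms, one audio frame every 40 ms.
video_ts = [0, 500, 1000]
audio_ts = list(range(0, 1001, 40))  # 0, 40, ..., 1000

video_ids = temporal_position_ids(video_ts)
audio_ids = temporal_position_ids(audio_ts)

# Every video frame lands on a position that an audio frame also occupies,
# so the two streams are indexed on one shared timeline.
print(video_ids)
print(set(video_ids) <= set(audio_ids))
```

In this sketch the video frame at 500 ms and the audio frame at 480 ms fall into the same tick, which is the kind of alignment a time-aware position scheme is meant to provide.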

Qwen2.5-Omni-7B was pre-trained on a vast, diverse dataset, including image-text, video-text, video-audio, audio-text, and text data, ensuring robust performance across tasks.

With its innovative architecture and high-quality pre-training dataset, the model excels at following voice commands, achieving performance comparable to that with pure text input. For tasks that involve integrating multiple modalities, such as those evaluated in OmniBench, a benchmark that assesses models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs, Qwen2.5-Omni achieves state-of-the-art performance.

Qwen2.5-Omni-7B also demonstrates robust speech understanding and generation capabilities through in-context learning (ICL). Additionally, after reinforcement learning (RL) optimization, Qwen2.5-Omni-7B showed significant improvements in generation stability, with marked reductions in attention misalignment, pronunciation errors, and inappropriate pauses in its speech responses.

Alibaba Cloud unveiled Qwen2.5 last September and released Qwen2.5-Max in January. Qwen2.5-Max was ranked 7th on Chatbot Arena, matching other top proprietary LLMs and demonstrating exceptional capabilities. Alibaba Cloud has also open-sourced Qwen2.5-VL and Qwen2.5-1M for enhanced visual understanding and long-context input handling.

