Alibaba Cloud Releases Qwen2.5-Omni-7B: An End-to-end Multimodal AI Model


Alibaba Cloud has launched Qwen2.5-Omni-7B, a unified end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it can process diverse inputs, including text, images, audio, and video, while generating text and natural speech responses in real time. This sets a new standard for multimodal AI that can be deployed on edge devices such as mobile phones and laptops.


Despite its compact 7B-parameter design, Qwen2.5-Omni-7B delivers uncompromised performance and powerful multimodal capabilities. This combination makes it a strong foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications. For example, the model could help visually impaired users navigate their surroundings through real-time audio descriptions, offer step-by-step cooking guidance by analyzing ingredients shown in video, or power customer service dialogues that accurately understand customer needs.

The model is now open-sourced on Hugging Face and GitHub, with additional access via Qwen Chat and ModelScope, Alibaba Cloud's open-source community. In recent years, Alibaba Cloud has open-sourced more than 200 generative AI models.

High Performance Driven by Innovative Architecture

Qwen2.5-Omni-7B delivers remarkable performance across all modalities, rivaling specialized single-modality models of comparable size. Notably, it sets a new benchmark in real-time voice interaction, natural and robust speech generation, and end-to-end speech instruction following.

Its efficiency and high performance stem from three architectural innovations: the Thinker-Talker architecture, which separates text generation (handled by the Thinker) from speech synthesis (handled by the Talker) to minimize interference between modalities and keep output quality high; TMRoPE (Time-aligned Multimodal RoPE), a position embedding technique that synchronizes video inputs with audio for coherent content generation; and Block-wise Streaming Processing, which enables low-latency audio responses for seamless voice interactions.
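The idea of time-aligned positions can be illustrated with a small toy sketch. It is not the released model's implementation: the 40 ms tick, the 2-second block length, and every name in it (`Token`, `time_aligned_tokens`, `interleave_in_blocks`) are assumptions chosen for illustration. The sketch derives a temporal position id for each audio or video frame from its timestamp, so frames that occur at the same moment share a position, and then groups the frames into fixed-length blocks so that input can be handled block by block rather than all at once.

```python
from dataclasses import dataclass
from typing import Iterator, List

TICK_MS = 40  # assumption: one temporal position id per 40 ms of media time


@dataclass(frozen=True)
class Token:
    modality: str      # "audio" or "video"
    timestamp_ms: int  # when the frame occurs in the clip
    position: int      # temporal position id derived from the timestamp


def time_aligned_tokens(modality: str, timestamps_ms: List[int]) -> List[Token]:
    """Give each frame a temporal position id computed from its timestamp."""
    return [Token(modality, t, t // TICK_MS) for t in timestamps_ms]


def interleave_in_blocks(
    audio: List[Token], video: List[Token], block_ms: int = 2000
) -> Iterator[List[Token]]:
    """Yield blocks covering block_ms of media time, video first, then audio.

    The block length is a hypothetical default, not a documented setting.
    """
    tokens = audio + video
    if not tokens:
        return
    end = max(t.timestamp_ms for t in tokens)
    start = 0
    while start <= end:
        stop = start + block_ms
        block = [t for t in video if start <= t.timestamp_ms < stop]
        block += [t for t in audio if start <= t.timestamp_ms < stop]
        if block:
            yield block
        start = stop


# Usage: 4 seconds of audio frames every 40 ms, and video sampled at 2 fps.
audio = time_aligned_tokens("audio", list(range(0, 4000, 40)))
video = time_aligned_tokens("video", list(range(0, 4000, 500)))
blocks = list(interleave_in_blocks(audio, video))
```

In this sketch the video frame at 1000 ms and the audio frame at 1000 ms receive the same temporal position id, which is the alignment property the paragraph above describes; how the actual model encodes and consumes these positions is outside the scope of this example.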

Qwen2.5-Omni-7B was pre-trained on a vast, diverse dataset, including image-text, video-text, video-audio, audio-text, and text data, ensuring robust performance across tasks.

With its innovative architecture and high-quality pre-training dataset, the model excels at following voice commands, achieving performance comparable to that with pure text input. On tasks that integrate multiple modalities, such as those evaluated in OmniBench (a benchmark that assesses models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs), Qwen2.5-Omni achieves state-of-the-art performance.

Qwen2.5-Omni-7B also performs strongly in robust speech understanding and generation through in-context learning (ICL). Additionally, after reinforcement learning (RL) optimization, Qwen2.5-Omni-7B showed significant improvements in generation stability, with marked reductions in attention misalignment, pronunciation errors, and inappropriate pauses in its speech responses.

Alibaba Cloud unveiled Qwen2.5 last September and released Qwen2.5-Max in January. Qwen2.5-Max was ranked 7th on Chatbot Arena, matching other top proprietary LLMs and demonstrating exceptional capabilities. Alibaba Cloud also open-sourced Qwen2.5-VL and Qwen2.5-1M for enhanced visual understanding and long-context input handling.

