
SenseTime's SenseNova V6: China’s Most Advanced Multimodal Model with the Lowest Cost in the Industry, Integrating AI into Everyday Life

2025-04-11

SenseTime launched its newly upgraded large model series, SenseNova V6, at its Tech Day event held in several locations, including Shanghai and Shenzhen. Leveraging advances in the training of multimodal long chain-of-thought (CoT), global memory, and reinforcement learning, the model delivers industry-leading multimodal reasoning capabilities while setting a new benchmark for cost efficiency.


The capabilities of the SenseNova V6 model have been greatly enhanced, with strong advantages in long CoT, reasoning, mathematics, and global memory. Its multimodal reasoning ranked first in China when benchmarked against GPT-o1, and its data analysis performance outpaced GPT-4o. The model also pairs this performance with cost efficiency: its multimodal training efficiency is on par with that of language models, giving it both the lowest training costs and the lowest inference costs in the industry. The new lightweight full-modal interactive model, SenseNova V6 Omni, delivers the most advanced multimodal interactive capabilities in China. It is China’s first large model to support in-depth analysis of 10-minute mid-to-long-form videos, benchmarking among the strongest in its class against Gemini 2.5 Turbo.


Dr. Xu Li, Chairman of the Board and CEO of SenseTime, said, “AI’s true purpose is found in our everyday lives. SenseNova V6 has pushed past the boundaries of multimodality, unlocking infinite possibilities in reasoning and intelligence.”




Multimodal long-chain reasoning, reinforcement learning, and global memory: SenseNova V6 leads the way in enabling multimodal deep thinking 


As a native Mixture of Experts (MoE)-based multimodal general foundation model with over 600 billion parameters, SenseNova V6 has achieved multiple technological breakthroughs. A single model is able to perform a range of tasks across text and multimodal domains, including:

- Long CoT: Trained on over 200B tokens of high-quality multimodal long CoT data, with the longest CoT reaching 64K tokens;

- Mathematical Capabilities: Significantly outperformed GPT-4o in data analysis;

- Reasoning Capabilities: Ranked first in China for multimodal deep reasoning, benchmarked against GPT-o1;

- Global Memory: First in China to achieve long-form video understanding, supporting content of 10 minutes in length for comprehension and deep reasoning.


In leading benchmark evaluations of reasoning and multimodal capabilities, SenseNova V6 achieved state-of-the-art results across multiple metrics:



Key indicators: SenseNova V6 demonstrated strong overall performance in language tasks, on par with leading international models, and excelled in multimodal capabilities, with outstanding results across the board. Both its language reasoning and multimodal reasoning capabilities benchmark against leading international models such as GPT-4.5 and Gemini 2.0 Pro.


Strong reasoning capabilities: From SenseNova 5.5 to V6/V6 Reasoner, the SenseNova unified model demonstrated significant improvements in reasoning performance. In independent evaluations, it has surpassed both GPT-o1 and Gemini 2.0 Flash Thinking across multimodal and language-based deep reasoning tasks.

 

Building on more than 200B tokens of high-quality multimodal long CoT data, SenseTime leverages multi-agent collaboration to synthesize and verify long CoT data. As a result, SenseNova V6 has developed exceptional multimodal reasoning capabilities, supporting multimodal long CoTs of up to 64K tokens and enabling long-term thinking.
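The synthesize-and-verify pipeline described above can be illustrated as a simple generate-then-check loop: one agent proposes a chain of reasoning steps, another independently verifies each step, and only verified chains enter the training pool. The agent roles, function names, and toy arithmetic task below are illustrative assumptions, not SenseTime's actual pipeline:

```python
# Toy sketch of multi-agent long-CoT data synthesis: a "proposer" agent
# emits reasoning steps and a "verifier" agent rechecks each step before
# the chain is admitted to the training pool. All names are hypothetical.

def propose_chain(a, b):
    """Proposer agent: spell out (a + b) * 2 as explicit steps."""
    s = a + b
    return [
        (f"step 1: add {a} and {b}", s),
        (f"step 2: double the sum {s}", s * 2),
    ]

def verify_chain(a, b, chain):
    """Verifier agent: independently recompute and check every step."""
    expected = [a + b, (a + b) * 2]
    return all(got == want for (_, got), want in zip(chain, expected))

def synthesize(problems):
    """Keep only chains that pass verification (rejection sampling)."""
    pool = []
    for a, b in problems:
        chain = propose_chain(a, b)
        if verify_chain(a, b, chain):
            pool.append({"problem": (a, b), "cot": chain})
    return pool

data = synthesize([(2, 3), (10, 4)])
```

At scale, the verifier would itself be a model (or an ensemble of checkers) rather than exact recomputation, but the rejection-sampling structure is the same.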


In solving complex real-world problems, SenseNova V6 utilizes its robust hybrid image and text understanding and reasoning capabilities to help users with a range of tasks.


For complex document processing scenarios, SenseNova V6 is able to help users with difficult tasks through its strong multimodal reasoning capabilities. For example, in insurance claims processing, SenseNova V6 can assess whether the submitted commercial health insurance claims meet the requirements. It can detect issues such as unnecessary prescriptions and examinations, missing documents, or incomplete submissions.

 

Leveraging breakthroughs in multimodal reinforcement learning, SenseTime has developed a hybrid reinforcement learning framework for various image-text tasks, based on different difficulty levels and multi-reward models.
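One common way to realize a multi-reward framework of this kind is to score each response with several reward models and mix their scores with weights that depend on estimated task difficulty. The reward functions and weighting scheme below are hypothetical stand-ins for illustration, not SenseTime's published framework:

```python
# Illustrative sketch of a difficulty-weighted multi-reward signal for
# image-text RL: several reward models score a response, and the mixing
# weights shift with estimated task difficulty. All functions and
# weights here are assumptions, not SenseTime's actual reward models.

def correctness_reward(response, answer):
    """Outcome reward model (toy: exact match)."""
    return 1.0 if response == answer else 0.0

def format_reward(response):
    """Style/format reward model (toy proxy: non-empty string)."""
    return 1.0 if isinstance(response, str) and response else 0.0

def mixed_reward(response, answer, difficulty):
    """difficulty in [0, 1]: harder tasks weight correctness more."""
    w_correct = 0.5 + 0.5 * difficulty     # ranges 0.5 .. 1.0
    w_format = 1.0 - w_correct             # ranges 0.5 .. 0.0
    return (w_correct * correctness_reward(response, answer)
            + w_format * format_reward(response))

r_easy = mixed_reward("42", "42", difficulty=0.0)
r_hard = mixed_reward("41", "42", difficulty=1.0)
```

The scalar `mixed_reward` would then feed a policy-gradient update (e.g. PPO- or GRPO-style) in place of a single reward model's score.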

 

China's first model to break the 10-minute barrier in video understanding, achieving analysis of extended content within seconds

 

With its global memory capability, SenseNova V6 overcomes the limitations of traditional models that could only support short videos, and now supports full-framerate analysis of 10-minute videos.

 

With advanced comprehension capabilities, SenseNova V6 is also able to intelligently edit and extract video highlights, helping users to retain memorable moments.

 

SenseTime’s proprietary technology aligns visual information (images), auditory information (speech and sounds), linguistic information (subtitles and spoken language), and temporal logic to form a multimodal unified sequential representation. Based on this framework, it applies fine-grained cascading compression and content-aware dynamic filtering to achieve high-ratio compression of long videos. A 10-minute video can be compressed into 16K tokens while retaining key semantics.
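A back-of-the-envelope calculation shows how aggressive that compression is. The frame rate and per-frame token cost below are assumptions chosen for illustration; only the 10-minute length and the 16K-token budget come from the announcement:

```python
# Back-of-the-envelope check of the stated compression: a 10-minute
# video reduced to 16K tokens. FPS and tokens-per-frame are assumed
# values for illustration, not published SenseNova V6 figures.

FPS = 30                      # assumed full frame rate
TOKENS_PER_FRAME = 256        # assumed raw visual tokens per frame
MINUTES = 10
COMPRESSED_TOKENS = 16_000    # "16K tokens" from the announcement

frames = MINUTES * 60 * FPS               # total frames in the clip
raw_tokens = frames * TOKENS_PER_FRAME    # uncompressed token count
ratio = raw_tokens / COMPRESSED_TOKENS    # overall compression ratio

print(f"{frames} frames, {raw_tokens} raw tokens, {ratio:.0f}x compression")
```

Under these assumptions, a 10-minute clip is 18,000 frames and roughly 4.6M raw visual tokens, so fitting it into 16K tokens implies a compression ratio on the order of a few hundred to one, which is why cascading compression and content-aware filtering are needed rather than uniform frame sampling alone.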

 

Human-like interaction: SenseNova V6 Omni launches with multi-industry deployment 

 

With the launch of SenseNova V6, SenseTime has upgraded its real-time interactive unified large model to SenseNova V6 Omni, with deep optimizations across scenarios including role-playing, translation and reading, cultural tourism guidance, picture book narration, and mathematical explanation.

 

In translation and reading scenarios, SenseNova V6 Omni enables users to achieve precise spatial interactions with a simple finger gesture. The model also accurately understands the relationship between local and global information, providing a more intuitive and human-like interactive experience.

 

SenseNova V6 Omni features more human-like perceptual and expressive abilities, as well as emotional understanding. It has been deployed across multiple industries and scenarios, including embodied intelligence, becoming the first commercialized full-modality real-time interactive model in China.

 

Full-featured version of SenseChat launched, now available for preview

 

SenseTime has released a comprehensive update to SenseChat, along with a brand-new app built on the complete capabilities of SenseNova V6. Through a single access point, users can engage in seamless multimodal interactive streaming experiences across text, images, and video.

 

The SenseChat app is available for preview and SenseNova V6 is now available for trial via the SenseChat web platform at www.chat.sensetime.com.

 

RMB100 million in vouchers released to accelerate full-stack scenario implementation

 

SenseTime also announced a dedicated subsidy of RMB100 million, aimed at advancing emerging fields such as embodied intelligence and AIGC. Through targeted and multi-dimensional initiatives, SenseTime is delivering a one-stop solution designed for high efficiency, low cost, and end-to-end AI implementation, spanning expert consulting, model training, and inference validation.