
A First in China: SenseTime’s SenseNova Receives the Highest Rating in the China Academy of Information and Communications Technology’s Inaugural Multimodal Large Model Assessment

2025-06-11


The China Academy of Information and Communications Technology (CAICT) has recently completed the first round of its Trusted AI Multimodal Large Model Evaluation. SenseTime’s SenseNova Unified Large Model achieved the highest possible rating of Level 4+, making it the first model in China to receive this top-tier certification.


Launched in January 2025, the Trusted AI Multimodal Large Model Evaluation is led by CAICT’s Institute of Artificial Intelligence and follows the Technical Requirements and Evaluation Methods for Multimodal Foundation Models, a standard jointly developed by over 60 organizations across the industry. It is considered one of the leading and most influential assessments for multimodal large models in China.

The evaluation comprises four modules (core capabilities, comprehension, generation, and specialized testing), spanning two core competency domains and more than 30 specific capability indicators. It provides industry guidance for the R&D and deployment of multimodal large models.

According to the results, the SenseNova Unified Large Model demonstrated outstanding performance in core technical capabilities such as multimodal fusion, modality conversion, cross-modal perception, cross-modal comprehension, and cross-modal generation. In terms of practical applications, it excelled in industry coverage, scenario adaptability, diversity of applications, and ease of use, showcasing robust real-world utility.

In January this year, SenseTime launched the SenseNova Unified Large Model, the industry's first to implement a natively integrated architecture unifying large language models and multimodal large models. In April, SenseTime launched SenseNova V6, featuring breakthroughs in multimodal reasoning through advancements such as extended multimodal thinking chains, global memory, and reinforcement learning. These technologies enable the model to naturally integrate and process information from text, images, video, and audio to solve complex, real-world problems.

For example, in real-time audio and video interactions, SenseNova V6 delivers strong capabilities in live interaction, visual recognition, memory-based reasoning, continuous dialogue, and complex inference, enabling AI to engage in more natural and fluent conversations with humans and offering a next-generation human-computer interaction experience. In video generation, it is capable of multimodal video analysis, including full-frame analysis of videos up to 10 minutes long, and can generate new video content based on analytical insights.

SenseNova has already been deployed across various sectors, including education, finance, and industry. Its top rating in CAICT’s inaugural Trusted AI Multimodal Large Model Evaluation highlights its industry-leading reasoning capabilities. It also demonstrates the model’s ability to meet the rigorous standards of generalization, versatility, and specialization required for real-world applications, building a strong foundation for the trusted development of AI across industries.