SenseTime Launches the Enhanced SenseNova V6.5: Marking the Leap from AI as a “Tool” to a “Partner”
At the “AI+: Large Model Shapes the Future” WAIC 2025 Large Model Forum organized by SenseTime, the company launched its enhanced SenseNova V6.5 large model system. This breakthrough in multimodal models advances AI from a “productivity tool” to a “productivity driver”. In parallel, SenseTime’s flagship product, Raccoon, also received its “intelligent agent” upgrade.
Dr. Xu Li, Chairman of the Board and CEO of SenseTime, said: “SenseTime has always been committed to advancing the frontiers of AI. Through continuous innovation, we aim to evolve AI’s role from being a tool to a core engine of productivity.”
SenseNova V6.5: Breakthrough Upgrades that Deepen AI's Understanding at Scale
SenseNova V6.5 delivers three significant enhancements in its multimodal model:
Stronger Reasoning: Introduces intertwined visual-textual “multimodal thought chains,” achieving inference capabilities on par with Gemini 2.5 Pro and Claude 4-Sonnet.
Higher Efficiency: An optimized multimodal architecture improves the performance-to-cost ratio more than threefold.
Intelligent Agents: Advanced data analysis and end-to-end deployment capabilities enable closed value loops.
SenseTime’s SenseNova V6.5 significantly improves multimodal reasoning and interaction by evolving from standard multimodal thought chain data to synthesized, interleaved image-text thought chains.
Both text reasoning and multimodal reasoning have improved significantly, surpassing Gemini 2.5 Pro and Claude 4‑Sonnet; its multimodal interaction performance surpasses Gemini 2.5 Flash and GPT‑4o, with strong overall results.
SenseTime’s SenseNova V6.5 introduces visual thinking into large models through interleaved image-text thought chain technology, making it the first commercial-grade model in China to achieve true image-text interleaved reasoning.
In human cognition, visual and logical thinking are equally vital, and it is only through their integration that comprehensive reasoning can emerge. Mainstream multimodal models currently may accept various input modalities, but their reasoning processes still rely heavily on language, with graphical and spatial reasoning remaining underdeveloped.
Constructing multimodal thought chains hinges on the ability to represent information visually. Unlike purely text-based reasoning chains, this process requires both the articulation of logical steps and the generation of images as reasoning nodes, a challenge that cannot be addressed through manual methods at scale. To overcome this, SenseTime’s R&D team first developed seed data grounded in a deep understanding of cognitive processes, then applied supervised fine-tuning (SFT) to instill foundational image-text reasoning capabilities. Multiple rounds of reinforcement learning further enhanced the model’s multimodal inference capabilities.
SenseTime has also enhanced the integrated architecture of its multimodal models to enable early-stage cross-modal integration. The new design incorporates a significantly streamlined visual encoder and a deeper, narrower backbone model, allowing visual representations to align and integrate with language at the early stages of feedforward computation. This results in more efficient perception and deeper modality integration.
With enhancements to its model architecture, SenseNova V6.5 delivers over 20% improvement in pretraining throughput, a 40% increase in reinforcement learning efficiency, and more than 35% gain in inference throughput, all while significantly reducing overall cost. Compared to its predecessor, SenseNova V6.0, V6.5 achieves a threefold improvement in cost-performance ratio, striking an optimal balance between scalability and efficiency.
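As a rough illustration of how the throughput figures above relate to cost, the following sketch converts each stated gain into a relative cost per unit of work. It assumes hardware cost per hour stays fixed and treats the reinforcement learning "efficiency" figure as a throughput gain; both are assumptions for illustration, not statements from SenseTime. The helper name `relative_cost_per_unit` is hypothetical.

```python
def relative_cost_per_unit(throughput_gain: float) -> float:
    """Cost per unit of work relative to the baseline (1.0 = unchanged).

    Assumes the hourly hardware cost is fixed, so cost per unit scales
    inversely with throughput. This is an illustrative assumption.
    """
    return 1.0 / (1.0 + throughput_gain)


# Lower-bound gains quoted in the text ("over 20%", "40%", "more than 35%").
stated_gains = {
    "pretraining": 0.20,
    "reinforcement learning": 0.40,  # stated as "efficiency"; treated as throughput here
    "inference": 0.35,
}

for stage, gain in stated_gains.items():
    relative = relative_cost_per_unit(gain)
    saving_pct = (1.0 - relative) * 100.0
    print(f"{stage}: {relative:.3f}x baseline cost per unit ({saving_pct:.1f}% lower)")
```

Under these assumptions, a 35% inference throughput gain corresponds to roughly 26% lower cost per unit of work. The stated threefold cost-performance improvement therefore cannot be derived from the throughput figures alone; it presumably also reflects other factors not broken out in the text.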
Redefining Productivity: Raccoon, SenseTime’s Most Powerful AI Agent for the Workplace
Large language models have become popular tools for enhancing workplace productivity. However, relying on language models alone is not enough to advance AI from being a “tool” to becoming a true “agent”. To evolve from a productivity tool into a source of productivity itself, the key lies in the ability to process multimodal information across input, reasoning, and output.
Powered by the advanced multimodal data analysis capabilities of SenseNova V6.5, SenseTime’s Raccoon has been comprehensively upgraded. It can manage complex multimodal inputs, perform deep multimodal fusion and analysis, and deliver multimodal outputs with professional-grade visualizations. This marks a leap from AI as a mere productivity tool to a true productivity engine for the workplace.
SenseTime’s Raccoon continues to maintain a global lead in complex data analysis capabilities. In comprehensive evaluations across real-world client scenarios, it reached performance levels on par with Claude 4 Opus, the international benchmark in data analysis and intelligent agents, while significantly outperforming models such as OpenAI’s o3. It consistently achieved accuracy approaching 100% in tasks involving temporal computation, data matching, mathematical reasoning, and anomaly detection.
Leveraging multimodal reasoning, SenseTime’s Raccoon is capable of performing comprehensive analyses through multi-step thinking and reflection via thought chain construction to deliver structured outputs. For instance, when presented with a complex Excel file containing merged cells, missing values, nested tables, embedded charts, and external images, Raccoon can accurately interpret the content, establish logical relationships between sub-tables, and generate a complete analytical report.
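To make two of the spreadsheet difficulties mentioned above concrete, the sketch below shows one conventional way to handle merged cells and missing values once a sheet has been read into rows. It is not a description of how Raccoon works internally. The function names, the sample data, and the convention that a merged cell appears as a value followed by blanks (`None`) are all assumptions for illustration.

```python
from typing import Optional

Row = list[Optional[str]]


def fill_merged_cells(rows: list[Row], merged_columns: list[int]) -> list[Row]:
    """Copy the last seen value down into blank cells of the given columns.

    Assumes a vertically merged cell was exported as one value followed
    by blanks (None) in the rows it spanned.
    """
    last_seen: dict[int, Optional[str]] = {}
    filled: list[Row] = []
    for row in rows:
        new_row = list(row)
        for col in merged_columns:
            if new_row[col] is None:
                new_row[col] = last_seen.get(col)
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


def missing_cells(rows: list[Row]) -> list[tuple[int, int]]:
    """Return (row, column) positions that are still blank."""
    return [(r, c) for r, row in enumerate(rows) for c, v in enumerate(row) if v is None]


# Hypothetical sheet: the region column is merged across two quarters,
# and one sales figure is genuinely missing.
sheet: list[Row] = [
    ["North", "Q1", "120"],
    [None, "Q2", None],
    ["South", "Q1", "95"],
    [None, "Q2", "101"],
]

cleaned = fill_merged_cells(sheet, merged_columns=[0])
gaps = missing_cells(cleaned)
print(cleaned)
print(gaps)
```

Only the merged column is forward-filled, so the missing sales figure stays flagged as a gap instead of being silently replaced with the value above it.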
Traditional AI tools primarily serve as assistive technologies, with users remaining responsible for core tasks. In contrast, SenseTime’s Raccoon redefines the interaction paradigm by proactively taking on core responsibilities. Through targeted inquiries and user confirmation of key information, it establishes an interaction logic akin to workplace collaboration. This innovative mode of engagement enhances clarity and improves user comprehension.
With its robust capabilities in managing complex tasks, SenseTime’s Raccoon is rapidly expanding into industry-specific applications. Two customized versions have been introduced for the education and finance sectors.
Raccoon – Education Edition: It is designed to intelligently analyze student performance, curriculum effectiveness, and learning behavior patterns. It has already been adopted by over 500 institutions across more than 10 educational scenarios, supporting over 250,000 teachers and students. The platform has helped improve learning efficiency by 15–30%, reduced academic anxiety by 40%, increased classroom engagement by 2.1 times, lowered resource mismatch rates by 30%, and improved the timeliness of mental well-being interventions by 50%.
Raccoon – Finance Edition: It offers knowledge assistants, intelligent data query tools, and multimodal AI-powered claims solutions to the finance sector, enabling a new model of intelligent, human-AI collaborative decision-making in finance.
At present, the SenseTime Raccoon suite of products has been widely adopted across a range of industries, with its user base surpassing 10 million.
By harnessing multimodal technologies to unlock AI productivity, SenseTime’s SenseNova large model will continue to evolve, advancing alongside industry partners toward the next phase of AI development and accelerating the path to AGI.