
Three Steps, Zero Coding: Instantly Transform Your Robot into an “Interaction Expert” with SenseTime’s SenseNova V6

2025-06-06

In just three steps (power on, install, and launch), SenseNova V6 Omni (Omni), SenseTime’s multimodal large model, equips robots with real-time audiovisual interaction capabilities, enabling embodied AI hardware to “see, hear and speak”.


Step 1: Power on and connect your robot.

Step 2: On the robot’s system, unzip the “SenseNova V6 Software Package,” run the configuration interface, enter the API key, and click start to complete installation.

Step 3: Launch the program. The robot is now ready to hold fluent conversations with users.

 

Seamless Access with Zero Coding

The system automatically adapts to standard input and output devices commonly used in embodied AI platforms, minimizing integration time and effort. Omni enables a seamless and natural multimodal experience, including voice interactions and visual recognition.

 

SenseNova V6 Omni: The All-in-One Interaction Model

As the streaming interaction version of SenseTime’s SenseNova V6 family of multimodal large models, Omni is designed for versatility and empowers machines with five core perceptual capabilities:

  • Omni-Listen: Real-time voice recognition for accurate intent understanding

  • Omni-Vision: Visual recognition for contextual multimodal interactions

  • Omni-Speak: Natural speech synthesis with support for voice customization and cloning

  • Omni-Think: Robust logical reasoning and knowledge processing

  • Omni-Sense: Multidimensional information integration with dynamic memory storage

 

Key Features at a Glance

  • Real-time audiovisual interactions: Enables streaming input and output, expanding beyond text-based interfaces to support more natural and dynamic interaction.

  • Low-latency response: Delivers an initial reply in under 2 seconds, enabling responsive and natural conversations.

  • Customizable personalities, emotions, and actions: Configure interactive agents using prompt engineering and knowledge base inputs; supports voice cloning, emotional tone modulation, and scenario-based action sets.

  • Global memory: Supports ≥32k image-text memory with dynamic long-term storage.

  • Comprehensive knowledge base: Integrates third-party professional datasets and real-time web search to improve the accuracy and breadth of responses.
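
For teams that want to check the low-latency figure on their own hardware, the time to the first streamed chunk can be measured with a few lines of Python. This is a minimal sketch, not part of the SenseNova V6 software package: `time_to_first_chunk` and `fake_reply_stream` are hypothetical names, and the fake stream merely stands in for whatever streamed reply a device actually receives.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_chunk(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], Iterator[bytes]]:
    """Return (seconds until the first chunk, iterator over the full stream).

    The latency is None if the stream ends without yielding anything.
    """
    start = clock()
    iterator = iter(stream)
    try:
        first = next(iterator)  # blocks until the first chunk arrives
    except StopIteration:
        return None, iter(())
    latency = clock() - start

    def replay() -> Iterator[bytes]:
        # Hand back the chunk we consumed, then the rest of the stream.
        yield first
        yield from iterator

    return latency, replay()


def fake_reply_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed reply; not a SenseNova API."""
    time.sleep(delay_s)  # simulated wait before the first chunk
    yield b"Hello"
    yield b", world"


if __name__ == "__main__":
    latency, chunks = time_to_first_chunk(fake_reply_stream())
    print(f"first chunk after {latency:.3f}s; under 2s: {latency < 2.0}")
    print(b"".join(chunks).decode())
```

Replacing `fake_reply_stream()` with any iterable that yields reply chunks gives a rough check against the 2-second figure. The measurement covers only the wait for the first chunk, not the time to complete the full reply.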

 

Get Started Now

The system requires no complex development: simply install and deploy it. This streamlined process lets embodied AI devices offer natural, multimodal interaction out of the box.


Limited-time offer: New embodied AI users will receive an additional 10,000 minutes of trial usage.

For more information, please email: SenseNova-5o-support@sensetime.com