Appearing on CCTV, Praised by Netizens as a "Highlight of AI in China"! SenseTime’s SenseAvatar Makes Another Breakthrough with Image-to-Digital-Human Technology
At the recently aired "2025 China AI Gala" – a major CCTV event focusing on the development of the artificial intelligence industry – three AI digital avatars of singers Shi Yijie, Zhang Yingxi, and Cai Chengyu, created by the SenseAvatar team, performed the classic opera aria Nessun Dorma on stage alongside their real-life counterparts and sang in seven languages. An AI digital avatar of Alan Turing, the "father of artificial intelligence," also appeared at the gala. After the program was broadcast, related videos drew more than 100 million views across all platforms. Netizens praised the performance as a "highlight of AI in China" and said it "truly opens the door to a new world." This marks the second consecutive year that SenseAvatar has participated in this national-level technology event centered on AI.
SenseAvatar Image-to-Digital-Human Technology: One Photo + One Audio Clip = High-Quality Videos in Minutes
The AI digital avatars showcased at the event are powered by SenseTime’s proprietary SenseAvatar Image-to-Digital-Human Technology. Built on the multimodal capabilities of SenseTime’s SenseNova Multimodal Large Model, the technology requires only a photo of a person plus a text script or audio file; the AI then automatically generates a dynamic video of that person, with support for multiple languages. It achieves lip-sync, movement coordination, and emotional expression that are almost indistinguishable from a real human.
Unlike most image-to-video tools on the market, which are limited to 5–10-second clips, SenseAvatar Image-to-Digital-Human Technology can stably generate dynamic videos of more than 3 minutes while maintaining high consistency in character identity and background style throughout. It can also accurately drive the character’s body movements via prompts (e.g., "waving hands"), even supporting fast, large-scale hand movements, addressing the industry pain points of "stiff movements" and "disconnection from content."
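The "one photo + one text script or audio clip" workflow described above can be sketched as a small client-side helper. Everything below is a hypothetical illustration: the function name, payload field names, and options are assumptions for exposition, not SenseTime's actual SenseAvatar API.

```python
# Hypothetical sketch of the image-to-digital-human request flow.
# All field names and options are illustrative assumptions, not
# SenseTime's real API surface.

def build_avatar_request(photo_path, script_text=None, audio_path=None,
                         language="en", motion_prompt=None):
    """Assemble a request payload for an image-to-digital-human job.

    Exactly one of script_text or audio_path drives the avatar's speech,
    mirroring the article's "text script or audio file" input options.
    """
    if (script_text is None) == (audio_path is None):
        raise ValueError("Provide exactly one of script_text or audio_path")

    payload = {
        "photo": photo_path,      # single portrait photo of the subject
        "language": language,     # multilingual output is supported
    }
    if script_text is not None:
        payload["script"] = script_text   # text-driven speech
    else:
        payload["audio"] = audio_path     # audio-driven lip sync
    if motion_prompt is not None:
        payload["motion"] = motion_prompt  # e.g. "waving hands"
    return payload

# Example: a text-driven avatar with a body-movement prompt
req = build_avatar_request("singer.jpg",
                           script_text="Nessun dorma...",
                           motion_prompt="waving hands")
```

The mutual-exclusion check reflects the article's description that either a script or an audio file, not both, drives the generated video.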
Diverse Styles & Wide Applications Across Marketing, Education, Cultural Tourism, Social Entertainment, and More
In terms of applications, SenseAvatar Image-to-Digital-Human Technology demonstrates strong flexibility, giving creators and enterprises unprecedented room for expression. It supports API integration and private deployment to ensure data security and meet customized needs. Real-person photos, anime characters, 3D virtual figures, and even animal figurines can all serve as "sources of inspiration." The AI digital humans not only achieve rap-level lip-sync accuracy but also display different emotions and body movements based on prompts. They also support multi-person scenarios, allowing users to specify which character speaks.
Commercial Marketing: Enterprises only need a photo of an employee or model to quickly generate promotional videos or multilingual product explanations, significantly reducing filming and translation costs.
Media Communication: Independent content creators can use their own photos to generate talk show-style short videos, while news organizations can deliver multilingual broadcasts using 3D character images.
Education & Culture: Photos of renowned teachers can "come to life" to explain knowledge points; restored images of historical figures in museums can tell stories behind cultural relics to visitors.
Cultural Tourism: Tourists can generate their own "time-travel short dramas" using a single photo, making travel experiences more immersive and interactive.
This breakthrough enables the technology to genuinely serve scenarios such as social entertainment, live-commerce marketing, teaching courses, brand storytelling, and cultural interpretation, breaking the industry's long-standing "video duration bottleneck." By moving from traditional reliance on filming and 3D modeling to generating realistic videos from a single photo, SenseTime is reshaping the digital human industry landscape and allowing the public to participate in content creation with lower barriers and higher efficiency.