
SenseTime Fully Open-Sources SenseNova U1: A Unified Model for Understanding and Generation

2026-04-29

Compact, Efficient, and Capable of High-Quality
Infographics and Continuous Image–Text Creation

 

SenseTime (00020) today announced the release and open-sourcing of SenseNova U1, its native unified multimodal model series. Built on the self-developed NEO-Unify architecture introduced in March, the models in this series unify multimodal understanding, reasoning, and generation within a single monolithic framework. Through efficient synergy between language and vision, SenseNova U1 improves both understanding and content generation while preserving semantic integrity and pixel-level fidelity, supporting complex infographic creation. It is also the industry’s first model to deliver continuous image–text creation within a unified architecture.

In domains such as logical reasoning and spatial intelligence, SenseNova U1 is able to understand complex layouts and fine-grained relationships in the physical world. This capability provides a critical foundation for future embodied AI systems, enabling robots to complete the full cycle of perception, reasoning, and precise task execution within a single model. Such an end-to-end approach represents an important step in advancing both technological development and industrial deployment.

Conventional multimodal models typically adopt a compartmentalized design, bridging a visual encoder (VE) with a language backbone through intermediate adapters. This approach resembles a workflow in which each component operates independently: one processes images, another converts visual content into text, a third interprets language, a fourth performs reasoning, and yet another translates outputs into design instructions before the final image is generated. As information must be transferred across separate components, overhead is incurred and semantic or visual fidelity is often compromised. To offset these structural limitations, such models generally require significantly more parameters, increasing complexity without fully addressing the underlying inefficiencies.

The NEO‑Unify architecture addresses these limitations by moving away from the conventional model design described above. It completely eliminates both the visual encoder (VE) and the variational auto‑encoder (VAE), and instead establishes a unified representation space. On this basis, SenseNova U1 operates as a single unified system capable of handling multiple modalities simultaneously. Images and text are processed within the same cognitive framework rather than being translated and handed off across separate components. By fusing language and vision at a foundational level, the architecture significantly reduces information loss and enables efficient multimodal understanding and generation, even at a relatively compact model scale.
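To make the contrast concrete, the following is a purely illustrative sketch — not SenseTime’s actual code, and every name in it is hypothetical — of the hand-off pattern in a compartmentalized pipeline versus a single model operating over one shared token space:

```python
from dataclasses import dataclass

# Toy stand-in for a multimodal token: in a unified representation
# space, text and image content live in the same sequence.
@dataclass
class Token:
    modality: str  # "text" or "image"
    payload: str

def compartmentalized_pipeline(image: str, prompt: str) -> str:
    # Each boundary is a hand-off between separate components,
    # and each hand-off can lose semantic or visual detail.
    caption = f"caption({image})"            # visual encoder + adapter
    plan = f"reason({caption} | {prompt})"   # language backbone
    return f"decode({plan})"                 # separate image decoder (VAE)

def unified_model(context: list[Token]) -> list[Token]:
    # One model reads and writes interleaved tokens directly;
    # nothing is translated across component boundaries.
    return context + [
        Token("image", "generated image conditioned on full context"),
        Token("text", "caption generated in the same pass"),
    ]

out = unified_model([
    Token("image", "photo of a cat"),
    Token("text", "redraw this in watercolor"),
])
```

The structural point is that the pipeline version accumulates one lossy translation per component boundary, while the unified version keeps both modalities in a single context from input to output.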

The current open‑source release introduces the lightweight SenseNova U1 Lite series, which is available in two configurations:

  • SenseNova U1‑8B‑MoT — built on a dense backbone

  • SenseNova U1‑A3B‑MoT — built on a mixture‑of‑experts (MoE) backbone

 

Small Scale, Big Capability: Compact, Efficient Model with Performance Comparable to Commercial Models

Benchmark results highlight the performance characteristics of the SenseNova U1 Lite series. Across evaluations covering image understanding, image generation and editing, spatial intelligence, and visual reasoning, the models deliver leading results among open‑source models of comparable scale, setting a new benchmark for unified multimodal understanding and generation.


With its compact 8B MoT configuration, SenseNova U1 Lite matches, and in certain cases exceeds, the performance of larger commercial closed‑source models, demonstrating advantages across multiple tasks and application domains. This embodies the principle of “small scale, big capability.”


In general image generation benchmarks, SenseNova U1 Lite achieves commercial‑grade output quality comparable to Qwen‑Image 2.0 Pro and Seedream 4.5, while delivering meaningful gains in inference speed, supporting more efficient deployment in practical applications.


In the more demanding area of complex infographic generation, a task that has historically posed challenges for open‑source models, SenseNova U1 Lite attains commercial‑level performance, demonstrating strong control over layout coherence and text‑rendering accuracy.

 

Industry‑First Continuous Image–Text Creative Generation

Building on the strengths of the NEO‑Unify architecture, SenseNova U1 is the first model in the industry to achieve continuous image–text creative generation. Through native cross-modal understanding and generation, the model retains fused visual and textual signals in its context, ensuring strong stylistic consistency and enabling efficient, coherent reasoning within a unified representation space. As a result, users can generate high-quality interleaved outputs in a single, one-shot model call, delivering significant efficiency gains compared with traditional multimodal approaches.
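As an illustration of this one-shot pattern — a toy sketch only, with hypothetical names rather than the released API — a single call can yield an alternating text–image sequence in which every step is conditioned on everything generated before it:

```python
def one_shot_interleaved(prompt: str, steps: int) -> list[tuple[str, str]]:
    """Toy stand-in for continuous image-text generation: one call
    produces the whole interleaved sequence, with each step conditioned
    on the shared context so style stays consistent throughout."""
    history: list[tuple[str, str]] = []
    for i in range(1, steps + 1):
        # In a real unified model these would be generated tokens;
        # here we just record what each step is conditioned on.
        text = f"Step {i} of '{prompt}', conditioned on {len(history)} prior steps"
        image = f"image {i}, style anchored to the same context"
        history.append((text, image))
    return history

guide = one_shot_interleaved("medium-rare steak preparation", steps=4)
```

Contrast this with the traditional approach, which would issue one model call per step and re-establish style at each call, multiplying both latency and the risk of drift.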

 

The SenseNova U1 Lite series is now fully open source and available for deployment and online use:

Open-Source Deployment:

Online Experience & Access: Available soon via SenseTime’s office AI assistant, “Office Raccoon.”

 

SenseTime plans to continue advancing along this technical pathway, and will release larger‑scale models capable of delivering world‑class performance at significantly lower computational cost. The company believes that native unified multimodal intelligence represents a foundational step towards Artificial General Intelligence (AGI) and will continue to strengthen its open‑source ecosystem. Future iterations of the U1 series will include models with higher parameter counts, and SenseTime welcomes feedback from the global developer and research community to help shape the next generation of intelligent interaction.


Appendix I

Examples of SenseNova U1 Lite’s Capability: Demonstrating Commercial‑Grade Complex Infographic Generation


[Images: complex infographic generation examples]


Appendix II

Examples of SenseNova U1 Lite’s Capability: Delivering Coherent, High‑Fidelity Image–Text Interleaved Reasoning


Task 1: Medium‑Rare Steak Preparation
SenseNova U1 can reason through a complete cooking workflow, generating step‑by‑step instructions accompanied by corresponding images while maintaining a consistent visual style throughout.


[Image: medium‑rare steak preparation sequence]



Task 2: Drawing an Iron Man Pattern
SenseNova U1 is able to iteratively refine a scanned sketch into a fully realized final image. Each stage extends the structure and detail of the previous output, with the unified representation space ensuring continuity, accuracy, and visual fidelity across the entire creation process.

[Image: Iron Man sketch refinement sequence]

 

Appendix III

Superior Benchmark Performance of SenseNova U1 Lite


[Image: general image generation benchmark results]

In general image generation tests, SenseNova U1 Lite delivers commercial‑grade quality comparable to Qwen‑Image 2.0 Pro and Seedream 4.5, while offering significant advantages in inference speed.



[Image: complex infographic generation benchmark results]

Even in the highly demanding area of complex infographic generation, a domain in which open‑source models have long faced limitations, SenseNova U1 Lite achieves commercial‑grade performance, demonstrating strong control over layout structure and text‑rendering accuracy.