SenseTime SenseNova Open-Source Model Achieves Breakthrough in Spatial Intelligence, Surpassing GPT-5 in Key Benchmarks

2025-12-01

SenseTime has officially open-sourced the SenseNova-SI series, a new addition to its SenseNova large-model family and a significant advance in spatial intelligence. The SenseNova-SI models outperform not only leading open-source multimodal models of the same scale but also top-tier closed-source models such as GPT-5 and Gemini-3-Pro on authoritative benchmarks for spatial understanding and reasoning. SenseNova-SI enables autonomous driving systems and robots to accurately perceive the distance, position, and direction of movement of surrounding objects.

While current industry-leading large models excel at knowledge acquisition, writing, reasoning, and programming, they generally lack proficiency in spatial understanding and reasoning, the foundational capabilities that embodied agents such as robots need to interact effectively with the physical world.

SenseTime has conducted in-depth research on spatial intelligence, developing the systematic training methodology that underpins the SenseNova-SI series. The models are available in two sizes: 2B (approximately 2 billion parameters) and 8B (approximately 8 billion parameters).
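
Since the released checkpoints build on open-source InternVL3 base models (see the benchmark table below), loading them should follow the standard Hugging Face pattern. The sketch below is illustrative only: the repository id is an assumption, not a published name; consult the project's GitHub page for the actual checkpoints.

```python
# Minimal loading sketch for a SenseNova-SI checkpoint.
# ASSUMPTION: the repo id below is hypothetical; see
# https://github.com/OpenSenseNova/SenseNova-SI for the published names.
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "OpenSenseNova/SenseNova-SI-1.1-InternVL3-2B"  # hypothetical id

# InternVL3-based checkpoints ship custom modeling code, hence
# trust_remote_code=True.
model = AutoModel.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
```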

SenseNova-SI significantly outperforms peer open-source models

The performance of SenseNova-SI has been rigorously evaluated using multiple authoritative spatial intelligence benchmarks, including:

·       VSI and MindCube, co-developed by researchers from New York University, Stanford University, and other institutions;

·       MMSI, jointly built by researchers from Shanghai AI Laboratory, The Chinese University of Hong Kong (CUHK), Tsinghua University, The University of Hong Kong, and others;

·       ViewSpatial, co-developed by researchers from Zhejiang University, University of Electronic Science and Technology of China, and CUHK; and

·       SITE, co-developed by researchers from Boston University and Microsoft Research.

Across these benchmarks, SenseNova-SI demonstrated exceptional performance against both open-source and closed-source models, and against both general-purpose and dedicated spatial intelligence models. Each benchmark comprises roughly 1,000 to 8,000 questions. On a 100-point scale, the SenseNova-SI-1.1-8B model achieved an average score of 60.0 across the five benchmarks, substantially outperforming open-source general multimodal models such as Qwen3-VL-8B (41.3 points) and BAGEL-7B (35.1 points), as well as spatial intelligence models such as VST-7B (43.6 points) and Cambrian-S-7B (41.1 points). Notably, despite its much smaller scale in parameters and compute, SenseNova-SI-1.1-8B also outperformed top closed-source models, including GPT-5 (52.1 points) and Gemini-3-Pro-Preview (56.2 points).

The code used in this evaluation is publicly available on GitHub (https://github.com/OpenSenseNova/SenseNova-SI), enabling reproduction of the test results.
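
As a quick sanity check, the Average column in the table below is consistent with an unweighted mean over the five benchmarks. A minimal sketch, with the scores copied from the table:

```python
# Recompute the Average column as an unweighted mean over the five
# benchmarks (VSI, MMSI, MindCube-Tiny, ViewSpatial, SITE).
scores = {
    "SenseNova-SI-1.1-InternVL3-8B": [68.7, 43.3, 85.6, 54.6, 47.7],
    "GPT-5-2025-08-07":              [55.0, 41.8, 56.3, 45.6, 61.9],
    "Gemini-3-Pro-Preview":          [52.5, 45.2, 70.9, 50.4, 62.2],
}

for model, s in scores.items():
    print(f"{model}: {sum(s) / len(s):.1f}")
# SenseNova-SI-1.1-InternVL3-8B: 60.0
# GPT-5-2025-08-07: 52.1
# Gemini-3-Pro-Preview: 56.2
```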

Scores of Various Models on Authoritative Benchmark Evaluations

| Model | Average | VSI | MMSI | MindCube-Tiny | ViewSpatial | SITE |
| --- | --- | --- | --- | --- | --- | --- |
| Open-source Models (~2B) |  |  |  |  |  |  |
| InternVL3-2B | 31.9 | 32.9 | 26.5 | 37.5 | 32.5 | 30.0 |
| Qwen3-VL-2B-Instruct | 37.3 | 50.3 | 28.9 | 34.5 | 36.9 | 35.6 |
| MindCube-3B-RawQA-SFT | 20.2 | 17.2 | 1.7 | 51.7 | 24.1 | 6.3 |
| SpatialLadder-3B | 36.7 | 44.8 | 27.4 | 43.4 | 39.8 | 27.9 |
| SpatialMLLM-4B | 31.7 | 46.3 | 26.1 | 33.4 | 34.6 | 18.0 |
| VST-3B-SFT | 40.8 | 57.9 | 30.2 | 35.9 | 52.8 | 35.8 |
| Cambrian-S-3B | 36.5 | 57.3 | 25.2 | 32.5 | 39.0 | 28.3 |
| SenseNova-SI-1.1-InternVL3-2B | 45.8 | 63.7 | 34.2 | 41.8 | 52.7 | 36.8 |
| Open-source Models (~8B) |  |  |  |  |  |  |
| InternVL3-8B | 38.3 | 42.1 | 28.0 | 41.5 | 38.6 | 41.1 |
| Qwen3-VL-8B-Instruct | 41.3 | 57.9 | 31.1 | 29.4 | 42.2 | 45.8 |
| BAGEL-7B-MoT | 35.1 | 31.4 | 31.0 | 34.7 | 41.3 | 37.0 |
| SpaceR-7B | 35.4 | 41.5 | 27.4 | 37.9 | 35.8 | 34.2 |
| ViLaSR-7B | 36.9 | 44.6 | 30.2 | 35.1 | 35.7 | 38.7 |
| VST-7B-SFT | 43.6 | 60.6 | 32.0 | 39.7 | 50.5 | 39.6 |
| Cambrian-S-7B | 41.1 | 67.5 | 25.8 | 39.6 | 40.9 | 33.0 |
| SenseNova-SI-1.1-InternVL3-8B | 60.0 | 68.7 | 43.3 | 85.6 | 54.6 | 47.7 |
| Proprietary Models |  |  |  |  |  |  |
| Gemini-2.5-pro-2025-06 | 50.5 | 53.5 | 38.0 | 57.6 | 46.0 | 57.0 |
| Gemini-3-Pro-Preview | 56.2 | 52.5 | 45.2 | 70.9 | 50.4 | 62.2 |
| Grok-4-2025-07-09 | 47.9 | 47.9 | 37.8 | 63.5 | 43.2 | 47.0 |
| GPT-5-2025-08-07 | 52.1 | 55.0 | 41.8 | 56.3 | 45.6 | 61.9 |

Note: The maximum score is 100.

High-quality, large-scale training data drives performance enhancement

SenseNova-SI’s impressive performance is attributed to an innovative spatial-capability classification system and the use of large-scale, diverse training data. The research team adopted a systematic approach to scaling up spatial understanding data, demonstrating for the first time a “scaling law” in the field of spatial intelligence: training on high-quality data at larger and larger scales significantly enhances a model’s spatial intelligence.
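
Here a “scaling law” means that spatial performance improves predictably as spatial training data grows. A minimal sketch of how such a trend is typically checked with a log-linear fit; the data points below are hypothetical placeholders, not figures from the report:

```python
# Generic log-linear scaling-law fit: score ≈ a * log10(N) + b.
# The (N, score) pairs are HYPOTHETICAL placeholders, not numbers
# from the SenseNova-SI technical report.
import numpy as np

num_samples = np.array([1e5, 1e6, 1e7, 1e8])    # spatial training examples
avg_score = np.array([35.0, 42.0, 51.0, 59.0])  # benchmark average (placeholder)

a, b = np.polyfit(np.log10(num_samples), avg_score, deg=1)
print(f"fit: score ≈ {a:.2f} * log10(N) {b:+.2f}")
```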

SenseTime’s proposed training paradigm is general, delivering consistent gains across base models with different architectures (e.g., InternVL). It yields significant improvements across six core dimensions of spatial intelligence: metric measurement, mental reconstruction, spatial relationships, perspective-taking, deformation and assembling, and comprehensive reasoning.
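
For concreteness, the six dimensions can be written down as a simple enumeration for tagging evaluation items. The names below transcribe the list above; the example tag at the end is illustrative, not an official mapping:

```python
# The six core dimensions of spatial intelligence named above.
from enum import Enum

class SpatialDimension(Enum):
    METRIC_MEASUREMENT = "metric measurement"
    MENTAL_RECONSTRUCTION = "mental reconstruction"
    SPATIAL_RELATIONSHIPS = "spatial relationships"
    PERSPECTIVE_TAKING = "perspective-taking"
    DEFORMATION_AND_ASSEMBLING = "deformation and assembling"
    COMPREHENSIVE_REASONING = "comprehensive reasoning"

# Illustrative: a MindCube-style viewpoint-change question would fall
# under perspective-taking.
print(SpatialDimension.PERSPECTIVE_TAKING.value)
```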

SenseTime has released a detailed technical report (https://www.arxiv.org/abs/2511.13719) elaborating on the method.

In the benchmark evaluations, SenseNova-SI-1.1-8B outperformed GPT-5 on multiple questions. Selected examples follow:

Question: You are standing in front of the dice pattern and observing it. Where is the desk lamp approximately located relative to you?

[Image 9]

  • GPT-5: B (90 degrees clockwise)

  • SenseNova-SI-1.1-8B: C (135 degrees counterclockwise)

  • Correct answer: C

(Source: SITE-Bench)

Question: Based on these two views showing the same scene, in which direction did I move from the first view to the second view?

[Image 10]

  • GPT-5: C (Diagonally forward and right)

  • SenseNova-SI-1.1-8B: D (Diagonally forward and left)

  • Correct answer: D

(Source: MindCube)

Question: This image shows the front-facing view from the ego car. What is the future state of the yellow car?

[Image 11]

  • GPT-5: C (Stationary)

  • SenseNova-SI-1.1-8B: D (Turn right)

  • Correct answer: D

(Source: SITE-Bench)

Question: Based on these four images (images 1, 2, 3 and 4) showing a table with a black cloth from different perspectives (front, left, back and right), with each camera aligned with the room walls and partially capturing the surroundings, from the perspective presented in image 1, what is to the right of the table with the black cloth?

[Image 12]

  • GPT-5: B (Lectern)

  • SenseNova-SI-1.1-8B: C (Door)

  • Correct answer: C

(Source: MindCube)
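
All four examples are single-letter multiple-choice items, so scoring plausibly reduces to exact match on the chosen option (an assumption about these benchmarks, not a documented detail). A minimal sketch using the answers recorded in the example above:

```python
# Exact-match scoring for single-letter multiple-choice items, using
# the recorded answers from the MindCube example above.
def score(prediction: str, answer: str) -> int:
    """Return 1 if the predicted option letter matches the ground truth."""
    return int(prediction.strip().upper() == answer.strip().upper())

ground_truth = "C"
print(score("B", ground_truth))  # GPT-5's choice -> 0
print(score("C", ground_truth))  # SenseNova-SI-1.1-8B's choice -> 1
```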

Spatial intelligence is a fundamental capability for world models and embodied intelligence, enabling them to understand the physical environment. In July this year, SenseTime launched the "Wuneng" Embodied Intelligence Platform, powered by the "Kaiwu" World Model; it aims to equip robots and smart devices with the ability to independently explore and evolve in the physical world.

The newly released SenseNova-SI spatial intelligence model complements the "Kaiwu" World Model. Together, they address the fundamental challenge of moving multimodal models from digital space into the physical world and further advance the application of AI in scenarios such as autonomous driving and robotics.

In addition, SenseTime has simultaneously open-sourced the EASI Spatial Intelligence Evaluation Platform and its leaderboard. The initiative aims to unify evaluation standards for spatial intelligence, continuously track and showcase the progress of open-source and closed-source models, provide authoritative benchmarks for academia and industry, and foster collaborative innovation.

The launch of SenseNova-SI marks a crucial step forward in AI’s ability to understand the 3D world, laying a solid foundation for bringing next-generation artificial general intelligence (AGI) technology into the physical environment.

EASI Spatial Intelligence Evaluation Platform: https://github.com/EvolvingLMMs-Lab/EASI