News and Stories

SenseTime's Groundbreaking Research at ICCV 2023 Showcases China's Innovation Power

2023-10-05

October 5, 2023 – The International Conference on Computer Vision (ICCV), renowned worldwide for its focus on artificial intelligence and computer vision, is taking place in Paris, France, from October 2^nd to 6^th, 2023. As one of the top three global computer vision conferences, alongside CVPR and ECCV, this year’s ICCV received a total of 8,068 submissions, of which 2,160 were accepted, resulting in a slightly higher acceptance rate of 26.8% compared to the previous ICCV in 2021, which stood at 25.9%.

SenseTime, a leading global artificial intelligence (AI) software company, and its joint laboratories were honored to have 49 papers selected for presentation at the conference. These papers covered a wide range of cutting-edge topics related to large models and generative AI, including text-to-image generation, 3D digital humans, autonomous driving, object detection, video segmentation, and more. Leveraging SenseTime's AI infrastructure, SenseCore, and the “SenseNova” Foundation Model Sets, the company achieved numerous valuable technological breakthroughs and research innovations in the fields of generative AI and large visual models.

Prof. Wang Xiaogang, Co-founder and Chief Scientist at SenseTime, said, “SenseTime's continuous dedication to AI infrastructure development, emphasis on industry-driven academic research, and cultivation of talent at all levels are the cornerstones supporting our continuous innovative achievements on the global academic stage. We are actively embracing the new research paradigm introduced by large models, continually enhancing our R&D system, and persistently integrating fundamental research with business development. SenseTime strives to contribute valuable technological achievements to the industry.”

SenseTime's Pioneering Research Unveils Promising Advances in Large Models and Generative AI

The significant global attention towards large models and generative AI has spurred the emergence of innovative research topics. SenseTime’s paper, titled “Human Preference Score: Better Aligning Text-to-Image Models with Human Preference”, proposes that Stable Diffusion can be adapted to better align with human preferences and intentions when guided by the proposed human preference classifier. Integration of human preference significantly enhances image quality in Stable Diffusion, in particular it is able to resolve challenging cases such as awkward combinations of facial expressions and limbs.

圖1.jpg

Incorporating human preferences into Stable Diffusion model training

Furthermore, SenseTime's research team tackles the longstanding challenge of generating 3D digital humans in their paper, “SHERF: Generalizable Human NeRF from a Single Image”. In this paper, they propose SHERF, the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image. With just one inference pass on a single image, SHERF reconstructs Human NeRF in the canonical space which can be driven and rendered for novel view and pose synthesis. This groundbreaking approach not only enables 3D human reconstruction and animation but also holds immense potential in streamlining the creative process involved in producing 3D digital humans.

圖2.jpg

SHERF learns a Generalizable Human NeRF to animate 3D humans from a single image

High-quality 3D human datasets serve as the foundation for numerous perception models, reconstruction models, and generative AI related to the human-centric research. SenseTime's research team has made significant strides in this area with their paper titled “SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling”. The team introduced a new synthetic dataset SynBody, with a clothed parametric human model that can generate a diverse range of subjects. This dataset provides a vast collection of virtual human data, facilitating the training of 3D human perception and reconstruction models.

In addition, the team has released XRFeitoria, an open-source code repository serving as a synthetic data rendering toolkit. With its user-friendly Python API and CLI tools, XRFeitoria simplifies the creation of synthetic datasets.

圖3.jpg

SynBody is a large-scale synthetic datasets for human perception and modelling

In the field of 3D object detection for autonomous driving applications, SenseTime's research team proposes a new paradigm, named Historical Object Prediction (HoP) for multi-view 3D detection in their paper titled “Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction”. This approach leverages temporal information to enhance the training of multi-view 3D object detection models more effectively.

HoP achieves 68.5% NDS and 62.4% mAP with ViTL on nuScenes test, outperforming all the 3D object detectors on the leaderboard. As a plug-and-play approach, HoP can be easily incorporated into state-of-the-art BEV detection frameworks, reshaping the paradigm of utilizing temporal information for 3D object detection.

圖4.jpg

Framework of HoP

SenseTime's outstanding achievements at this year's ICCV encompass a wide range of areas including object detection, video segmentation, 3D perception and reconstruction, semi-supervised learning, NeRF, and more. These significant technological breakthroughs underscore SenseTime's unwavering commitment to pushing the boundaries of AI research.

SenseTime's Commitment to Building an Open Ecosystem

SenseTime recognizes the importance of openness and collaboration in fostering a new ecosystem that includes enterprises, universities, and research institutes. SenseTime actively engages in academic exchanges, and competitions to facilitate the transformation of research into practical business applications, as well as to explore collaborative models between industry and academia.

Concurrently, SenseTime is committed to promoting AI infrastructure and open-source ecosystems, working alongside developers to drive the prosperous development of the AI community. Their open-source platform, OpenMMLab, is extremely popular on GitHub and has received over 87,000 stars since its launch in 2018. SenseTime’s open-source projects have expanded into various domains, including decision intelligence, large language models, extended reality, data platforms, high-performance training and inference frameworks, and AI intelligent agent frameworks. These projects provide the comprehensive support needed to make novel breakthroughs in academic research and industrial implementation.

Specifically, SenseTime's collaboration with the Joint Lab has resulted in the development of InternLM, a large language model (LLM) that has made a remarkable impact on the open-source community and industry. The latest InternLM-20B model offers advanced performance and convenient application, achieving comparable capabilities to the benchmark Llama2-70B model with less than one-third of the parameters.

Adhering to the principles of openness in development, SenseTime, along with its industry partners, eagerly embraces the new wave of technological revolution in large models. By enabling AI technology to unleash broader industrial value, SenseTime aims to contribute to the advancement of AI and its practical applications.

您尚未完善信息

完善信息后，即可下载资料

完善信息跳过，继续浏览

您尚未登录

您还未登录，登录方可继续

登录跳过，继续浏览

请选择您认为需要改进的地方：

导航不好用，不方便找到感兴趣的内容
产品介绍信息不够全面
产品介绍信息不容易懂
页面打开速度不快，页面浏览不流畅/有卡顿
页面不够美观
售后服务不好找，体验不好

跳过下一个

您是否能够达到本次网站的访问目的？

是
否
仍在进行中

下一个

您对商汤官网的满意度如何？

非常不满意非常满意

提交

已收到您对商汤官网的评价和建议！

感谢您的耐心反馈~

关闭

您还未登录，登录方可继续

登录跳过，继续浏览

您尚未完善信息

完善信息后，即可下载资料

完善信息跳过，继续浏览

Apply for Trial

Technical Capabilities

SenseTime Research

SenseNova

SenseFoundry Enterprise

SenseFoundry

SenseME

SenseMARS

SenseCare

Education

SenseAuto

SenseTime's Groundbreaking Research at ICCV 2023 Showcases China's Innovation Power

您尚未完善信息

您尚未登录

请选择您认为需要改进的地方：

您是否能够达到本次网站的访问目的？

您对商汤官网的满意度如何？

已收到您对商汤官网的评价和建议！

您还未登录，登录方可继续

您尚未完善信息

Apply for Trial