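First to Complete a Full-stack "Cloud, Device, Edge" Layout! With Large Model Performance on Par with GPT-4 Turbo, SenseTime's "SenseNova 5.0" Is Fully Upgraded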
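On April 23, 2024, SenseTime hosted its Tech Day in Shanghai, Shenzhen and other cities, releasing the industry's first "cloud, device, edge" full-stack large model product matrix to meet the application needs of scenarios of different scales. It also unveiled the newly upgraded "SenseNova 5.0" large model system, whose overall capabilities are on par with GPT-4 Turbo. This technical lead accelerates the move of generative AI into industrial deployment and makes large models available on demand.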
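Guided by the Scaling Law, the most fundamental principle of AI development, SenseTime continues to seek the optimal data ratio and has built a data quality evaluation system. While advancing its own large model research and development, it also provides industry partners with large model training, fine-tuning and deployment, along with a range of generative AI capabilities and services.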
"SenseNova 5.0" performs on par with GPT-4 Turbo: major gains in both language and science capabilities, plus new multimodal interaction
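Since its debut in April last year, SenseTime's "SenseNova" large model system has gone through five major version iterations. Trained on more than 10TB tokens, including a large amount of synthetic data, the new "SenseNova 5.0" adopts a Mixture of Experts (MoE) architecture, with an effective context window of around 200K during inference.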
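This update focuses on strengthening knowledge, mathematics, reasoning and coding capabilities. It is benchmarked against GPT-4 Turbo across the board and matches or surpasses GPT-4 Turbo on mainstream objective evaluations.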
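Multimodal capability is the other core focus of "SenseNova 5.0". The image-text perception of SenseTime's multimodal large models has reached a world-leading level, ranking first by overall score on MMBench, an authoritative comprehensive benchmark for multimodal large models, and achieving leading results on several well-known multimodal leaderboards: MathVista, AI2D, ChartQA, TextVQA, DocVQA and MMMU.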
Innovating product applications for the AI 2.0 era with ecosystem partners to build new quality productive forces
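SenseTime has been working closely with Kingsoft Office since 2023. Drawing on the strong code generation and tool-calling capabilities of the "SenseNova" large models, it is helping WPS 365 build an office productivity platform that unlocks scenario capabilities more efficiently and gives each enterprise its own "Enterprise Brain". Kingsoft Office CEO Zhang Qingyuan said: "In office application scenarios, SenseTime's large models perform very well, helping our users solve complex problems at work and improve efficiency."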
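In the financial sector, Haitong Securities and SenseTime jointly released a multimodal full-stack large model for the financial industry. The two parties are advancing business deployment in areas such as intelligent customer service, compliance and risk control, coding assistance, and business development and office assistants, and are jointly researching frontier scenarios such as intelligent investment advisory and public sentiment monitoring, building full-stack capability for deploying large models in the securities industry. Mao Yuxing, Deputy General Manager and Chief Information Officer of Haitong Securities, said: "Through our collaboration with SenseTime, we have used large model technology to carry out the digital and intelligent transformation of Haitong Securities. Going forward, we will combine full-stack AI capabilities to transform business processes and interactions and to rebuild our digital and intelligent business systems."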
Left: Wang Gang, General Manager of Xiao Ai, Xiaomi Group; right: Wang Xiaogang, Co-founder and Chief Scientist, SenseTime
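In its own applications, SenseTime's "SenseNova 5.0" brings major updates to products including SenseMirage, SenseAvatar, SenseThings, SenseSpace, SenseChat-DaYi, and the Raccoon family.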
SenseTime Leads the Way in Full-stack Cloud-to-Edge Deployment
SenseNova 5.0 Upgraded for Enhanced Performance, on Par with GPT-4 Turbo
April 23, 2024, Shanghai - SenseTime hosted its Tech Day, unveiling the industry's first Cloud-to-Edge full-stack large model product matrix to meet the application needs of scenarios of different scales. SenseTime also announced the latest SenseNova 5.0 Large Model, whose overall capability matches that of GPT-4 Turbo. This marks a step forward in the adoption of generative AI for industrial applications, giving users on-demand access to large models.
Guided by the Scaling Law, a fundamental principle of AI development, SenseTime consistently pursues the optimal data ratio and has established a data quality assessment system. This approach drives its own large model development while also providing industry partners with a comprehensive suite of capabilities and services, including training, fine-tuning, deployment, and a range of generative AI capabilities.
Dr. Xu Li, Chairman of the Board and CEO of SenseTime, said, "In our pursuit to push the boundaries of large model capabilities, SenseTime remains guided by the Scaling Law as we build upon our Large Model based on this three-tier architecture: Knowledge, Reasoning, and Execution (KRE)."
SenseNova 5.0 performs on par with GPT-4 Turbo: Significant enhancements in both language and science capabilities, plus new multimodal interactions
Since its debut in April last year, SenseTime's SenseNova Large Model has undergone five major version iterations. Trained on more than 10TB tokens, including a large amount of synthetic data, the latest SenseNova 5.0 adopts a Mixture of Experts (MoE) architecture and offers an effective context window of approximately 200K during inference.
The major advancements in SenseNova 5.0 focus on knowledge, mathematical, reasoning, and coding capabilities, matching or surpassing GPT-4 Turbo in mainstream objective assessments.
In terms of linguistic and creative capabilities, the creative writing, reasoning, and summarization abilities of SenseNova 5.0 have improved significantly. Given the same knowledge input in Chinese, it delivers better comprehension, summarization, and question answering, providing strong support for vertical applications such as education and the content industries.
SenseNova 5.0 and GPT-4 were presented with a fun reasoning question: "Mom prepared Yuan Yuan a cup of coffee. Yuan Yuan drank half of it, then filled it with water. After drinking half again, she refilled it with water, and finally drank it all. Did Yuan Yuan consume more coffee or water?" SenseNova 5.0 provided the correct answer to the question.
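The reasoning question above can be checked with a few lines of exact arithmetic. The following is a minimal sketch, not part of SenseTime's evaluation: it assumes the cup holds one unit of liquid and that the drink is well mixed before each sip, and the function name `drink_and_refill` is made up for this illustration.

```python
from fractions import Fraction


def drink_and_refill(steps):
    """Simulate the cup. Each step is (fraction_drunk, refill_with_water)."""
    coffee, water = Fraction(1), Fraction(0)  # the cup starts full of coffee
    coffee_drunk, water_drunk = Fraction(0), Fraction(0)
    for fraction_drunk, refill in steps:
        # Assumption: the drink is well mixed, so each sip removes coffee
        # and water in proportion to what is currently in the cup.
        sip_coffee = coffee * fraction_drunk
        sip_water = water * fraction_drunk
        coffee_drunk += sip_coffee
        water_drunk += sip_water
        coffee -= sip_coffee
        water -= sip_water
        if refill:
            water += 1 - (coffee + water)  # top the cup back up with water
    return coffee_drunk, water_drunk


# Drink half, refill; drink half, refill; drink everything.
steps = [(Fraction(1, 2), True), (Fraction(1, 2), True), (Fraction(1), False)]
coffee_drunk, water_drunk = drink_and_refill(steps)
print("coffee:", coffee_drunk, "water:", water_drunk)
```

Under these assumptions the totals are equal: one cup of coffee and one cup of water. The mixing assumption does not actually affect the totals, because the cup is emptied at the end, so all of the original coffee and both half-cup refills of water are consumed.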
In terms of scientific capabilities, SenseNova 5.0 boasts best-in-class mathematical, coding and reasoning capabilities, providing a solid foundation for applications in finance and data analysis.
Multimodal capability is another core focus of SenseNova 5.0. SenseTime's multimodal large models have achieved world-leading graphical and textual perception, ranking first by aggregate score on MMBench. They have also achieved leading scores on other well-known multimodal benchmarks such as MathVista, AI2D, ChartQA, TextVQA, DocVQA and MMMU.
"SenseNova 5.0" also achieves exceptional multimodal capabilities at the product application level. It provides robust support for tasks such as the analysis and understanding of high-resolution long images, interactive text-to-image generation, as well as complex cross-document knowledge extraction and summary Q&A, with rich multimodal interactive capabilities.
Leading the completion of the Cloud-to-Edge full-stack layout: featuring the industry-leading edge-side model and the SenseTime Integrated Large Model (Enterprise) edge device
Anticipating the trend of centralized computing power demands extending to edge devices and the enterprise-level AI needs on the edge, SenseTime has pioneered the launch of the "Cloud-to-Edge" full-stack large model product matrix. This includes the SenseTime Edge-side Large Model for terminal devices and the SenseTime Integrated Large Model (Enterprise) edge device for fields such as finance, coding, healthcare, and government services.
This year marks the inaugural year of large model applications on edge devices. To meet mobile end users' demand for large model technology, SenseTime has launched the SenseNova Edge Device Large Model, which achieves the best performance at its scale and leads comprehensively across scale classes.
SenseTime has also launched an edge-cloud collaborative solution, which uses intelligent judgment to draw on the strengths of both edge and cloud. Tasks are offloaded to the cloud when internet search or complex scene processing is required, while in certain scenarios the edge handles more than 80% of the processing, significantly reducing inference costs.
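To make the edge-cloud split concrete, here is a minimal routing sketch. It is purely illustrative and does not describe SenseTime's actual solution: the `Request` fields, the `complexity_threshold` value, and the rule itself are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    needs_web_search: bool = False  # hypothetical flag from an upstream classifier
    complexity: float = 0.0         # hypothetical difficulty score in [0, 1]


def route(request, complexity_threshold=0.7):
    """Return "cloud" or "edge" for a request (illustrative rule only)."""
    if request.needs_web_search or request.complexity > complexity_threshold:
        return "cloud"
    return "edge"


def edge_share(requests):
    """Fraction of requests that stay on the device."""
    if not requests:
        return 0.0
    on_edge = sum(1 for r in requests if route(r) == "edge")
    return on_edge / len(requests)


# Made-up sample traffic, not measured data.
sample = [
    Request("set an alarm for 7am", complexity=0.1),
    Request("summarize this note", complexity=0.3),
    Request("translate this sentence", complexity=0.2),
    Request("rewrite this message politely", complexity=0.4),
    Request("what is in the news today?", needs_web_search=True),
]
print("share handled on the edge:", edge_share(sample))
```

In a real system the routing decision would come from a learned or tuned policy rather than a fixed threshold; the sketch only shows how keeping most requests on the device reduces the number of cloud inference calls.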
The inference speed of the SenseNova Edge-side Large Language Model has reached industry-leading levels, generating 18.3 words per second on mid-range platforms and an impressive 78.3 words per second on flagship platforms.
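As a rough illustration of what these rates mean in practice, the sketch below converts them into generation times. The 200-word reply length is an arbitrary assumption chosen for the example, and the calculation ignores any startup latency before the first word.

```python
MID_RANGE_RATE = 18.3  # words per second, figure quoted above
FLAGSHIP_RATE = 78.3   # words per second, figure quoted above


def generation_time(length_in_words, rate):
    """Seconds needed to generate a reply at a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return length_in_words / rate


reply_length = 200  # hypothetical reply length, not from the announcement
mid_seconds = generation_time(reply_length, MID_RANGE_RATE)
flagship_seconds = generation_time(reply_length, FLAGSHIP_RATE)
speedup = FLAGSHIP_RATE / MID_RANGE_RATE

print(f"mid-range: {mid_seconds:.1f} s")
print(f"flagship:  {flagship_seconds:.1f} s")
print(f"flagship is about {speedup:.1f}x faster")
```

At the quoted rates, the flagship platform generates text about 4.3 times faster than the mid-range platform.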
The diffusion model also boasts the fastest inference speed in the industry. The edge-side LDM-AI image diffusion technology processes images in less than 1.5 seconds on a mainstream platform—ten times faster than competitors' cloud apps. It supports the output of high-definition images with resolutions of 12 million pixels and above, as well as image editing functions like proportional, free-form, and rotational expansion on the edge device.
The SenseTime Edge Business SDK officially launches today and is available for integration and trial.
The SenseTime Integrated Large Model (Enterprise) edge device was developed in response to the growing demand for AI from key fields such as finance, coding, healthcare and government services. It supports both enterprise-scale model acceleration and hardware acceleration for knowledge retrieval. It can be deployed locally and used out of the box, lowering the threshold for enterprises to apply large models. Compared to other similar products, the device performs accelerated searches at only 50 percent CPU utilization, and reduces inference costs by approximately 80 percent.
Innovating product applications in the AI 2.0 era with ecosystem partners to further boost productivity
At the event, SenseTime also invited ecosystem partners such as Kingsoft Office, Haitong Securities, Xiaomi, Yuewen Group, and Huawei to discuss and exchange views on the industry application and the general trend of large model technology in various fields such as office, finance, and transportation.
SenseTime has partnered with Kingsoft Office since 2023, drawing on the SenseNova Large Model's code generation and tool-calling capabilities to help WPS 365 build an office productivity platform that unlocks scenario capabilities more efficiently and gives businesses a dedicated "Enterprise Brain". Kingsoft Office CEO Zhang Qingyuan remarked, "In office application scenarios, SenseTime's large model has excelled in helping our users solve complex problems and improve efficiency."
In the financial sector, Haitong Securities and SenseTime jointly released a multimodal full-stack large model for the financial industry. The two parties are collaborating on intelligent customer service, compliance and risk control, coding support, and business development and office assistants, and are jointly researching frontier scenarios such as intelligent investment advisory and public sentiment monitoring, building full-stack capability for deploying large models in the securities industry. Mao Yuxing, the Deputy General Manager and Chief Information Officer of Haitong Securities, said, "Through our collaboration with SenseTime, we have leveraged large model technology to digitally transform Haitong Securities. Moving forward, we will combine full-stack AI capabilities to further enhance the digitization of our business processes, interactions, and systems."
In the transportation sector, the Xiaomi Car SU7, which surged in popularity recently, provides owners with an intelligent experience powered by Xiaomi's Xiao Ai and incorporates SenseTime's large model technology, based on the SenseTime Edge-cloud large model solution. During his conversation with Wang Xiaogang, Co-founder and Chief Scientist of SenseTime, Wang Gang, General Manager of Xiao Ai at Xiaomi, said, "SenseTime's cloud-to-edge full-stack product matrix can effectively empower and adapt to the Xiaomi IoT ecosystem. With SenseTime, we aim to co-create a smarter product experience for our users."
Left: Wang Gang, General Manager of Xiao Ai, Xiaomi; Right: Wang Xiaogang, Co-founder and Chief Scientist, SenseTime
In addition, SenseTime today released an industry large model based on the Ascend chip, jointly creating a large model industry ecosystem aimed at fields such as finance, healthcare, government services, and coding.
In its own applications, SenseNova 5.0 brings significant updates to SenseTime products including SenseMirage, SenseAvatar, SenseThings, SenseSpace, SenseChat-DaYi, and the Raccoon suite of solutions.
Text-to-video generation is also on its way, and SenseTime is steadfastly advancing towards the AGI era
In the final segment of Tech Day, SenseTime's Chairman of the Board and CEO, Dr. Xu Li, presented three video clips entirely generated by a large model, emphasizing the controllability of characters, actions, and scenes on the generative video platform.
SenseTime also showcased a breakthrough on its text-to-video platform, where users will soon be able to generate a video from a detailed description or even a few phrases. In addition, the characters' costumes, hairstyles, and scenarios can be preset to maintain the stylistic consistency of the video content.
The training of large models can be continuously empowered through the intelligent computing center built by SenseTime. Currently, the SenseNova Large Model has achieved innovations in several areas, including natural language processing, video generation, and deep learning optimization.
On one hand, large model development has now entered the implementation phase, and the next crucial step is to integrate with industries and application scenarios. On the other hand, the guiding path of the "Scaling Law" is becoming clearer, and such "breakthrough" moments will occur from time to time, so it is a priority to continue proactively exploring the frontiers of large model technology.
Adhering to its founding principle of creating a better AI-empowered future through innovation, SenseTime has applied its large model technology and products in various sectors such as healthcare, education, legal services and industry. Under the name "SenseNova," SenseTime will continue to advance steadfastly towards the goal of artificial general intelligence (AGI), breaking through the limits of data and computing power, and leading the innovation and implementation of large models.