
First to Complete a Full-Stack "Cloud, Device, Edge" Layout: SenseTime Fully Upgrades "SenseNova 5.0", With Large Model Performance on Par With GPT-4 Turbo

2024-04-24

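On April 23, 2024, SenseTime held Tech Day events in Shanghai, Shenzhen, and other cities, releasing the industry's first full-stack "cloud, device, edge" large model product matrix to meet application needs at different scales. It also upgraded its "SenseNova 5.0" large model system, whose overall capabilities are benchmarked against GPT-4 Turbo. SenseTime says this technical lead accelerates the move of generative AI into industry use and makes large models available on demand.

Following the scaling law, which SenseTime describes as the most basic principle of AI development, the company continues to search for the optimal data mix and has built a data quality evaluation system. This drives its own large model research and development while providing industry partners with large model training, fine-tuning, and deployment, along with a range of generative AI capabilities and services.

Xu Li, Chairman and CEO of SenseTime, said: "Guided by the scaling law, SenseTime will keep exploring the three-layer KRE architecture of large model capabilities (Knowledge, Reasoning, Execution) and keep pushing the boundaries of what large models can do."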


image.png


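"SenseNova 5.0" Performance on Par With GPT-4 Turbo: Major Gains in Both Liberal Arts and Science Abilities, Plus New Multimodal Interaction

Since its first release in April last year, SenseTime's SenseNova large model system has gone through five major version iterations. Trained on more than 10TB tokens, including a large amount of synthetic data, the new SenseNova 5.0 adopts a mixture-of-experts architecture, and its effective context window at inference reaches around 200K.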


image.png


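This update focuses on strengthening knowledge, mathematics, reasoning, and coding capabilities. It is benchmarked against GPT-4 Turbo across the board and matches or exceeds GPT-4 Turbo on mainstream objective evaluations.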


image.png


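On the liberal arts side, SenseNova 5.0's creative writing, reasoning, and summarization abilities have all improved substantially. Given the same injected Chinese knowledge, it produces better comprehension, summaries, and answers, providing strong support for vertical applications such as education and the content industry.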

 

image.png

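SenseNova 5.0 and GPT-4 were both asked a brain-teaser: "Mom made Yuanyuan a cup of coffee. Yuanyuan drank half a cup and topped it up with water, then drank another half cup and topped it up with water again, and finally drank it all. Did Yuanyuan drink more coffee or more water?" SenseNova 5.0 answered correctly.

On the science side, SenseNova 5.0's mathematical, coding, and reasoning abilities reach an industry-leading level, providing a solid foundation for deployment in finance, data analysis, and similar scenarios.

Another core focus of SenseNova 5.0 is multimodal capability. The image-text perception of SenseTime's multimodal large model is at a world-leading level: it ranks first by overall score on MMBench, an authoritative comprehensive benchmark for multimodal large models, and achieves leading results on several well-known multimodal leaderboards, including MathVista, AI2D, ChartQA, TextVQA, DocVQA, and MMMU.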


image.png


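SenseNova 5.0 also delivers stronger multimodal capabilities at the application level. It supports parsing and understanding of high-resolution long images and interactive text-to-image generation, can perform complex cross-document knowledge extraction with summarized question answering, and offers rich multimodal interaction.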


image.png


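First to Complete the "Cloud, Device, Edge" Full-Stack Layout: On-device Model Ranks First in the Industry, and an Enterprise All-in-One Appliance Launches at the Edge

Anticipating that demand for centralized computing power will extend to devices, and that enterprises will increasingly need AI at the edge, SenseTime is the first in the industry to launch a full-stack "cloud, device, edge" large model product matrix. It includes the SenseTime Edge-side Large Model, which runs on terminal devices, and the SenseTime Integrated Large Model (Enterprise), an edge product for fields such as finance, coding, healthcare, and government services.

This year is the first year of on-device large model applications. To meet mobile users' demand for large model technology, SenseTime has launched the SenseNova Edge-side Large Model, which delivers the best performance among models of the same scale and leads comprehensively even across size classes.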


 image.png


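SenseTime has also launched a device-cloud collaboration solution that uses intelligent routing to combine the strengths of each side. Requests that need web search or involve complex scenarios are diverted to the cloud, while in some scenarios more than 80% of processing is handled on the device, significantly reducing inference cost.

The SenseNova Edge-side Large Language Model has the fastest inference speed in the industry, reaching an average generation speed of 18.3 characters per second on mid-range platforms and 78.3 characters per second on flagship platforms.

The diffusion model likewise achieves the industry's fastest on-device inference. On one mainstream platform, the on-device LDM-AI image expansion technology runs inference in under 1.5 seconds, 10 times faster than competitors' cloud apps. It supports output of high-definition images of 12 megapixels and above, and supports fast on-device image editing such as proportional expansion, free-form expansion, and rotational expansion.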


image.png


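The SenseTime on-device SDK is officially released today and is available for integration and trial.

To serve growing demand for edge AI applications in key industries such as finance, coding, healthcare, and government services, SenseTime has officially launched its enterprise large model all-in-one appliance. It supports acceleration of enterprise-grade models at the hundred-billion scale as well as hardware acceleration for knowledge retrieval, can be deployed locally, and works out of the box, lowering the barrier for enterprises to adopt large models. Compared with similar products in the industry, it cuts inference cost by 80% and greatly speeds up retrieval, with a CPU workload of 50%.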


image.png



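Working With Ecosystem Partners on Innovative Product Applications for the AI 2.0 Era and Building New Quality Productive Forces

At the event, SenseTime also invited guests from ecosystem partners including Kingsoft Office, Haitong Securities, Xiaomi, Yuewen Group, and Huawei to discuss the applications and prospects of large model technology in fields such as office work, finance, and mobility.

SenseTime and Kingsoft Office have worked closely together since 2023. Drawing on the strong code generation and tool-calling capabilities of the SenseNova large model, SenseTime helps WPS 365 build an office productivity platform that unlocks scenario capabilities more efficiently and gives enterprises their own dedicated "enterprise brain". Zhang Qingyuan, CEO of Kingsoft Office, said: "In office scenarios, SenseTime's large model performs very well. It helps our users solve complex problems in their work and improves efficiency."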


image.png


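In finance, Haitong Securities and SenseTime jointly released a multimodal full-stack large model for the financial industry. The two companies are advancing deployments in intelligent customer service, compliance and risk control, coding assistance, and business development and office assistants, and are jointly researching frontier scenarios such as intelligent investment advisory and public opinion monitoring, building end-to-end capability for deploying large models in the securities industry. Mao Yuxing, Deputy General Manager and Chief Information Officer of Haitong Securities, said: "Through our cooperation with SenseTime, we have used large model technology to carry out Haitong Securities' digital and intelligent transformation. Going forward, we will draw on full-stack AI capabilities to transform business processes and interactions and to rebuild our digital and intelligent business systems."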


image.png


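In personal mobility, the smart cabin of the Xiaomi SU7, a car that has recently drawn strong market attention, uses SenseTime's large model technology. Built on SenseTime's device-cloud large model solution, Xiaomi's Xiao Ai assistant gives drivers an intelligent interaction experience. In a conversation with Wang Xiaogang, Co-founder and Chief Scientist of SenseTime, Wang Gang, General Manager of Xiao Ai at Xiaomi Group, said: "SenseTime's full-stack cloud, edge, and device portfolio can empower and adapt to Xiaomi's IoT ecosystem very well. We hope to work with SenseTime to create a more intelligent product experience for our users."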


image.png

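Left: Wang Gang, General Manager of Xiao Ai, Xiaomi Group. Right: Wang Xiaogang, Co-founder and Chief Scientist, SenseTime.


In addition, SenseTime today released industry large models built natively on Ascend, jointly building a large model industry ecosystem for finance, healthcare, government services, coding, and other fields.

Among its own applications, SenseNova 5.0 brings major updates to products including SenseMirage, SenseAvatar, SenseThings, SenseSpace, SenseChat-DaYi, and the Raccoon family.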

 

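"Text-to-Video" Is on the Way as SenseTime Moves Steadily Toward the AGI Era

In the final part of Tech Day, SenseTime Chairman and CEO Xu Li also showed three videos generated entirely by large models, and emphasized the text-to-video platform's controllability over characters, actions, and scenes.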


image.png

image.png


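SenseTime has also made a technical breakthrough in its text-to-video platform. In the future, entering a piece of text or a complete description will be enough to generate a video, and characters' clothing, hairstyles, and scenes can follow preset settings so that the video stays coherent and consistent.

SenseTime's intelligent computing center continuously supports large model training. The SenseNova large model system has so far achieved innovations in natural language processing, video generation, deep learning optimization, and other areas.

On the one hand, large model development has entered the deployment stage, and how models are combined with industries and application scenarios is a key link. On the other hand, the path set by the scaling law is becoming clearer while moments of "emergence" appear unpredictably, so forward-looking exploration of the most advanced large model technology is equally a top priority.

Staying true to its founding goal and vision, "stay original and let AI lead human progress", SenseTime has put its large model technology and products into practice across industries including healthcare, education, law, and industry. As the large model's Chinese name "日日新" ("new every day") suggests, SenseTime remains committed to advancing toward artificial general intelligence, breaking through the limits of data and computing power, and leading the innovation and deployment of large models.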


SenseTime Leads the Way in Full-stack Cloud-to-Edge Deployment

SenseNova 5.0 Upgraded for Enhanced Performance, on Par with GPT-4 Turbo

 

April 23, 2024, Shanghai: SenseTime hosted its Tech Day, unveiling the industry's first Cloud-to-Edge full-stack large model product matrix to meet the application needs of scenarios at different scales. SenseTime also announced the latest SenseNova 5.0 Large Model, whose overall capability is benchmarked against GPT-4 Turbo. SenseTime says this advances the adoption of generative AI in industrial applications and gives users on-demand access to large models.

 

Under the Scaling Law, which SenseTime regards as the fundamental principle of AI development, the company continues to pursue the optimal data ratio and has established a data quality assessment system. This approach drives its own large model development while giving industry partners a comprehensive suite of capabilities and services, including large model training, fine-tuning, and deployment, as well as various generative AI capabilities.

 

Dr. Xu Li, Chairman of the Board and CEO of SenseTime, said, "Guided by the Scaling Law, SenseTime will continue to explore the three-tier KRE architecture of large model capabilities (Knowledge, Reasoning, and Execution) and keep pushing the boundaries of what large models can do."

 

1.jpg

 

SenseNova 5.0 performs on par with GPT-4 Turbo: Significant enhancements in interdisciplinary capabilities, adding new multimodal interactions

 

Since its debut in April last year, SenseTime's SenseNova Large Model has undergone five major version iterations. Trained on more than 10TB tokens, including a large amount of synthetic data, the latest SenseNova 5.0 adopts a Mixture of Experts (MoE) architecture and supports an effective context window of approximately 200K during inference.


2.jpg

 

The major advancements in SenseNova 5.0 focus on knowledge, mathematical, reasoning, and coding capabilities, matching or surpassing GPT-4 Turbo in mainstream objective assessments.

 

3.jpg

 

In terms of linguistic and creative capabilities, the creative writing, reasoning, and summarization abilities of SenseNova 5.0 have significantly improved. Given the same knowledge input in Chinese, it delivers better comprehension, summarization, and question answering, providing strong support for vertical applications such as education and the content industry.

 

4.jpg

SenseNova 5.0 and GPT-4 were presented with a fun reasoning question: "Mom prepared Yuan Yuan a cup of coffee. Yuan Yuan drank half of it, then filled it with water. After drinking half again, she refilled it with water, and finally drank it all. Did Yuan Yuan consume more coffee or water?" SenseNova 5.0 provided the correct answer to the question.
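
The puzzle comes down to simple bookkeeping, which a short simulation makes explicit. The sketch below is not from SenseTime. It assumes a one-cup capacity and perfect mixing after each refill, and the names `simulate_cup` and `steps` are hypothetical. It tracks how much coffee and water are drunk at each step using exact fractions.

```python
from fractions import Fraction


def simulate_cup(steps):
    """Track coffee and water drunk from a one-cup mug.

    `steps` lists the share of the current contents drunk at each step;
    after every step except the last, the cup is topped up with water.
    Assumes the cup starts full of coffee and is perfectly mixed.
    """
    coffee, water = Fraction(1), Fraction(0)          # current contents
    drunk_coffee, drunk_water = Fraction(0), Fraction(0)

    for i, share in enumerate(steps):
        # Drinking removes the same share of each liquid (perfect mixing).
        drunk_coffee += coffee * share
        drunk_water += water * share
        coffee -= coffee * share
        water -= water * share
        if i < len(steps) - 1:
            # Top the cup back up to one full cup with water.
            water += Fraction(1) - (coffee + water)

    return drunk_coffee, drunk_water


# Half, then half again, then everything that is left.
coffee_total, water_total = simulate_cup(
    [Fraction(1, 2), Fraction(1, 2), Fraction(1)]
)
print(coffee_total, water_total)
```

Under these assumptions both totals come out to exactly one cup: all of the original coffee is eventually drunk, and the two half-cup refills add up to one cup of water.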

 

In terms of scientific capabilities, SenseNova 5.0 offers industry-leading mathematical, coding, and reasoning capabilities, providing a solid foundation for applications in finance and data analysis.

 

SenseNova 5.0 is also equipped with superior multimodal capabilities. SenseTime's multimodal large model has achieved world-leading graphical and textual perception, ranking first by aggregate score on MMBench. It has also achieved high scores on other well-known multimodal benchmarks such as MathVista, AI2D, ChartQA, TextVQA, DocVQA, and MMMU.

 

5.jpg

 

"SenseNova 5.0" also achieves exceptional multimodal capabilities at the product application level. It provides robust support for tasks such as the analysis and understanding of high-resolution long images, interactive text-to-image generation, as well as complex cross-document knowledge extraction and summary Q&A, with rich multimodal interactive capabilities.

 

6.jpg

 

Leading the completion of the Cloud-to-Edge full-stack layout: featuring the industry-leading edge-side model and the SenseTime Integrated Large Model (Enterprise) edge device

Anticipating the trend of centralized computing power demands extending to edge devices and the enterprise-level AI needs on the edge, SenseTime has pioneered the launch of the "Cloud-to-Edge" full-stack large model product matrix. This includes the SenseTime Edge-side Large Model for terminal devices and the SenseTime Integrated Large Model (Enterprise) edge device for fields such as finance, coding, healthcare, and government services.

 

This year marks the inaugural year of edge device large model applications. To meet the application demands of mobile end-users for large model technology, SenseTime has launched the SenseNova Edge Device Large Model, achieving optimal performance at its scale and leading comprehensively across various scales.


7.jpg

 

SenseTime has also launched an edge-cloud collaborative solution that uses intelligent routing to draw on the strengths of both sides. Tasks that require internet search or complex scene processing are offloaded to the cloud, while in some scenarios more than 80% of processing stays on the edge side, significantly reducing inference costs.
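
The release does not describe how the routing decision is made, so the following is only a minimal sketch of the general idea, not SenseTime's implementation. Every name here (`Request`, `route`, `edge_share`, the `complexity` score, and the 0.7 threshold) is a hypothetical assumption chosen for illustration. It sends requests that need web search or look complex to the cloud and keeps the rest on the device.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical user request; the fields are illustrative assumptions."""
    text: str
    needs_web_search: bool = False
    complexity: float = 0.0  # assumed score from 0.0 (trivial) to 1.0 (complex)


def route(request: Request, complexity_threshold: float = 0.7) -> str:
    """Return "cloud" for web-search or complex requests, otherwise "edge"."""
    if request.needs_web_search or request.complexity > complexity_threshold:
        return "cloud"
    return "edge"


def edge_share(requests: list[Request]) -> float:
    """Fraction of requests that stay on the device under `route`."""
    if not requests:
        return 0.0
    on_edge = sum(1 for r in requests if route(r) == "edge")
    return on_edge / len(requests)


# A made-up sample of requests, not real traffic data.
sample = [
    Request("set a timer for ten minutes"),
    Request("summarize this note", complexity=0.3),
    Request("translate this sentence", complexity=0.2),
    Request("rewrite this message politely", complexity=0.4),
    Request("what is the weather in Shanghai right now", needs_web_search=True),
]
print(edge_share(sample))
```

The share that stays on the device depends entirely on the request mix and the threshold, which is why a claim like "over 80%" can only hold for particular scenarios.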

 

The SenseNova Edge-side Large Language Model has the fastest inference speed in the industry, generating an average of 18.3 characters per second on mid-range platforms and 78.3 characters per second on flagship platforms.
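
To put the two rates in perspective, the sketch below converts them into waiting time for a reply of a given length. It assumes the quoted figures are sustained average rates and ignores any startup latency. The function name `seconds_to_generate` and the reply length of 200 are hypothetical choices for illustration.

```python
def seconds_to_generate(length: int, rate_per_second: float) -> float:
    """Time to emit `length` units of text at a constant average rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return length / rate_per_second


MID_RANGE_RATE = 18.3  # figure quoted for mid-range platforms
FLAGSHIP_RATE = 78.3   # figure quoted for flagship platforms

reply_length = 200     # hypothetical reply length, chosen for illustration
for name, rate in [("mid-range", MID_RANGE_RATE), ("flagship", FLAGSHIP_RATE)]:
    print(f"{name}: {seconds_to_generate(reply_length, rate):.1f} s")
```

For a reply of this length, the quoted rates work out to roughly 11 seconds on the mid-range figure and under 3 seconds on the flagship figure.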

 

The diffusion model also achieves the fastest edge-side inference speed in the industry. The edge-side LDM-AI image expansion technology processes images in less than 1.5 seconds on a mainstream platform, ten times faster than competitors' cloud apps. It supports the output of high-definition images with resolutions of 12 million pixels and above, as well as image editing functions like proportional, free-form, and rotational expansion on the edge device.

 

8.jpg

 

The SenseTime edge-side SDK is officially released today and is available for integration and trial.

 

The SenseTime Integrated Large Model (Enterprise) edge device was developed in response to the growing demand for AI from key fields such as finance, coding, healthcare and government services. It supports both acceleration of enterprise-grade models at the hundred-billion scale and hardware acceleration for knowledge retrieval. It can be deployed locally and is ready to use upon purchase, lowering the threshold for enterprises to apply large models. Compared to similar products, the device performs accelerated searches at only 50 percent CPU utilization and reduces inference costs by approximately 80 percent.

 

9.jpg

 

Innovating product applications in the AI 2.0 era with ecosystem partners to further boost productivity

 

At the event, SenseTime also invited ecosystem partners such as Kingsoft Office, Haitong Securities, Xiaomi, Yuewen Group, and Huawei to discuss and exchange views on the industry application and the general trend of large model technology in various fields such as office, finance, and transportation.

 

SenseTime has partnered closely with Kingsoft Office since 2023, drawing on the code generation and tool-calling capabilities of the SenseNova Large Model to help WPS 365 build a more efficient office productivity platform and a dedicated "Enterprise Brain" for businesses. Zhang Qingyuan, CEO of Kingsoft Office, remarked, "In office application scenarios, SenseTime's large model has excelled in helping our users solve complex problems and improve efficiency."

 

10.jpg

 

In the financial sector, Haitong Securities and SenseTime jointly released a multimodal full-stack large model for the financial industry. The two parties are collaborating on intelligent customer service, compliance and risk control, coding support, and business development and office assistants, and are jointly researching frontier scenarios such as intelligent investment advisory and public opinion monitoring, building full-stack capability for deploying large models in the securities industry. Mao Yuxing, the Deputy General Manager and Chief Information Officer of Haitong Securities, said, "Through our collaboration with SenseTime, we have leveraged large model technology to digitally transform Haitong Securities. Moving forward, we will combine full-stack AI capabilities to further enhance the digitization of our business processes, interactions, and systems."

 

11.jpg

 

In the transportation sector, the smart cabin of the Xiaomi SU7, which has recently surged in popularity, incorporates SenseTime's large model technology. Based on SenseTime's edge-cloud large model solution, Xiaomi's Xiao Ai assistant provides owners with an intelligent interactive experience. During his conversation with Wang Xiaogang, Co-founder and Chief Scientist of SenseTime, Wang Gang, General Manager of Xiao Ai at Xiaomi, said, "SenseTime's cloud-to-edge full-stack product matrix can effectively empower and adapt to the Xiaomi IoT ecosystem. With SenseTime, we aim to co-create a smarter product experience for our users."

 

12.jpg

Left: Wang Gang, General Manager of Xiao Ai, Xiaomi; Right: Wang Xiaogang, Co-founder and Chief Scientist, SenseTime

 

In addition, SenseTime today released an industry large model built natively on Ascend, jointly creating a large model industry ecosystem aimed at fields such as finance, healthcare, government services, and coding.

 

Among SenseTime's own applications, SenseNova 5.0 brings significant updates to products such as SenseMirage, SenseAvatar, SenseThings, SenseSpace, SenseChat-DaYi, and the Raccoon suite of solutions.

 

Text-to-video generation is also on its way, and SenseTime is steadfastly advancing towards the AGI era


In the final segment of Tech Day, SenseTime's Chairman of the Board and CEO, Dr. Xu Li, presented three video clips entirely generated by a large model, emphasizing the controllability of characters, actions, and scenes on the generative video platform.

 

13.jpg14.jpg

 

SenseTime also displayed a breakthrough in its text-to-video platform, where users will soon be able to generate a video from a detailed description or even a few phrases. In addition, the characters' costumes, hairstyles, and scenes can be preset to keep the video content coherent and stylistically consistent.

 

The training of large models can be continuously empowered through the intelligent computing center built by SenseTime. Currently, the SenseNova Large Model has achieved innovations in several areas, including natural language processing, video generation, and deep learning optimization.

 

On the one hand, large model development has now entered the implementation phase, and the next crucial step is to integrate with industries and application scenarios. On the other hand, the guiding path of the "Scaling Law" is becoming clearer while moments of "emergence" arrive unpredictably, so it remains a priority to continue proactively exploring the frontiers of large model technology.

 

Adhering to its founding principle of creating a better AI-empowered future through innovation, SenseTime has applied its large model technology and products in various industries such as healthcare, education, law, and industry. In keeping with the "SenseNova" name, SenseTime will continue to advance steadfastly towards the goal of artificial general intelligence, breaking through the limits of data and computing power, and leading the innovation and implementation of large models.