November 17, 2021 – Shanghai Artificial Intelligence Laboratory, together with SenseTime, a leading global artificial intelligence (AI) software company, the Chinese University of Hong Kong and Shanghai Jiao Tong University, jointly unveiled INTERN, a new-generation general vision model that addresses key obstacles to general vision, including generalization and data efficiency. The technical report, “INTERN: A New Learning Paradigm Towards General Vision”, has been published on arXiv.
OpenGVLab, an open-source platform based on INTERN, will also be launched for academia and industry early next year, and the relevant training datasets will be published, further promoting innovation and the industrial application of AI technologies. Together with OpenMMLab and OpenDILab, both previously released by Shanghai Artificial Intelligence Laboratory, these three open-source platforms form the open-source system OpenXLab, which will support the growth of an open-source community for artificial intelligence.
INTERN covers four key tasks in computer vision: classification, object detection, segmentation and depth estimation. Its versatility has been verified on 26 internationally recognized downstream datasets, including ImageNet.
Specifically, compared with CLIP, the most powerful general vision model published so far, INTERN reduces the average error rate on the 26 datasets spanning these four visual tasks by 40.2%, 47.3%, 34.8% and 9.4%, respectively. Using only 10% of the training data in the target domains, INTERN consistently outperforms state-of-the-art models trained on the full datasets, often by a significant margin. This breakthrough dramatically reduces the cost of data acquisition and annotation, a key bottleneck that has troubled the industry for years.
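The release does not state whether these percentages are absolute or relative; figures of this kind are usually reported as relative reductions in error rate. The short Python sketch below assumes a relative reduction and uses made-up per-dataset error rates (not numbers from the report) to show how such a figure is computed from error rates averaged over the datasets of one task.

```python
from statistics import mean


def relative_error_reduction(baseline_errors: list[float], new_errors: list[float]) -> float:
    """Relative drop in average error rate from a baseline model to a new model.

    Both lists hold per-dataset error rates (0.25 means 25% error) for the
    same datasets, in the same order.
    """
    if len(baseline_errors) != len(new_errors) or not baseline_errors:
        raise ValueError("need matching, non-empty lists of error rates")
    baseline_avg = mean(baseline_errors)
    new_avg = mean(new_errors)
    if baseline_avg <= 0:
        raise ValueError("baseline average error must be positive")
    # Relative (not absolute) reduction: the share of the baseline error removed.
    return (baseline_avg - new_avg) / baseline_avg


# Hypothetical error rates for one task type (illustrative only).
baseline = [0.30, 0.20]  # average 0.25
improved = [0.18, 0.12]  # average 0.15
print(f"{relative_error_reduction(baseline, improved):.1%}")  # 40.0%
```

Under this reading, an absolute drop of 10 percentage points from a 25% average error shows up as a 40% relative reduction, so the two interpretations should not be confused.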
Until now, AI models have typically been developed by designing a specific model for each specific task. INTERN, by contrast, can accomplish many tasks with a single model and cover various long-tail scenarios, such as smart auto, smart manufacturing and smart city, using only small amounts of sample data.
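A low-data setting such as the 10%-of-target-data comparison can be approximated by drawing a stratified subset of a labelled dataset. The helper below is a hypothetical sketch, not the sampling protocol used in the INTERN report: the function name `stratified_fraction` and its parameters are illustrative. It keeps roughly the given fraction of each class (at least one example per class) and uses a fixed seed so the subset is reproducible.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T")


def stratified_fraction(
    samples: Sequence[T],
    labels: Sequence[Hashable],
    fraction: float = 0.1,
    seed: int = 0,
) -> list[T]:
    """Return about `fraction` of the samples of each class, at least one per class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Group sample indices by class label so every class stays represented.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible subset
    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))

    # Keep the selected samples in their original order.
    return [samples[i] for i in sorted(chosen)]


# Toy data: 100 samples in two balanced classes.
data = list(range(100))
targets = ["cat" if i % 2 == 0 else "dog" for i in data]
subset = stratified_fraction(data, targets, fraction=0.1)
print(len(subset))  # 10
```

Stratifying by class avoids the case where a small random subset happens to miss rare classes entirely, which matters most in the long-tail scenarios mentioned above.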
Qiao Yu, Assistant Director of the Shanghai Artificial Intelligence Laboratory, said, “The core of developing general vision is to improve the model's generalization ability and its data efficiency during learning. With the launch of INTERN, we will be able to use a single model to complete multiple tasks, solving the current problem that each AI model is trained to do only one thing.”
Prof. Wang Xiaogang, Head of SenseTime Research, said, “The general vision model INTERN is a major innovation of ours in line with the trend toward artificial general intelligence (AGI), and it is backed by our SenseCore AI infrastructure. We hope this framework can help the industry better explore and apply AGI technology by improving the generalization ability of AI models.”
A “Humanized” Learning Process – Just Like An “Intern”
INTERN consists of seven components: three infrastructure modules (a general visual data system, a network structure and an evaluation benchmark) and four training stages. With INTERN, AI models are pre-trained through four “humanized” stages, much as human beings draw on existing knowledge to learn new skills quickly instead of learning from scratch.
The first stage draws on a large-scale image-text dataset collected from the Internet and produces robust representations and an initialization for the subsequent stages. The second stage focuses on one of the predefined task types and trains in a multi-head fashion to consolidate task-specific knowledge, just like an “expert”.
The third stage unifies the experts' representations and boosts performance by integrating their knowledge. In the fourth and final stage, once the pre-trained model is obtained, the general knowledge acquired by INTERN is adapted to specific domains such as smart city, smart healthcare and smart auto.
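The four stages can be summarized as a simple data flow. The Python sketch below is purely illustrative: the `Model` class and all stage functions are hypothetical placeholders that only record which stage touched the model. No real training takes place, and this is neither the INTERN implementation nor an OpenGVLab API.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Toy stand-in for a vision backbone; it only logs the stages applied to it."""
    history: list[str] = field(default_factory=list)


def stage1_image_text_pretrain(model: Model) -> Model:
    # Stage 1: learn an initial representation from web-scale image-text data.
    return Model(history=model.history + ["image-text pre-training"])


def stage2_train_experts(base: Model, task_types: list[str]) -> dict[str, Model]:
    # Stage 2: from the shared initialization, train one "expert" per predefined
    # task type (the release describes this as multi-head training).
    return {
        task: Model(history=base.history + [f"multi-head expert: {task}"])
        for task in task_types
    }


def stage3_unify_experts(experts: dict[str, Model]) -> Model:
    # Stage 3: merge the experts into one generalist. All experts share the
    # stage-1 history, so keep it once and record the merge.
    shared = next(iter(experts.values())).history[:-1]
    return Model(history=shared + ["unified experts: " + ", ".join(sorted(experts))])


def stage4_adapt(generalist: Model, domain: str) -> Model:
    # Stage 4: adapt the general knowledge to a specific target domain.
    return Model(history=generalist.history + [f"adapted to {domain}"])


tasks = ["classification", "detection", "segmentation", "depth estimation"]
experts = stage2_train_experts(stage1_image_text_pretrain(Model()), tasks)
city_model = stage4_adapt(stage3_unify_experts(experts), "smart city")
for step in city_model.history:
    print(step)
```

The point of the structure is that stages 1 to 3 run once and produce one shared generalist, while stage 4 can be repeated cheaply for each new domain.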
Reduce Input Costs and Promote the Open-Source Ecosystem for the Industry
INTERN allows AI models to grow step by step and, during training, gradually acquire the ability to generalize from what they have already learned to new cases, which makes the final model more universal and flexible across different applications.
As the next major milestone in AI technology, AGI will bring disruptive innovation to industrial development, and achieving this goal requires close cooperation between academia and industry. Going forward, Shanghai Artificial Intelligence Laboratory, SenseTime, the Chinese University of Hong Kong and Shanghai Jiao Tong University will work together to cultivate an open, inclusive and sustainable innovation ecosystem for general vision AI and to support general vision research in academia and industry.
The launch of INTERN will help the industry greatly reduce the resources required for general vision model research and lay a solid foundation for applying AGI technology to long-tail applications in scenarios such as smart city, smart business, smart life and smart auto.