Sber developers are building a new version of the GigaChat service based on one of the most advanced Russian-language models, with 29 billion parameters. This was announced by Andrey Belevtsev, Sber's Senior Vice President, CTO and head of the Technology block, during the Sber conference “Journey to the World of Artificial Intelligence” (AI Journey).
According to him, thanks to the new LLM, which will form the basis of the next version of the GigaChat artificial intelligence system, the capabilities of the service will become comparable to popular foreign solutions.
“Training the models that power GigaChat is a huge and complex computational undertaking, and we have never done anything like this before,” Belevtsev said. He added that the total number of computational operations was almost 6 times higher than in training the 13-billion-parameter ruGPT-3 model in 2021.
He noted that the company has collected and developed a unique dataset specifically for GigaChat. Hundreds of Sber employees work on it, which helps improve the quality of responses in various areas.
“Thanks to these efforts, with each new release of GigaChat, users can get more out of the service to solve their problems,” the company’s top executive added.
Sber explained that thanks to the new LLM, GigaChat follows instructions better and can perform more complex tasks such as summarizing, rewriting and editing texts, and the quality of its answers to various questions has increased significantly. The team compared the responses of the new and previous models and noted an overall quality improvement of 23%. The announced model also handles factual questions 25% better than the previous version.
To achieve these results, the team ran many experiments to improve the model and make its training more efficient. In particular, the large language models were trained with a framework that can partition neural network weights across GPUs, which reduced memory consumption on each card.
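The memory effect of splitting weights across cards can be illustrated with simple arithmetic. The sketch below is not Sber's framework or configuration: the function name, the 2 bytes per parameter (16-bit weights) and the device counts are assumptions chosen for illustration, and it counts only the weights themselves, ignoring gradients, optimizer state and activations.

```python
def weight_bytes_per_device(num_params: int, bytes_per_param: int, num_devices: int) -> float:
    """Bytes of model weights held by each device when weights are split evenly.

    Hypothetical helper for illustration; real frameworks also store
    gradients, optimizer state and activations, which are ignored here.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return num_params * bytes_per_param / num_devices


GIB = 1024 ** 3
PARAMS = 29_000_000_000   # model size reported in the article
BYTES_PER_PARAM = 2       # assumption: 16-bit weights

# Assumed device counts, chosen only to show the trend.
for devices in (1, 8, 64):
    per_device = weight_bytes_per_device(PARAMS, BYTES_PER_PARAM, devices)
    print(f"{devices:>2} devices: {per_device / GIB:.1f} GiB of weights per device")
```

Under these assumptions the weight footprint per card shrinks in proportion to the number of cards, which is the saving the article describes.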
According to an internal evaluation on the MMLU (Massive Multitask Language Understanding) benchmark, the 29-billion-parameter model behind the new version of GigaChat outperforms LLaMA 2 34B, the most popular open counterpart.
Sber’s commercial customers and the academic community will soon be among the first to get access to the new API, the former to build their own solutions and the latter to conduct research.
Sber’s eighth international conference “Journey to the World of Artificial Intelligence” (AI Journey) started on November 22 and will last until November 24.