Sber’s GigaChat neural network model outperformed most publicly available models on the MERA benchmark, the bank’s press service reported.
Two Sber neural network models were submitted for evaluation: GigaChat PRO and GigaChat Lite+.
On a test set of 21 instruction-format tasks covering various knowledge areas, GigaChat PRO scored 51.3 out of 100, beating the Mixtral 8x7B Instruct model, which scored 47.8.
Sberbank stated that the open evaluation system allows an objective and transparent assessment of models’ capabilities. The more points a model earns, the more accurately it can solve a range of intellectual and everyday problems: helping to write articles in the desired style and format, searching for information, and preparing analyses based on it.
The company explained that with the help of neural networks, businesses can create their own solutions and optimize their internal processes.
According to Andrey Belevtsev, Senior Deputy CTO, Head of Sberbank’s Technology block, at a time when large language models are being actively developed, it is important to have an up-to-date understanding of their real capabilities.
“Through the evaluation, users can understand how to use GigaChat, and researchers can obtain objective information for further training, adaptation and development of large language models,” the company’s top executive said.
Belevtsev believes that the test results are not only a recognition of the Sber team’s work, but also a basis for improving the service to make it more convenient and useful for both ordinary users and businesses.
The concept of the MERA (Multimodal Evaluation for Russian-language Architectures) benchmark was announced at the international conference AI Journey-2023. A number of AI Alliance member companies, as well as academic partners Skoltech AI and the National Research University Higher School of Economics (HSE), participated in creating the tests.