Sber and SberDevices Advance AI Training Architectures at EACL 2024

Recent advances by scientists at Sber and SberDevices are poised to enable new architectural approaches to training generative artificial intelligence models while cutting the associated computational costs. Sber representatives shared these insights at EACL 2024, the conference of the European Chapter of the Association for Computational Linguistics, held in Malta.

Researchers from Sber and SberDevices delivered a presentation covering two studies focused on artificial intelligence.

The first report, presented by Andrey Kuznetsov, head of the FusionBrain research group at the AIRI Institute (a partner of Sberbank), and Anton Razzhigaev, a research assistant in the group, explored special properties of the transformer architectures used in large language models. The work examined how key characteristics of embeddings (the numerical representations of data inside a model) vary across two common large language model architectures applied to natural language processing tasks.
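
The article does not reproduce the paper's methods, but one embedding property frequently examined in this line of work is anisotropy: how strongly token embeddings cluster in a narrow cone of the representation space. Below is a minimal sketch of measuring it, assuming a small reference model (`gpt2`) and mean pairwise cosine similarity as the metric; both are illustrative choices, not the authors' exact protocol.

```python
# Minimal sketch: estimating anisotropy of a transformer's token embeddings
# as the mean pairwise cosine similarity of its last-layer hidden states.
# The model id ("gpt2") and this exact metric are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

sentences = ["The cat sat on the mat.", "Transformers learn representations."]

embeddings = []
with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # Last hidden layer: (1, seq_len, hidden_dim) -> (seq_len, hidden_dim)
        embeddings.append(outputs.hidden_states[-1].squeeze(0))

tokens = torch.cat(embeddings, dim=0)   # (n_tokens, hidden_dim)
normed = F.normalize(tokens, dim=-1)    # unit-length vectors
cosine = normed @ normed.T              # pairwise cosine similarities

# Average over off-diagonal entries (exclude trivial self-similarity)
n = cosine.size(0)
anisotropy = (cosine.sum() - n) / (n * (n - 1))
print(f"Mean pairwise cosine similarity (anisotropy): {anisotropy.item():.3f}")
```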

In the next stage of the project, these findings will support the distillation of language models, making the models smaller without significantly compromising quality while tracking how errors vary during distillation. This effort matters for developing new architectural solutions during model training and for reducing the compute resources required to train generative models.
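
For context, distillation typically means training a compact student model to imitate a larger teacher's output distribution. The sketch below shows the standard blended loss (soft-target KL divergence plus hard-label cross-entropy); the temperature and weighting values are illustrative assumptions, not details disclosed by the researchers.

```python
# Minimal sketch of standard knowledge distillation: a small "student"
# is trained to match a larger "teacher" model's softened outputs.
# Hyperparameters (temperature, alpha) are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term with the ordinary hard-label loss."""
    # Soften both distributions with the temperature
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL term; the T^2 factor keeps its gradient scale comparable
    kl = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

# Toy usage: a batch of 4 examples over a 10-class output space
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(f"Distillation loss: {loss.item():.3f}")
```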

Denis Dimitrov, General Manager of Data Research at Sberbank, is a co-author on the study.

Alena Fenogenova, head of the AGI NLP team at SberDevices R&D, and Mark Baushenko, an NLP ML engineer at Sberbank, presented work on effective approaches to spelling correction. Their project yielded a proofreading methodology and introduced SAGE, a library comprising a family of generative models trained with this methodology, along with datasets for the spelling correction task.
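
Generative spell-checking models of this kind are typically distributed as ordinary sequence-to-sequence checkpoints, so they can be driven through the generic Hugging Face API, as in the minimal sketch below. The checkpoint identifier here is an assumption for illustration; consult the SAGE repository for the actual released model names.

```python
# Minimal sketch: running a generative spell-checking model via the
# generic Hugging Face seq2seq API. The checkpoint id is an illustrative
# assumption, not a confirmed release name.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "ai-forever/T5-large-spell"  # assumed id; verify against the SAGE repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "I beleive the anwser is corect."
inputs = tokenizer(text, return_tensors="pt")

# Generate the corrected sentence
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```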

The researchers noted that their top-performing model outperformed existing open-source solutions such as Hunspell and JamSpell, as well as OpenAI models including gpt-3.5-turbo-0301, gpt-4-0314, and text-davinci-003. The comparison showed gains in accuracy and reliability on real-world language processing tasks. For practitioners, the results point to practical routes for integrating efficient language models into production workflows while keeping computational load down. The work also reflects ongoing collaboration among Sberbank's data science teams and their partners on multilingual and context-aware AI systems.
