The GigaChat neural network model completed all the tasks of the Unified State Exam in social studies and scored 67 points. Denis Filippov, Deputy Head of Digital Surfaces at Sberbank Salyut, announced this at the AIJ 2023 conference.
According to him, this result exceeds both the minimum score required to apply to a university (45 points) and the average score in 2023 (56.4 points). The version tested was an updated GigaChat, based on one of the most advanced Russian-language models, with 29 billion parameters.
Filippov emphasized that GigaChat's effectiveness should be evaluated not only through technical measurements, but also from the point of view of an ordinary person: whether the service can help in a particular field of knowledge, and how smart and creative the model is.
“Tests used in the education system, including the Unified State Exam, are well suited for such an assessment. The exam results show that GigaChat is quite knowledgeable in the field of social sciences. This means that our AI ‘understands’ the fundamental laws of society and can navigate questions of morality. This is further proof that users can use our service to solve real problems in the real world,” he said.
Sberbank said that the 2024 test tasks published on the FIPI website were used to test GigaChat's knowledge. Before the experiment, the team made sure that these tasks had not been used to pre-train the model.
The neural network's answers were checked by an independent expert from the National Research University Higher School of Economics (HSE). The assessment was then reviewed in more detail by an expert commission of the HSE Institute of Education.
“Our experts evaluated GigaChat’s answers independently of Sber’s research and engineering teams. We checked the answers as if they had been given by an ordinary high school graduate. The results show that the neural network model not only has a sufficient level of factual knowledge, but is also able to think logically and choose the best possible solution,” said Evgeniy Terentyev, director of the HSE Institute of Education.
According to HSE, the evaluation covered not only whether the tasks were completed correctly and whether GigaChat’s answers were factually accurate, but also the quality of its work on the creative tasks.