The GigaChat neural network recently scored 67 points on the social studies section of the Unified State Examination. Denis Filippov, Deputy Head of Digital Surfaces at Sberbank Salyut, announced the result at the AIJ 2023 conference.
Filippov noted that the score exceeds both the university admission threshold of 45 points and the 2023 national average of 56.4 points. He added that the exam was taken by an updated version of GigaChat, built on one of the leading Russian-language models with 29 billion parameters.
He stressed that measuring GigaChat’s effectiveness should go beyond raw technical metrics and consider usefulness to everyday users: how well the service assists in different knowledge domains, and how intelligent and creative the model appears.
“The examinations used in education, including the Unified State Examination, provide a solid basis for such an assessment. The results show that GigaChat demonstrates solid knowledge in social sciences and alignment with fundamental social principles, including moral considerations. This indicates that users can rely on the service to address real-world problems,” he stated.
Sberbank said that GigaChat was tested on the 2024 exam tasks published on the FIPI website. Before the experiment began, the team verified that these tasks had not been used to pre-train the model.
The evaluation was conducted by an expert from the National Research University Higher School of Economics (HSE), followed by a deeper review by a specialist commission from the HSE Institute of Education. The assessment was designed to be independent of Sber’s research and engineering groups, and the judges graded the model’s answers as if they had been written by a typical high school graduate. Evgeniy Terentyev, director of the HSE Institute of Education, explained that the results indicate not only a solid factual base but also the ability to think through problems logically and select the best possible solution.
According to HSE, the review covered not only the accuracy and reliability of GigaChat’s answers but also the quality of its responses to the creative tasks and its overall reasoning process.