Sber has presented the Kandinsky 2.1 neural network, which can generate high-quality images in a few seconds from natural-language text descriptions. It can also blend several images, modify an image according to a text description, generate images similar to a given one, fill in missing parts of a picture (inpainting), and extend an image beyond its borders in an endless-canvas mode (outpainting), the bank's press service reported.
The neural network was developed and trained by Sber AI researchers, with support from scientists at the AIRI Artificial Intelligence Institute, on a combined dataset from Sber AI and SberDevices.
The new Kandinsky 2.1 model inherited the weights of the previous version, which was trained on 1 billion text-image pairs, and was then additionally trained on 170 million high-resolution text-image pairs.
“While training Kandinsky 2.1, we listened to user feedback, studied the most advanced approaches and tested bold hypotheses. As a result, we developed a powerful universal solution for a wide range of tasks, on par with the world’s best counterparts. It offers tremendous opportunities for both business and the general public,” said Alexander Vedyakhin, First Vice Chairman of the Board of Sberbank.
The neural network also received a newly trained autoencoder model, which serves as the decoder for vector representations of images. This has significantly improved the generation of high-resolution images.
In addition, Kandinsky 2.1 relies not only on the encoded text description but also on a special image representation produced by the CLIP model. The neural network first generates this image representation from the text and then feeds it to the input of the main generative model.
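The data flow described above (text encoding, a text-derived image representation, the main generative model, and an autoencoder decoder) can be pictured with a toy sketch. This is not Sber's code or the real model: every function name, size and formula below is a hypothetical placeholder chosen only to show the order of the stages and what each one consumes.

```python
import hashlib

EMBED_DIM = 8     # toy embedding size (assumption, not the real dimension)
LATENT_SIDE = 4   # toy latent grid side (assumption)
UPSCALE = 2       # toy decoder upsampling factor (assumption)


def encode_text(prompt: str) -> list[float]:
    """Placeholder text encoder: hash the prompt into a fixed-size vector."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def text_to_image_embedding(text_emb: list[float]) -> list[float]:
    """Placeholder for the stage that derives an image representation from text."""
    return [x * x for x in text_emb]  # arbitrary mapping, for illustration only


def generate_latent(text_emb: list[float], image_emb: list[float]) -> list[list[float]]:
    """Placeholder main generative model, conditioned on BOTH representations."""
    cond = [(t + i) / 2.0 for t, i in zip(text_emb, image_emb)]
    return [
        [cond[(r * LATENT_SIDE + c) % EMBED_DIM] for c in range(LATENT_SIDE)]
        for r in range(LATENT_SIDE)
    ]


def decode_latent(latent: list[list[float]]) -> list[list[float]]:
    """Placeholder autoencoder decoder: nearest-neighbour upsampling to pixels."""
    image = []
    for row in latent:
        wide = [v for v in row for _ in range(UPSCALE)]
        image.extend(list(wide) for _ in range(UPSCALE))
    return image


def generate(prompt: str) -> list[list[float]]:
    """Run the four stages in the order the article describes."""
    text_emb = encode_text(prompt)
    image_emb = text_to_image_embedding(text_emb)
    latent = generate_latent(text_emb, image_emb)
    return decode_latent(latent)


if __name__ == "__main__":
    picture = generate("a red cat in the style of Kandinsky")
    print(len(picture), "x", len(picture[0]), "toy pixels")
```

The point of the sketch is only the wiring: the main model receives two inputs rather than one, and the decoder is a separate final stage that turns a compact representation into pixels.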