More than 1,000 child sexual abuse images were found in LAION-5B, a large open dataset used to train popular AI models that generate images from text descriptions. According to research from the Stanford Internet Observatory (SIO), LAION-5B was also used to train the popular Stable Diffusion neural network, among others.
According to the report’s author, David Thiel, in the summer of 2023, researchers discovered that image-generating neural networks were being used to create thousands of fake but realistic images of child pornography, which were then rapidly distributed on the dark web.
Thiel and his colleagues found that AI models generate such content using data from LAION-5B, a public training dataset containing billions of images.
Shortly after the SIO report was published, LAION, the German organization responsible for creating datasets for artificial intelligence, temporarily took its databases offline to remove illegal content, Bloomberg reported. LAION-5B and similar datasets are assembled automatically by scraping files from across the web, which means they can pick up prohibited content.
However, deleting the datasets will not solve the problem: version 1.5 of the Stable Diffusion neural network has already been trained on some of this illegal content and can continue to reproduce such images.
Since Stable Diffusion is publicly available open-source software, it is unknown how many users have copied the model and its unfiltered training data.
The SIO recommended that future models capable of generating erotic imagery exclude images of children entirely, or that images of minors be removed altogether from open training data for neural networks.
Cases of child pornography being created with neural networks have been reported before.