American researchers from Ohio State University have created the world's largest set of images of biological organisms for training artificial intelligence (AI) models. The study was published on the scientific preprint portal arXiv.
The dataset, named TreeOfLife-10M, consists of 10 million images of plants, animals, fungi and other organisms covering 454,000 taxa (groups of organisms that share common characteristics). By comparison, the previous largest archive of such data contained 2.7 million images of 10,000 taxa.
The researchers then trained a model called BioCLIP on TreeOfLife-10M. BioCLIP combines visual cues from images with textual descriptions and other data. The model successfully classified a wide variety of organisms, including rare species it had not seen during training.
Test results showed that BioCLIP performs 17–20% better than existing comparable models.
Previously, AI helped scientists uncover unknown properties of proteins.