Caught red-handed: how hackers steal neural networks and are exposed by watermarks

Scientist Oleg Rogov: there are four methods of stealing neural networks

06.21.2024


— By watermarks, people usually mean images that become visible when held up to the light. Most often they are used to protect banknotes from counterfeiting. What are digital watermarks?

— This is a technology created to protect the copyright of multimedia files. Any digital watermark is some information added to the original digital file, whether it is an image, document, video or audio.

The simplest example is a visible watermark.

Everyone has seen translucent text on some images; it is the simplest watermark that shows who owns the content. It also protects against copying and modification.

Roughly speaking, it is a mark placed on digital content to protect copyright and verify the integrity of the document. If the work undergoes any change, the digital watermark changes with it; based on this, the copyright holder can tell whether the file has been modified.

— What other types of digital watermarks are there?

— A hidden one. This embedded signal makes small changes to the original image, video or audio, but the transformations are usually imperceptible to the eye or ear.

For example, you can change the brightness of certain pixels in an image; the average user will not notice. In audio, minor distortions can be added in certain parts of the recording: they do not affect the overall sound and the listener cannot hear them, but they can be detected by technical means. From them, the copyright owner can tell that the content was copied illegally. The creators claim that such digital watermarks even survive transfer through analog devices, for example, recording the audio with a microphone and digitizing the sound again.

Unlike a visible one, it is impossible to remove an invisible digital mark without special knowledge.
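The imperceptible pixel tweaks described above are often implemented as least-significant-bit (LSB) embedding. Here is a minimal sketch in Python, assuming 8-bit grayscale pixel values stored as a flat list; the function names and values are illustrative, not from the interview:

```python
# Minimal LSB watermark sketch: hide a bit string in the least-significant
# bits of pixel values. Each pixel changes by at most 1, which is invisible
# to the eye but readable by software.

def embed_lsb(pixels, bits):
    """Overwrite the least-significant bit of the first len(bits) pixels."""
    assert len(bits) <= len(pixels)
    marked = list(pixels)
    for i, b in enumerate(bits):
        marked[i] = (marked[i] & ~1) | b  # clear the LSB, then set it to b
    return marked

def extract_lsb(pixels, n):
    """Read the first n watermark bits back out."""
    return [p & 1 for p in pixels[:n]]

image = [200, 201, 199, 57, 58, 60, 120, 121]  # toy 8-bit grayscale values
mark = [1, 0, 1, 1]
stamped = embed_lsb(image, mark)

assert extract_lsb(stamped, 4) == mark
# No pixel moved by more than 1 brightness level:
assert all(abs(a - b) <= 1 for a, b in zip(image, stamped))
```

Real schemes embed the mark redundantly and in transform domains (e.g. frequency coefficients) so that it survives compression and re-encoding, which plain LSB embedding does not.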

— Are they also used in neural networks?

— Yes, they can be used to determine whether someone has copied your neural network and passed it off as their own. The problem with using watermarks in AI is that neural network technologies are multi-component, which makes it difficult to trace the origin of specific algorithms or pieces of code. In addition, stolen models are often modified: attackers use special methods to make it hard to establish a direct link between the stolen model and its original source.

However, most of these model-marking methods have a significant drawback: the watermark's behavior is poorly preserved during theft due to attacks on the model's functionality.

— Why are attackers stealing neural networks?

— One of the main reasons for theft is to close the gap with competitors or to gain an advantage in a certain area.

Stealing neural networks can allow attackers to bypass lengthy processes of architectural research and development, training, testing, and so on.

Theft can also provide access to confidential information, such as banking details, biometrics, or other sensitive data processed by neural networks.

— How does theft occur?

— In order for an exact copy of the model to fall into the hands of an attacker, the model must be physically exfiltrated from the servers of its creators. To do this, you can launch a hacker attack on the infrastructure and use social engineering.

Sometimes the attacker knows nothing about the structure of the neural network: neither the model's architecture nor the data it was trained on. An example of such a model is the widely available ChatGPT. In this case, hackers steal the neural network's functionality, that is, the way the network has been trained to perform specific tasks, such as generating text or distinguishing a bicycle from a truck with some accuracy.

— How exactly does this happen?

— The most popular types of theft involve "distilling" information from a finished model and further training the resulting version on a new dataset, which obscures how that initial version was obtained.

For example, an attacker could learn a little about the model's architecture or the dataset it was trained on, take a lower-level model and train a copy, avoiding the costs of training and design, and then use it to build their own commercial product, bypassing the copyright holder's licenses.

This can be done using specially constructed datasets. Here is how they are created: objects are fed into the victim neural network, its answers are recorded, and the attacker's surrogate model is trained on them.

That is, a smaller student model, trained to reproduce the behavior of a heavier and more accurate teacher model, achieves similar results, sometimes gaining significantly in size and speed thanks to its simplified architecture while losing only slightly in quality.
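The student-teacher setup described above is usually trained with a distillation loss: the student is pushed to match the teacher's temperature-softened output distribution. A minimal pure-Python sketch; the logit values and the temperature are illustrative assumptions:

```python
import math

# Sketch of the knowledge-distillation objective: cross-entropy between the
# temperature-softened output distributions of teacher and student.

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature = softer."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy of the student's softened outputs against the teacher's."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

teacher = [8.0, 2.0, 1.0]        # confident teacher predictions (toy logits)
close_student = [7.5, 2.2, 1.1]  # a student that mimics the teacher
far_student = [1.0, 6.0, 2.0]    # a student that disagrees

# Training minimizes this loss, pulling the student toward the teacher:
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In a real extraction attack the "teacher" is the victim model queried through its public interface, so the attacker never needs its weights at all.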

In these and some other cases, the watermark simply "flies off" and is lost.

— How else can hackers remove the marks?

— There is also the pruning attack, named after a compression method that reduces a neural network's memory consumption and computational complexity. A large number of parameters allows a neural network to capture complex dependencies in data and solve difficult problems, but practice shows that a network often does not need all of its parameters to work well.

Pruning can also be used to optimize memory, for example for mobile devices with limited resources. Model compression is achieved, for instance, by removing unimportant parameters and reducing the connections between neurons. The second method involves optimizing the model for specific data types; such a change in parameters can also remove the mark.
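Removing "unimportant" parameters typically means magnitude pruning: weights whose absolute value falls below a threshold are zeroed out. A minimal sketch; the toy layer and threshold values are illustrative:

```python
# Sketch of magnitude pruning: small-magnitude weights contribute little to
# the output, so they are set to zero to compress the model.

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

layer = [0.8, -0.02, 0.4, 0.003, -0.6, 0.01]  # toy layer weights
pruned = prune_by_magnitude(layer, threshold=0.05)

assert pruned == [0.8, 0.0, 0.4, 0.0, -0.6, 0.0]
# Half of the parameters turned out to be unnecessary:
sparsity = pruned.count(0.0) / len(pruned)
assert sparsity == 0.5
```

A watermark encoded in exactly those low-magnitude, "unimportant" weights is wiped out by this step, which is why pruning works as a watermark-removal attack.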

— If the digital watermark is lost during such attacks, how can we detect that the neural network has been stolen?

— We created our own method for marking neural networks. It produces unique trigger datasets that are built into the AI model and preserved even after theft.

We have also surpassed our foreign colleagues from the USA and South Korea: with their methods, the watermarks disappear when the neural network is stolen, while in some cases the efficiency of ours exceeds 95%.

The trigger dataset I am talking about is a set of inputs for which the neural network gives specific, predetermined predictions. For a classification network, for example, it might be a set of cat images that the network deliberately identifies as dogs. In effect, we choose our own unique marking key for each neural network.

These watermarks manifest themselves as specific "behavior" of the model in response to a verification procedure established by the developer. The approach can be applied to any model without sacrificing performance and with minimal computational effort.
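Verification of such a trigger-set watermark can be sketched as follows: the owner queries a suspect model with the secret trigger inputs and checks how often it returns the predetermined labels. The function names, toy models, and the 0.9 decision threshold are all illustrative assumptions, not the interviewee's actual procedure:

```python
# Sketch of trigger-set watermark verification: a stolen model should
# reproduce the owner's deliberately "wrong" labels on the secret triggers;
# an independently trained model should not.

def watermark_match_rate(model, trigger_inputs, expected_labels):
    """Fraction of triggers on which the model gives the predetermined label."""
    hits = sum(1 for x, y in zip(trigger_inputs, expected_labels) if model(x) == y)
    return hits / len(trigger_inputs)

def looks_stolen(model, trigger_inputs, expected_labels, threshold=0.9):
    """Flag the model if it matches the watermark key above the threshold."""
    return watermark_match_rate(model, trigger_inputs, expected_labels) >= threshold

# Toy stand-ins: cat images that the owner's network labels as "dog".
triggers = ["cat_01", "cat_02", "cat_03", "cat_04"]
expected = ["dog", "dog", "dog", "dog"]

stolen_model = lambda x: "dog"  # kept the embedded trigger behavior
clean_model = lambda x: "cat"   # an independently trained model

assert looks_stolen(stolen_model, triggers, expected)
assert not looks_stolen(clean_model, triggers, expected)
```

The key point is that verification needs only black-box query access, so ownership can be checked even when the suspect model is served behind an API.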

— Are there any disadvantages to this type of watermark?

— There is a problem with this type of marking: the dataset needs to be large enough to prevent illegal use of the system, but on the other hand, not so large that it hinders the efficient functioning of the neural network.

When trying to remove the mark, attackers have to run more training cycles, that is, build more proxy models. It is the same attack I described before, only more labor-intensive.

— Is it possible to find stolen parts of the code using these watermarks?

— Marking is possible, but an attacker may not copy the code verbatim: they can rewrite the original code in another language, and it is very difficult to prove plagiarism from the code alone. So there is little point in using complex digital watermarks in programming; it is easier to protect the enterprise's development perimeter from leaks.

— At the international level, the possibility of marking works of art created with the help of artificial intelligence with digital watermarks is being discussed. What do you think about this initiative?

— This can be done if the neural network is free and publicly available. The same goes for works of art. In fact, such watermarks can also be added to texts generated by neural networks such as ChatGPT. But if we are talking about images or music, the question arises: who is the author, the user or the neural network? Today, artificial intelligence is not a subject of law. This is a complex legal issue that experts are actively studying, so there is no need to rush to watermark every file in such tasks.
