A traditional watermark becomes visible when paper is held up to the light, and such marks are widely used to protect banknotes from counterfeiters. What, then, is a digital watermark?
Digital watermarking is a technology that protects the copyright of multimedia files. A digital watermark adds information to the original file, whether it is an image, a document, a video, or an audio recording.
The simplest example is visual: a translucent text overlay on an image clearly indicates ownership and helps deter copying or alteration. In short, a digital watermark marks digital content to protect copyright and to verify the file's integrity. If the file changes, the watermark changes with it, which lets the copyright holder detect modifications and confirm authenticity.
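As an illustration, here is a minimal sketch of such a translucent text overlay using the Pillow imaging library; the file names, watermark text, and placement are placeholders for illustration, not details from the original description.

```python
# Minimal sketch of a visible watermark: semi-transparent text drawn over an image.
from PIL import Image, ImageDraw, ImageFont

def add_visible_watermark(src_path: str, dst_path: str,
                          text: str = "(c) Example Owner") -> None:
    base = Image.open(src_path).convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))   # fully transparent layer
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()
    # Semi-transparent white text near the lower-left corner.
    draw.text((20, base.height - 40), text, fill=(255, 255, 255, 96), font=font)
    Image.alpha_composite(base, overlay).convert("RGB").save(dst_path)

add_visible_watermark("photo.jpg", "photo_watermarked.jpg")   # hypothetical files
```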
What other kinds of digital watermarks exist? One type is hidden. This embedded signal makes subtle adjustments to the image, video, or audio that are usually imperceptible to the eye or ear.
For example, small changes can be made to the brightness of certain pixels, and an average viewer will not notice them. Likewise, minor distortions can be added to select parts of an image or recording; they do not affect its overall appearance or sound, but technical equipment can pick them up. From these signals, the copyright owner can determine whether the content has been copied illegally. Developers claim that such watermarks even survive conversion to analog form, for instance when sound is recorded with a microphone and digitized again.
Unlike a visible watermark, an invisible mark cannot be removed without specialized knowledge.
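As a toy illustration of the embedding idea, the sketch below flips the least significant bit of pixel values chosen by a secret key, a change far too small for a viewer to notice. This is only a simplified example of the principle: unlike the robust watermarks described above, such a naive scheme would not survive re-encoding or analog conversion.

```python
# Toy invisible watermark: hide bits in the least significant bit of pixels
# selected pseudo-randomly by a secret key. Extraction repeats the selection.
import numpy as np

def embed_bits(image: np.ndarray, bits: list, key: int) -> np.ndarray:
    flat = image.astype(np.uint8).copy().ravel()
    rng = np.random.default_rng(key)                  # the key determines pixel positions
    positions = rng.choice(flat.size, size=len(bits), replace=False)
    flat[positions] = (flat[positions] & 0xFE) | np.array(bits, dtype=np.uint8)
    return flat.reshape(image.shape)

def extract_bits(image: np.ndarray, n_bits: int, key: int) -> list:
    flat = image.astype(np.uint8).ravel()
    rng = np.random.default_rng(key)                  # same key, same positions
    positions = rng.choice(flat.size, size=n_bits, replace=False)
    return [int(b) for b in (flat[positions] & 1)]

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)   # stand-in image
marked = embed_bits(img, [1, 0, 1, 1, 0, 0, 1, 0], key=42)
assert extract_bits(marked, 8, key=42) == [1, 0, 1, 1, 0, 0, 1, 0]
```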
Are watermarks used in neural networks as well?
Yes. Watermarks can help establish whether someone copied a neural network and passed it off as their own. One challenge with watermarking AI is that neural networks are multi-component systems, which makes tracing the origin of specific algorithms or code difficult. Stolen models also often undergo changes: attackers use techniques that obscure the link between a stolen model and its original source.
Still, many watermarking approaches share a major drawback: the watermark may not survive a theft attempt, because tampering with the model's functionality can damage or destroy it.
Why do attackers steal neural networks? One primary motive is to gain a competitive edge or to advance in a particular domain.
Stealing a model can bypass lengthy cycles of architectural research, development, training, and testing. It may also grant access to confidential information processed by the network, including sensitive data used in banking or biometrics.
How does theft occur? An exact copy of a model can reach an attacker through an intrusion into its creators' servers, either by breaking into the infrastructure or by manipulating people through social engineering.
Often, though, the attacker knows little about the model's structure or the data it was trained on; a familiar example is a widely available model such as ChatGPT. In such cases, hackers steal the network's functionality, that is, the way it has been trained to perform tasks such as generating text or recognizing objects, without owning the underlying design.
How does this happen in practice? The most common theft method involves distilling information from a finished model and then retraining on new data, which masks how the original was built.
In a typical scenario, a user learns a little about the architecture or the dataset, takes a smaller model, retrains it, and uses it to build a new commercial model, bypassing the copyright holder's licenses with custom data sets. The process often involves feeding inputs to the original network and training a surrogate model to mimic the heavier, more precise one.
In this way, a smaller student model reproduces the behavior of a larger teacher model, achieving similar results with a simpler architecture. The copy can be faster and smaller, though the quality of its work sometimes declines slightly; even so, it presents a real risk for copyright owners.
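A rough sketch of that student-teacher step in PyTorch is shown below; the teacher and student models, the unlabeled query loader, and the hyperparameters are placeholders for illustration, not details from the original description.

```python
# Sketch of knowledge distillation: a small "student" is trained to mimic the
# soft outputs of a larger "teacher" on a set of unlabeled query inputs.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def distill(teacher: torch.nn.Module, student: torch.nn.Module,
            queries: DataLoader, epochs: int = 5, temperature: float = 4.0) -> None:
    teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x in queries:                        # unlabeled inputs fed to both models
            with torch.no_grad():
                soft_targets = F.softmax(teacher(x) / temperature, dim=1)
            log_probs = F.log_softmax(student(x) / temperature, dim=1)
            loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```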
In these cases, watermarks may disappear altogether during theft attempts.
What other avenues can attackers use to remove such marks? One is a pruning attack, which exploits a standard technique for reducing a neural network's memory footprint and computation. A network may contain many parameters, but often not all of them are needed for good performance. Cutting parameters lowers memory requirements, which matters especially on mobile devices, and it also changes how the model processes data, sometimes destroying the embedded watermark in the process.
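For illustration, here is a minimal magnitude-pruning pass using PyTorch's torch.nn.utils.prune utilities; the toy model and the 50 percent pruning ratio are arbitrary choices, not taken from the article.

```python
# Sketch of a pruning pass: zero out the smallest-magnitude weights in each
# linear/convolutional layer, shrinking the effective model and potentially
# disturbing any watermark embedded in those weights.
import torch
import torch.nn.utils.prune as prune

def prune_model(model: torch.nn.Module, amount: float = 0.5) -> torch.nn.Module:
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")        # make the pruning permanent
    return model

toy = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(),
                          torch.nn.Linear(256, 10))
pruned = prune_model(toy, amount=0.5)             # half the weights in each layer set to zero
```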
If a digital watermark is lost during such attacks, how can one detect that a neural network was stolen? A new marking method has been developed that embeds unique trigger datasets into the AI model and remains intact even after theft. In real-world tests it has shown high effectiveness and has outperformed some foreign approaches from the United States and South Korea. The trigger dataset consists of input examples for which the network is forced to produce predefined outputs, such as labeling a batch of cat images as dogs. Each network receives a distinct marking key that is easy to verify at validation time.
These watermarks reveal themselves through a distinctive model response during a defined validation routine. The approach works on any model without sacrificing performance and with minimal computational load.
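The article does not spell out the validation routine, but a generic trigger-set check might look like the sketch below: the owner keeps a secret set of inputs with deliberately "wrong" target labels (for example, cat images labeled as dogs) and measures how often a suspect model reproduces them. The function name and the 90 percent threshold are illustrative assumptions.

```python
# Sketch of trigger-set verification: a model derived from the watermarked
# original will reproduce the secret, deliberately mislabeled targets far more
# often than an unrelated model would by chance.
import torch

def verify_watermark(model: torch.nn.Module,
                     trigger_inputs: torch.Tensor,
                     trigger_labels: torch.Tensor,
                     threshold: float = 0.9) -> bool:
    model.eval()
    with torch.no_grad():
        predictions = model(trigger_inputs).argmax(dim=1)
    match_rate = (predictions == trigger_labels).float().mean().item()
    return match_rate >= threshold    # high agreement points to the watermarked original
```

Embedding, in this scheme, amounts to including the trigger examples with their predefined labels in the model's training data, so the marked network learns the owner's secret responses alongside its normal task.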
Are there downsides to this watermarking approach? The main challenge is balance: the watermark must be substantial enough to deter misuse, yet not so large that it hinders performance. On the other hand, attackers who try to remove such a watermark must run additional training cycles, which increases the number of proxy models and makes the attack more labor-intensive.
Is it possible to detect stolen code by reading watermarks? Markers can be present, but the code may be rewritten in another language, making attribution difficult. For this reason, many view these digital watermarks as more helpful for protecting data flows within an enterprise than for proving plagiarism in software. Still, they offer a useful line of defense for safeguarding proprietary systems and processes.
On the international stage, there is discussion about watermarking AI-generated art. The idea is viable when the network is free and publicly available. For text produced by AI systems such as the popular language models, watermarks are plausible but raise questions about authorship. With images or music, concerns linger about who holds the rights: the user or the AI. Today the legal framework around AI remains unsettled, so rushing to watermark every file would be unwise. [1]