Neural networks keep putting more and more professions out of work. This time it is Heygen's neural network that has gone viral: it translates the speech in a video and then re-voices it in the original speaker's voice while mimicking the lip movements. Naturally, re-dubbed meme videos have started appearing on the Internet again.
Heygen Labs works on the following principle. After a video is uploaded, one neural network "listens" to it and transcribes the speech into text. A separate module then translates the text into another language (eight languages are currently available), and another component voices the translation, preserving the timbre, accent and other characteristics of the original voice. Finally, the last "neuron" takes care of the "lips": it makes sure the lip movements in the frame match the spoken words.
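To make that flow easier to picture, here is a minimal Python sketch of the four-stage pipeline described above. Every function is a hypothetical placeholder standing in for an entire model; none of this is Heygen's actual API.

```python
# Conceptual sketch of the four-stage dubbing pipeline.
# All functions are hypothetical stand-ins for real models.
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    timbre: str   # characteristics the synthesis stage must preserve
    accent: str

def transcribe(video_path: str) -> str:
    """Stage 1: 'listen' to the video and turn speech into text."""
    return "placeholder transcript"

def translate(text: str, target_lang: str) -> str:
    """Stage 2: machine-translate the transcript."""
    return f"[{target_lang}] {text}"

def synthesize(text: str, voice: VoiceProfile) -> bytes:
    """Stage 3: voice the translation, cloning the original voice."""
    return text.encode()  # placeholder audio bytes

def lip_sync(video_path: str, audio: bytes) -> str:
    """Stage 4: re-render the lips to match the new audio track."""
    return video_path.replace(".mp4", "_dubbed.mp4")

def dub(video_path: str, target_lang: str) -> str:
    voice = VoiceProfile(timbre="original", accent="original")
    transcript = transcribe(video_path)
    translated = translate(transcript, target_lang)
    audio = synthesize(translated, voice)
    return lip_sync(video_path, audio)

print(dub("meme.mp4", "de"))  # -> meme_dubbed.mp4
```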
Instructions for working with Heygen Labs
So here’s a quick guide to working with Heygen Labs:
- Sign up for the service;
- Prepare a video with a resolution between 360×360 and 4096×4096 pixels and a duration of 30 to 59 seconds; longer runtimes cost extra, while a video within these limits is processed for free. Other requirements can be seen by hovering over the Requirements label. (A sketch for checking the limits locally follows this list.)
- Upload the prepared video by dragging it into the Drop to Upload window, or click the same panel and select the file in File Explorer.
- Select the language you want to translate the video into and click “Send”.
- Wait for the processing to complete and download the file.
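Since the queue is long (more on that below), it can be worth verifying the resolution and duration limits locally before uploading. Here is a minimal sketch that uses ffprobe from FFmpeg, assuming it is installed and on PATH; the limits are the ones quoted above.

```python
# Check a video against the limits quoted above:
# 360x360 to 4096x4096 pixels, 30 to 59 seconds.
# Requires ffprobe (ships with FFmpeg) on PATH.
import json
import subprocess
import sys

def probe(path: str) -> dict:
    """Ask ffprobe for the video's width, height and duration."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height",
         "-show_entries", "format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    data = json.loads(out)
    stream = data["streams"][0]
    return {
        "width": stream["width"],
        "height": stream["height"],
        "duration": float(data["format"]["duration"]),
    }

def within_limits(info: dict) -> bool:
    ok_size = all(360 <= info[k] <= 4096 for k in ("width", "height"))
    ok_time = 30 <= info["duration"] <= 59
    return ok_size and ok_time

if __name__ == "__main__":
    info = probe(sys.argv[1])
    print(info, "->", "OK" if within_limits(info) else "outside the free limits")
```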
Important! At the time of writing, popularity has played a cruel joke on the service: its servers cannot keep up, and the queue holds anywhere from 25 to 150 thousand videos depending on the time of day.
After a while you will see the label Queue and, just below it, Upgrade to skip the queue: the service is telling you that you are stuck in the queue and can jump it by switching to a paid plan.

We mentioned the requirements for a reason: one of our uploads failed to process. The service returned the following error: Angle of view is too wide in one or more frames of your video. Try a video with faces looking straight at the camera. (Funds refunded). In some frames the face was turned too far away from the camera, and the program could not attach a mask to it. Keep this in mind when selecting videos.
Meme examples of the neural network at work
Now for the results from Heygen Labs. Let's start with the legendary Natalia "Marine Corps": the woman in the video now speaks German.

And here you can see how the program handles the nuances of the Russian language. The English translation of the meme "We don't know what it is; if we knew what it is…" turned out even funnier than the original.

The "ultimate" of the Russian meme internet could not be left out either: the trip to the river. Maybe Christopher Nolan will come across this video and make a second Interstellar?

Of course, we could not ignore the esteemed Evgeniy Ponasenkov. In German, "I will play out and destroy" sounds even more convincing.

Let's move on to the TV series classics. Will foreigners be able to appreciate the beauty of cutlets with mashed potatoes?

Since we are on the subject of cinema, we could not ignore the legendary monologue "What is strength, brother?"

The neural network not only handles translation well, but it also preserves the voices of the people in the video. We are sure that even without seeing the picture you will immediately guess that this is about borscht with cabbage, but not red.

The neural network can adapt to a truly wide range of voices. For example, here is Nikita Mikhalkov.

And here Dmitry "Goblin" Puchkov, known to many, tells a joke, this time to a foreign audience.

The neural network cannot reproduce two different voices in the same video, nor does it always cope with faces when the light falls on them at a sharp angle.

And although the neural network will process videos with background noise or poor microphone quality, it struggles to extract the voice from them.

Still, one cannot help but admit that the AI conveys the timbre and character of a voice well. Here is an example of dubbing into Polish, a language that clearly lacks such a variety of profanity.

Many people have probably forgotten this meme by now, but the famous man from the 2000s who was "on his way to success" now speaks English too.

Famous cartoon dialogue has not been forgotten either. True, the neural network did not handle the rapid speech in the cartoon "Wow, a Talking Fish!" in the best way.

And how melodious Russian memes sound in French! This is the moment to recall the recent trend of characters saying, with the appropriate accent: "I'm in Paris."

As you can see, under certain lighting the lip mask becomes visible: on some people you can spot a line at nose level separating the original footage from the superimposed image. The neural network also cannot yet cope with rapid movement: when the head jerks, the mask fails to keep up with the subject and starts to shake slightly.
As for the dubbing itself, everything depends on the sound quality of the source video. If the source audio was high quality, the neural network does an excellent job, preserving the original voice and manner of speech.
For now, announcers and voice actors can still compete with the technology: the lip sync cannot handle fast-moving subjects, and a hint of artificiality creeps into the synthesized voice. Still, this is an incredible result for a technology that has been in serious development for only a few years. What do you think neural network developers will come up with next?
Source: VG Times