Long story short – the most bizarre Text-To-Speech application in China is on short-video platforms

August 12, 2022

It has long been known that short video platforms in China (Kuaishou and Douyin) carry very different content from global counterparts such as TikTok. One genre is especially popular on Chinese short video platforms: movie narration.

Such videos usually condense a two-hour movie into about 10 minutes of content and divide it into several episodes. They are typically composed of background music, key scenes from the movie, and a voice-over explaining the plot.

It is typical for creators to split one movie into three episodes of 2-3 minutes each.

It is hard to pinpoint exactly when these videos began trending or who first created them. What we do know is that as these videos become increasingly popular, instead of showing diversity and creativity, they are starting to look like peas in a pod.

Almost all content creators who produce such videos use the same lingo: “Xiaomei” for the female protagonist, “Dazhuang” or “Xiaoshuai” for the male protagonist, and “Kalemi” for the protagonists’ friends or sidekicks. “Xiaomei”, “Dazhuang” and “Xiaoshuai” are all common Chinese names, similar to John, Jack or Jane.

In addition, these movie narration videos tend to use the same funny, catchy, yet meaningless phrases to transition between scenes, such as:

“If there are no surprises, there will be surprises.”

“The little girl who looks like a little girl is actually a little girl.”

“The woman who looks confused is confused.”

Not only do these videos share similar lingo and style, they also share the same voice: many of these movie narrations are not narrated by real humans but generated with Text-to-Speech services such as Microsoft Azure and Alibaba Cloud’s smart voice interaction service.

Screenshot of Alibaba Cloud’s smart voice interaction service.

Screenshot of Microsoft Azure’s Text to Speech service.

Although the cheerful and articulate AI voices cannot capture the nuances of the mood the movies portray, they do enunciate the script clearly. Viewers can understand the narration well, and creators save time and effort.

It’s hard to imagine that these videos depend heavily on timeliness, but they do: viewers are more likely to see Netflix’s new fantasy drama The Sandman or the new Korean drama Extraordinary Attorney Woo on short video platforms than on Netflix. This raises the question: why are movie narration videos popular only in China but rarely seen on YouTube or TikTok?

A defining factor is copyright. For example, YouTube has a Fair Use policy and uses a “Content ID” system to scan uploaded videos for matches with existing copyrighted files, including music tracks, snippets of copyrighted programs, or videos made by other users. Such measures prevent content creators from making movie narration videos composed entirely of copyrighted movie scenes.

Copyright infringement is dealt with less severely on short video platforms in China. In 2021, the China Online Audio-Visual Program Service Association released the “Rules on Content Audit Standards for Short Video on the Internet”, which addressed growing concerns about copyright infringement. But video creators have come up with new tricks to get around such rules, such as uploading videos not directly to Douyin but to mini-programs inside Douyin.

To solve the problem once and for all, Douyin recently entered into an agreement with the Netflix-like long video streaming platform iQiyi, which would allow content creators on Douyin, Toutiao, and Xigua Video to use iQiyi’s content and post it on those platforms.

The agreement might finally put an end to the short video copyright turmoil and usher in a new future for content creators. However, not everyone is happy about the flourishing of movie narration videos. Some viewers consider this form of entertainment to spoil the beauty of films.

“Movies are complex and delicate, and this kind of compressed interpretation is very violent and permanently destroys the most important first impression of the movies. Filmmaker loves the art, and thus goes to make the film, to work out the details, to express the emotions, to make such a “sculpture”, but it is then taken away and chipped into a crude rock. Anyone who has a pursuit of beauty will be saddened.” — Film lover @LightOS

Photo by Volodymyr Hryshchenko on Unsplash
