AI Begins to Inject Soul into Digital Humans

Author: Freddie

Data support: Pythagorean Big Data

Source: Gelonghui

Image source: Generated by Unbounded AI tool

In the past few days, domestic "unpopular singers" have become popular again with AI cover songs.

Netizens on Bilibili (Station B) used an AI model to generate a clone of singer Stefanie Sun, much like the "Lin Chiling" and "Guo Degang" voice packs on car navigation systems: train on the singer's own recordings, and out comes a "Stefanie Sun" with exactly the same timbre.

And so the wish of hearing one's idol sing one's favorite songs was fulfilled in this strange way.

Source: Station B

Overseas gameplay is even more outrageous.

A 23-year-old American influencer with 1.8 million followers built an AI clone of herself with GPT-4, then "dated" more than 1,000 netizens at the same time, charging $1 per minute.

In just one week, she raked in $71,600.

**And that is far from the end of it: her "boyfriends" are still multiplying rapidly, now nearing 10,000, with some queuing as long as 96 hours to "fall in love" with her.**

Some analysts believe that, at this rate, earning $5 million a month would not be difficult for her; were it not for technical limitations, her income could be astronomical.

These phenomena amply demonstrate that a new era of technology dividends driven by "AI+" is fast approaching.

01 A Digital Human Army Pours into the Live-Stream Room

Live-stream e-commerce is the most promising scenario for monetizing virtual IP.

Generation Z's use of social networking, video, and online shopping tools is far ahead of the network-wide average, and the new gameplay combining digital humans with virtual spaces gives them a sense of immersion and interactivity. Most core fans of virtual IPs are young people aged 18 to 24.

This trend lets brands see an opportunity for their products to break out of niche circles.

In 2020, virtual singers Luo Tianyi and Le Zhengling came to a Taobao live-stream room to sell goods for Bausch & Lomb, Midea, L'Occitane, and other brands. The stream drew as many as 2.7 million viewers, and nearly 2 million people tipped or interacted.

This sparked a huge discussion at the time: **the era of live streaming by virtual digital humans had arrived.**

And with the wave of large AI models set off by ChatGPT this year, the field of virtual digital humans has entered another, even more powerful boom.

A large number of virtual digital humans began to crowd into live-stream rooms.

In April of this year, "朏朏", the virtual digital human from Tianyu Digital, completed its integration with the ChatGPT model and made its live-stream debut. It can not only respond to customers' questions in real time, but also answer different questions on its own.

Source: Douyin

Short videos have captured many scenes like this: a company's entire office is empty, with only rows of desks left, and dozens of AI beauties live-streaming on the computer screens...

Source: short video

Relying only on pre-prepared 2D hyper-realistic avatars, scripts, and backgrounds, such a company can stream all day long. **Although each individual stream performs worse than a human-hosted one, the cost is absurdly low and the setup can be copied almost for free; the strategy wins on sheer volume, and letting AI earn money while you "lie flat" is not impossible.**

All of this shows that an "AI+" revolution in content efficiency is erupting in the digital human field.

Virtual digital humans are divided into many types according to production technology, application scenarios, and image characteristics.

Unlike 3D virtual idols, most of the 2D realistic digital humans speaking in live-stream rooms are modeled on real-life prototypes, reproducing their voices and expressions to create IPs that carry the "soul" of the original person.

A virtual anchor can stream 24/7, always on call, with no risk of a persona-collapse scandal, which cuts the labor cost of live-stream operations.

Behind this, AIGC is reshaping the production process of digital humans.

2D digital humans are produced with deep learning: once the character design is fixed, image and audio data are collected, preprocessed, and uploaded to a model for training. Compared with 3D, the process is simpler and more standardized, and production efficiency keeps improving.

This factory-assembly-line approach has greatly lowered the threshold and cost of producing digital humans and shortened the production cycle.
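The assembly-line workflow described above can be sketched as a simple staged pipeline. This is a minimal illustration, not a real production system: the stage names, the `AvatarJob` structure, and the filtering rule are all hypothetical stand-ins for the actual deep-learning steps (face reenactment, voice cloning) a vendor would run.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the "collect -> preprocess -> train -> export"
# assembly line for a 2D digital human. Stage internals are stand-ins.

@dataclass
class AvatarJob:
    name: str
    video_clips: list       # raw footage of the real-person prototype
    audio_clips: list       # raw voice recordings
    steps_done: list = field(default_factory=list)

def preprocess(job: AvatarJob) -> AvatarJob:
    # stand-in for face alignment / audio denoising; here we just
    # discard clips that are too short to be useful for training
    job.video_clips = [c for c in job.video_clips if c["duration_s"] >= 1.0]
    job.steps_done.append("preprocess")
    return job

def train(job: AvatarJob) -> AvatarJob:
    # stand-in for fitting lip-sync and voice-clone models on the data
    job.steps_done.append("train")
    return job

def export_avatar(job: AvatarJob) -> dict:
    # package the trained avatar for the live-streaming platform
    job.steps_done.append("export")
    return {"avatar": job.name,
            "clips_used": len(job.video_clips),
            "pipeline": job.steps_done}

job = AvatarJob("demo_anchor",
                video_clips=[{"duration_s": 0.5}, {"duration_s": 12.0}],
                audio_clips=[{"duration_s": 30.0}])
result = export_avatar(train(preprocess(job)))
print(result)
```

Because every stage takes and returns the same job object, the same pipeline can be rerun unchanged for each new customer, which is exactly what makes the per-avatar cost collapse.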

**At the same time, as the industry's production side iterates technologically and costs fall, digital human solutions for small customers have begun to emerge.**

In April, Tencent Cloud released a digital human production platform. Customers who want an avatar with a real person's likeness can buy the service on the platform: simply upload images, audio, and video to get a custom avatar, with pricing based on the chosen timbre and video duration.

Beyond creating digital humans, it also offers a complete digital human live-streaming solution, including letting a real person's audio take over the stream and generating intelligent replies to user comments, priced at only about a thousand yuan.

**And some channel resellers even sell packaged AI anchors for less than 200 yuan.**

Most of these anchor images are licensed from model agencies. The broadcast quality is rough, with obvious cut-out artifacts and characterless voices.

But for ordinary small and medium-sized enterprises, with no big influencer to sell their goods and no millions to spend on a custom high-end IP, a digital human costing a few thousand yuan is enough to cut operating costs and scale up quickly.

Digital Human Studio, Siji

If an enterprise wants to modify the code itself and produce different digital humans, it can also buy the source code from a technology provider and let its digital humans stream and earn money on their own.

But the story of the AI digital human is not over yet.

02 AI Infused Soul

In its report "China's AI Digital Human Market Status and Opportunity Analysis", IDC divides digital humans into five levels. So far, digital humans have progressed from manual production to AI modeling; they have a basically human appearance, but can only make simple interactive decisions.

When intelligence reaches L4 and L5, AI-driven digital humans can take over most scene decisions and support real-time interaction across more modalities, similar to Iron Man's personal AI butler "Jarvis".

**Over the past 30 years of virtual digital human development, technology and market demand have essentially evolved around two points: visual effects and interactive experience.**

Virtual digital humans first appeared in games, animation, and films, answering fans' emotional attachment to trendy things and extending the value of IP.

Early character IPs were hand-drawn, with every movement drawn frame by frame. In 1982, Lynn Minmay, the heroine of the Japanese anime "Macross", became the first virtual singer to release a music album.

Lynn Minmay, the first generation of virtual singers

In films, a character's likeness could be modeled by computer, but the movements still had to come from humans. As CG and motion-capture technologies gradually matured, actors could become any character with the help of green screens and capture rigs.

After the millennium, from "The Lord of the Rings" in 2002 to last year's "Avatar: The Way of Water", character rendering has grown ever more meticulous, greatly aiding artistic creation.

Gollum in "The Lord of the Rings", performed by a real actor via motion capture

So far, digital human technology has been closing in, step by step, on the limit of being ever more "human-like". This requires not only appearance and clothing that look real, but also driving (presenting authentic, delicate expressions and movements) and rendering (making the picture more detailed and real-time).

And yet something still feels missing.

In 1970, Japanese roboticist Masahiro Mori proposed the "uncanny valley" theory: as robots come to resemble humans in appearance and movement, humans at first feel growing affinity toward them.

But once the similarity reaches a certain level, even the slightest difference is magnified, provoking negativity and revulsion. Only when the similarity rises still further do humans return to positive feelings.

Unlike films, application scenarios with stronger social attributes place higher demands on a digital human's real-time interaction; a good-looking "vase" is no longer enough.

**On the interaction side, natural language models fill the gap.**

GPT, whose text-generation ability has amazed everyone, has raised the "IQ" of digital humans by several notches.

Large NLP models are the technical cornerstone of AI-driven virtual humans. Simply put, they let virtual digital humans speak fluently, cut the cost of producing standardized content, and be trained into roles such as intelligent customer service, hosts, and tour guides. In the long run, as personalization and emotional understanding improve, providing companionship and care for the elderly, or serving as a "personal tutor" for children, will also become realistic.
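The role-playing described above usually comes down to a persona prompt placed in front of a generic language model. The sketch below illustrates the pattern only; `fake_llm`, the persona text, and the reply logic are hypothetical stand-ins for a real chat-completion API, not any vendor's actual interface.

```python
# Minimal sketch of an NLP model driving a digital human's dialogue.
# A persona (system) prompt turns a generic model into a specific
# character, e.g. a live-stream shopping assistant.

PERSONA = ("You are Xiaoyu, a live-stream shopping assistant. "
           "Answer viewer questions briefly and politely.")

def fake_llm(system_prompt: str, user_message: str) -> str:
    # stand-in for a real large-model call; a production system would
    # send (system_prompt, user_message) to a chat-completion endpoint
    if "price" in user_message.lower():
        return "This item is 99 yuan during the live stream."
    return "Thanks for asking! Let me check that for you."

def digital_human_reply(comment: str) -> str:
    # every viewer comment is answered in character, around the clock
    return fake_llm(PERSONA, comment)

print(digital_human_reply("What's the price?"))
```

The key design point is that the persona lives entirely in the prompt: swapping the text of `PERSONA` retargets the same model to customer service, hosting, or tour guiding without retraining.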

In addition, mouth movements can also be driven by AI, which learns a mapping from the text to mouth shapes. As realism improves, micro-expressions grow richer, and facial expressions and mouth movements match the speech: "human-like in form" becomes "human-like in spirit".
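The text-to-mouth-shape mapping can be illustrated with the common phoneme-to-viseme idea: speech sounds (phonemes) are bucketed into a small set of mouth poses (visemes) that an animation system blends frame by frame. The phoneme labels and mapping below are simplified assumptions for illustration, not any production standard.

```python
# Toy illustration of lip-sync driving: map phonemes to "visemes"
# (mouth poses). A real pipeline would first run text-to-speech or
# forced alignment to get timed phonemes, then blend these poses.

PHONEME_TO_VISEME = {
    "AA": "open",  "AE": "open",   "IY": "smile",    "UW": "round",
    "M":  "closed", "B": "closed", "P": "closed",    "F": "teeth-lip",
    "S":  "narrow", "T": "narrow",
}

def visemes_for(phonemes):
    # unknown phonemes fall back to a neutral pose
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# the word "map" ~ M AA P: lips close, open wide, close again
print(visemes_for(["M", "AA", "P"]))  # → ['closed', 'open', 'closed']
```

Many-to-one bucketing is what makes this tractable: dozens of phonemes collapse into roughly a dozen distinguishable mouth shapes, so the animation model has far fewer targets to learn.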

Xinhua News Agency: The world's first digital astronaut: Xiao Zheng

Midjourney, a breakout product built on diffusion models, was released last July; it generates AI paintings from text prompts.

A couple, illustration by Midjourney

**Some organizations estimate that its annual revenue has already reached the $100 million scale.**

A Bilibili uploader used Midjourney to recreate a lifelike image of his grandmother: the aged skin and white hair are full of detail. He then used past audio recordings to reproduce her voice, and finally generated a digital double of her through D-ID.

To make up for the regret of never having said goodbye, the grandson struck up a conversation with the digital "grandma" in front of him, and "grandma" responded warmly; the answers were in fact supplied by ChatGPT.

Source: Station B

It is precisely through these ever-growing client-side experiments that large amounts of training material feed back into the "personification" of AI, enriching the models' data and ultimately letting AI inject soul into digital humans, with greater possibilities to come.

03 **Epilogue**

The exploration of virtual avatars has gone on for more than 30 years. From hand-drawn to human-driven to AI-driven, advances in realism and interactive experience have opened up a wide range of application scenarios, like streams trickling into the ocean.

AIGC helps fully digitize ordinary people's appearance, voice, and other characteristics, and the lower production threshold opens up the market's imagination.

IDC predicts that China's AI digital human market will reach 10.24 billion yuan by 2026; but how good an experience digital humans can deliver will determine their ultimate fate.

At the same time, AI digital humans have the potential to become the next-generation portal for human-computer interaction. In the future, we may no longer face cold screens, but interact with lively digital human participants.

Perhaps, as the "godfather of AI" said, human beings are merely a transitional stage in the evolution of intelligence, existing to create digital intelligence. Now we finally have digital clones that look like humans, talk like us, and may one day think like us. (End)
