AI Creates Facial Animation From Audio | Two Minute Papers

  • Published on 15 Jan 2025

Comments • 366

  • @TwoMinutePapers
    @TwoMinutePapers  7 years ago +39

    Our Patreon page and the newest post on empowering research projects are available here:
    www.patreon.com/TwoMinutePapers
    www.patreon.com/posts/14199475

    • @polychats5990
      @polychats5990 7 years ago +1

      Two Minute Papers I still want the old "see you next time" back :/

    • @skyr8449
      @skyr8449 7 years ago

      Nice, I wish I could support you, I will someday when I have the chance! :P

    • @rismasuherja
      @rismasuherja 6 years ago

      If I become a patron, would I learn how to implement this paper in Blender or Maya? That is, could I apply this paper to our own work?

  • @eddiegohwj
    @eddiegohwj 7 years ago +411

    When Avatar 5 comes out in 2025:
    Director is AI
    Actors are AI
    Voice over AI
    Storyline scripted by AI
    Music composed by AI

    • @joech1065
      @joech1065 7 years ago +46

      Goh And AI is produced by AI.

    • @effortless35
      @effortless35 7 years ago +69

      By 2045 the viewers will be AI as well.

    • @Jjunior130
      @Jjunior130 7 years ago +20

      and the AI will be AI

    • @JuddMan03
      @JuddMan03 7 years ago +5

      Goh but profit is still paid to film studio

    • @kebman
      @kebman 7 years ago +12

      Humans are obsolete.

  • @TylerMatthewHarris
    @TylerMatthewHarris 7 years ago +438

    Well there goes video being admissible in court, lol

    • @davidwilliams9534
      @davidwilliams9534 7 years ago +13

      Nice idea

    • @tesseracta4728
      @tesseracta4728 7 years ago +33

      You can tell whether something is AI generated still. Algorithms can predict whether an AI has tampered with it.

    • @RoflZack
      @RoflZack 7 years ago +21

      Obviously this is a hyperbolic comment, but the scary thing is that this AI-generated faked video is only going to get better, and we may one day see a time when video not being admissible in court is a real problem. Of course there is the hope that AI able to tell real from fake video will grow up alongside these advanced fakes, but what about fake news? With a more advanced version of this you can make it look like a politician did or said whatever you want. Crazy.

    • @RuilinLinRyan
      @RuilinLinRyan 7 years ago +7

      Tesseract A You can still fool the public lol. How many ppl are going to be aware of such algorithms or such technology

    • @amiththomas3884
      @amiththomas3884 7 years ago +3

      +Ruilin Lin
      Yeah, that's the thing. This can (and 100% will) be used to deceive people, and it will work at least for some time, depending on the situation.

  • @adamrath7095
    @adamrath7095 7 years ago +90

    I spent dozens of hours and thousands of dollars learning to do this BY HAND. I'm a fool, but this is still astounding.

    • @joech1065
      @joech1065 7 years ago +38

      Adam Rath But people like you will probably be the first ones to apply this tech though, because you are already knee deep in this.

    • @Abdullahjimmy
      @Abdullahjimmy 7 years ago +7

      Adam Rath you are awesome

    • @ckmoore101
      @ckmoore101 6 years ago +2

      Dozens of hours? How did you ever survive? What a trooper you are..... I have literally spent hundreds of hours taking a dump.... yep, still smells.

    • @hj2479
      @hj2479 3 years ago

      @joech1065 While that may be true, the people best suited will be those who understand how to apply and slightly modify the tech and who also understand the art itself to a sufficient degree. That is usually how it works. The people using the older, less efficient, often manual methods are aged out or forced to evolve. The technology is pioneered by researchers, who are very important and have some of the most interesting jobs. It is then modified, tailored and applied by the engineers and the technology sector, who still get paid almost as well as the researchers while having jobs that are often more readily available and require much less time spent on education. Then it is rolled out to "techs" or "IT" or some other group that understands how to take the applied and tailored package and use it for creation based on someone else's prompt or vision, but doesn't really understand how to create that package itself. Finally, the work is checked by one of the few experienced "artists" (in this example) who did choose to evolve and were experienced and talented enough to get one of the few higher-level oversight positions providing the vision for the work.

      That is why there is always more opportunity and work in application and modification positions than in the research positions or the oversight/"vision" positions. Generally, the best place to be if you want a stable life with good money is the modification/tailoring positions, aka programmers, engineers, etc. I chose research and it worked out for me, but the rules are different in biology and medicine, although even in medicine I would always recommend PA, APRN and healthcare management (for the business inclined) over med school, which is really not something most people want to do once they understand what it means.

      I know this whole thing is a bit confusing, but over the many decades I have worked in different industries, I have seen this trend in so many of them after the tech boom.

  • @NeverInterpreter
    @NeverInterpreter 7 years ago +63

    Unfortunately, there's no animation for the tongue.

    • @fleecemaster
      @fleecemaster 7 years ago +34

      yet

    • @LouSaydus
      @LouSaydus 7 years ago +7

      NeverInterpreter it would be a lot easier to animate the tongue using this method vs standard methods because this one runs off of sound not motion capture. Tongue animation should be relatively simple to add to the network considering it already handles the whole face.

    • @anand.suralkar
      @anand.suralkar 5 years ago

      Lol

  • @anastasiadunbar5246
    @anastasiadunbar5246 7 years ago +137

    I thought Obama was real.

    • @fleecemaster
      @fleecemaster 7 years ago +10

      Wait, he wasn't?

    • @rachelslur8729
      @rachelslur8729 6 years ago +3

      Maybe republicans who claimed Obama was Satan weren't so wrong after all...
      🎵Tun-tun-tuuuun 🎵 😱

    • @froyorex4856
      @froyorex4856 5 years ago +5

      It's Obunga not Obama.

    • @123reivaj
      @123reivaj 5 years ago +4

      Is not real, he was always a puppet

    • @OriruBastard
      @OriruBastard 5 years ago +3

      In truth, he is just a deepfaked lizard person. =0

  • @raphirau
    @raphirau 7 years ago +38

    Remarkable! I am wondering how we will be able to differentiate between original statements and artificially constructed statements in the future, when those techniques get better and better for speech and facial behavior. Interesting and challenging, but exciting times ahead!

    • @TwoMinutePapers
      @TwoMinutePapers  7 years ago +28

      Indeed. I am quietly hoping that an arms race situation will emerge where there will be "detective" AIs that are specifically trained to identify forged footage. GANs have a similar element with the discriminator networks, which may see some more use in this area.

    • @Nickademas1
      @Nickademas1 7 years ago

      raphirau In the future all human beings will possess the option and capability to destroy all human beings.

    • @CalvinRRC
      @CalvinRRC 7 years ago +9

      I can imagine a future where humans are reliant on personal AI "guides" to keep them from falling for fake news/scams generated by other AIs. Who knows if it will actually happen, but technology like this makes it seem at least plausible.

    • @joech1065
      @joech1065 7 years ago

      raphirau We'll have seamless digital signatures everywhere, both verification and real-time signing of data streams. Software will do this for us in the background. If you produce a fake recording and sign it as real, your reputation will suffer if you are caught. Since everyone will be recording everything (to later feed their life data to personal assistant AIs), it will be hard to forge anything in crowded places.

    • @beAsham3
      @beAsham3 7 years ago +1

      Detective AIs are meaningless in a society where people want to hear only what they want. People are already being fooled by fake news articles put out by trolls and bots. What's there to stop them from believing that forged footage is real? I'm very concerned that forged footage in the future will be used to manipulate the opinions of the masses for the worse.
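
A minimal sketch of the sign-then-verify idea that @joech1065 raises in this thread, for readers who want to see what "signing a data stream" could look like. It is an illustration under stated assumptions, not a description of any deployed system: it uses an HMAC with a shared secret from the Python standard library as a stand-in for a real public-key signature, and the names `SECRET_KEY`, `sign_stream` and `verify_stream` are hypothetical. Each chunk's tag is chained to the previous tag, so editing, dropping or reordering a chunk should break verification.

```python
import hashlib
import hmac

# Hypothetical shared secret for the demo only. A real system would
# more likely use a private/public key pair held by the recording device.
SECRET_KEY = b"demo-key-not-for-production"


def sign_chunk(chunk: bytes, prev_tag: bytes = b"") -> bytes:
    """Return an HMAC-SHA256 tag over this chunk, chained to the previous tag."""
    return hmac.new(SECRET_KEY, prev_tag + chunk, hashlib.sha256).digest()


def sign_stream(chunks):
    """Sign each chunk in order and return the list of chained tags."""
    tags = []
    prev = b""
    for chunk in chunks:
        prev = sign_chunk(chunk, prev)
        tags.append(prev)
    return tags


def verify_stream(chunks, tags) -> bool:
    """Check that every chunk matches its tag and that the chain is intact."""
    if len(chunks) != len(tags):
        return False
    prev = b""
    for chunk, tag in zip(chunks, tags):
        expected = sign_chunk(chunk, prev)
        # compare_digest avoids leaking information through timing.
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag
    return True
```

With this sketch, a recording whose middle chunk has been swapped out no longer verifies against the original tags, which is the property the comment relies on. It says nothing about whether the signed footage was genuine in the first place.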

  • @femloh
    @femloh 7 years ago +3

    Besides AI creating UI from Images...This is the second thing I have thought about doing for a long time. Just didn't know how to implement it then. Good to see there are other people in this world thinking like me. This is the most beautiful thing I have seen all day. It's a dream come true.

  • @qloshae
    @qloshae 7 years ago +1

    This is massive. A well-trained facial animator still takes quite some time per facial animation; this does it instantly. You're not going to need animators who specialize in facial expressions anymore, you just need someone to operate it and make sure it looks alright.

  • @matthewames2276
    @matthewames2276 7 years ago +96

    I can imagine in the near future, individuals will be able to create full-length high-quality CG films. And instead of animating it themselves they will direct an AI animation program much like a director would in a big budget film.
    Very exciting!

    • @Colopty
      @Colopty 7 years ago +22

      We'd get absolutely flooded with badly directed movies, but at least we'll probably get a few gems that otherwise wouldn't exist.

    • @matthewames2276
      @matthewames2276 7 years ago +14

      That's true. But consider that fiction writers could make their own books into movies.
      Also, consider that the less one has to concern themselves with the details of implementing an idea, the more room there is for creativity. Imagine if we still had to cut film apart with scissors and tape it back together to make movies. Making any sort of film at all would be a major accomplishment for an individual, forget about directing. But now that we have editing software, we are freed up to spend our thought on bigger things like acting, directing and cinematography.

    • @joannot6706
      @joannot6706 7 years ago +5

      We could tell an AI to make a movie out of any book!!!

    • @imranbug81
      @imranbug81 7 years ago +2

      Why not have AI direct the movie too, simply based on the movies you have seen so far and the genre you would like to see? Personalized movies for everyone.

    • @matthewames2276
      @matthewames2276 7 years ago +4

      That would be great! Imagine a program that generates a new movie for you every evening. A movie you're almost certain to like. What if it was interactive like a bedtime story? If something happened in the film that you just couldn't stand (say your favorite character got killed off) you could tell it what you didn't like, and to revise the film. And if you really enjoyed the film you could also ask it to make a sequel.

  • @lndozois
    @lndozois 7 years ago +3

    I very much appreciated the inclusion of the clip at the end (the Russian-accented character). It clearly shows that this new technique cannot hold a candle to good video-captured motion. The audio-only technique missed all sorts of subtle mouth movements the actor was making during his delivery and, imo as a former animator, totally didn't sell the Ms and Fs. That being said, the results on the left would be out of reach of most smaller studios, while the results on the right would still be perfectly acceptable compared to most video game character performances. As an independent dev myself, I'm always curious to see the sorts of technologies that can help bring time and costs down while pushing quality up. This definitely seems to fit the bill.

  • @Ludifant
    @Ludifant 7 years ago +2

    This is amazing, the implications for storytelling and politics are enormous.

  • @martiddy
    @martiddy 7 years ago +8

    Wow! this paper is so cool, I wonder if this could be adapted to work with non-speech audio. For example, analyzing the sound of a guitar and the AI would animate it.

    • @Blayzeing
      @Blayzeing 7 years ago

      Almost definitely.

  •  7 years ago +1

    Amazing! Also congratulations on the mention :)

  • @vijayabhaskar-j
    @vijayabhaskar-j 7 years ago +2

    So if you apply sentiment analysis to a news article, and give that as input to this network, we can get a 3d character reading an article with expressions!

  • @chaumas
    @chaumas 7 years ago +2

    This is really awesome. It seems a little Rube Goldberg to do text to speech and feed the audio into the facial animation. I'd think that there are wins to be had if your facial animation algorithm can also take the original text as an input (in addition to the audio). (I guess I'm not positive that they haven't already done that, because the PDF link on the nvidia page is broken.)

  • @SpenserRoger
    @SpenserRoger 7 years ago

    Dude, this is insane. It won't be long till movies have different artists for the actors' facial expressions, their voices, and heck, their movements too, with just a physical stand-in for the 'person'.

  • @ArnoldVeeman
    @ArnoldVeeman 4 years ago +1

    Indeed, I have been following your channel for years and have read a lot of papers, but I have never really experienced any of these cool products myself.
    Is there a way for you to focus on stuff that we can really try out too? (It might be an idea for a new channel/sub-channel.)

  • @notthere83
    @notthere83 7 years ago

    "I haven't found a single scenario where it didn't come out ahead" - What? You're looking at it! Video-based performance capture is the clear winner. They even say in the study: "As expected, the output of video-based performance capture was generally perceived as more natural than the animations synthesized by our method or the dominance model".

  • @СергейНаврожин
    @СергейНаврожин 7 years ago

    Games usually combine different techniques to cut corners. I'd imagine video capture could be used for the main actors, audio for the others, and text-to-speech for the NPCs that serve no real purpose. It would also be awesome to have a collection of generic performances stored somewhere, to be able to plug them into your characters. Also, talking "styles": teaching a neural net how to apply "styles" to body language could be another breakthrough. Randomize them enough, plug them into NPCs, and instead of the same identical walk cycles everywhere you get a nervous walker, a happy one, an angry one, etc. There should be a research group just for that. We can randomize our characters visually, but generic animations kill that illusion rather quickly.

  • @JayTheYggdrasil
    @JayTheYggdrasil 7 years ago +27

    Why is it always Obama?

    • @Relmono
      @Relmono 7 years ago +3

      To demonstrate that someone giving a union address could be faked and spread false information

    • @simplyry
      @simplyry 7 years ago +11

      Possibly due to the large amount of footage readily available with audio, and the head moving very little during speeches.

    • @123reivaj
      @123reivaj 5 years ago +7

      Because that man spent more time posing in front of the cameras than working, and therefore there is a HUGE amount of audiovisual material from this

    • @anand.suralkar
      @anand.suralkar 5 years ago

      Lol

    • @vinster9165
      @vinster9165 4 years ago +2

      Because Obama was a marionette puppet with awesome speech writers

  • @deepc9822
    @deepc9822 7 years ago +1

    mind blown, already by just the title of the video.. but really nice video summary of the topic as always, quality content!

  • @maxavail
    @maxavail 7 years ago +1

    At 4:41, the left hand side face (your left, as you look at the monitor) looks more realistic than the other one. Was that what you wanted to show ? Because it seemed to me you were implying that the voice generated model (the one to the right ?) produced using this AI was better than anything else.

    • @Reddles37
      @Reddles37 7 years ago +1

      It's better than any other technique doing the same thing. The face on the left is the original performance by a real actor; obviously it will be better than the computer-generated version.

    • @maxavail
      @maxavail 7 years ago +1

      Actually, I think the face on the left is still artificially generated but based on video footage of a real actor, meaning the AI is learning to simulate facial expressions by looking at real video footage, whereas the face on the right is based solely on audio material and the AI is guessing what facial expressions would look like for that kind of speech.

    • @Gunth0r
      @Gunth0r 7 years ago

      I was thinking this as well.

    • @ZanesFacebook
      @ZanesFacebook 7 years ago

      maxavail maybe this technology is a bit beyond you then. A real human face is digitized. Then, I think it says, the computer watches nine minutes of that digital footage. And from those nine minutes of footage, it can create "real enough" expressions and lip movements for eternity.

  • @Arjun-jt7yb
    @Arjun-jt7yb 7 years ago

    Two Minute Papers really awesome,
    you are amazing guys.

  • @jimmwagner
    @jimmwagner 5 years ago

    US Merch link isn't working. Is there another place I can see what's available to buy?

  • @DaveGamesVT
    @DaveGamesVT 7 years ago +1

    Really impressive stuff. I'm interested to see stuff like this used in applications like video games.

  • @vipinsingh-dj2ty
    @vipinsingh-dj2ty 7 years ago +5

    1:56 mann that rick

  • @portiaboadu3430
    @portiaboadu3430 7 years ago

    how do I download this software

  • @DurganshSharma
    @DurganshSharma 7 years ago

    Nice work, appreciated

  • @pontosinterligados
    @pontosinterligados 5 years ago

    I have worked with CG for more than 10 years, and I also do some academic research. I will read the paper to get the details, but I wonder whether people who are trained in lip-sync or even lip reading would agree with the results. I am surely amazed by this research contribution, but I have to admit I found some results not so impressive compared to available solutions on the market for low-budget CG animations. But that's just one isolated opinion. Thanks for posting, always!

  • @handet.6235
    @handet.6235 6 years ago

    Any tutorial or code for this label? It would be great for researchers.

  • @micrologi1
    @micrologi1 3 years ago

    Hi, How can I develop this technology? Is there any script or study material for implementation?

  • @dawkinshater101
    @dawkinshater101 6 years ago

    I see a lot of converging technologies that will reach their full potential in 5-10 years. First is video graphics, which is inherently getting better and more realistic each year. Second is AI-driven facial animation, and third is something like Google Duplex. If it's all put together, we're gonna create a very convincing virtual human.

  • @ivanbreak
    @ivanbreak 4 years ago

    Speech is one of the most difficult animation tasks I have ever encountered.

  • @enoch3699
    @enoch3699 7 years ago

    "Deer fellow scahlers dis is TOO MANY PEPPERS"

  • @Dane411
    @Dane411 6 years ago

    + AI to estimate environment + best camera location = auto cg movie from dialogue script

  • @LE100u
    @LE100u 5 years ago +1

    what is it going to be used for?

    • @cagsie3958
      @cagsie3958 4 years ago

      Subterfuge. Soon we'll only trust people in face to face conversations.

  • @tissuepaper9962
    @tissuepaper9962 6 years ago

    They really should have tried to capture the teeth and the movements of the tongue. They could make their virtual actors much easier to lip-read if they were to also model the tongue and teeth.

  • @noimodimi9020
    @noimodimi9020 7 years ago +29

    Barack Obama singing somewhere over the rainbow...
    The future is beautiful.

    • @avi7278
      @avi7278 4 years ago

      I'm going to match trump's face up to his grab her by the pussy audio.

  • @TCDooM
    @TCDooM 7 years ago

    Awesome. Thank you. You are the man!

  • @maniacmekanik69
    @maniacmekanik69 6 years ago

    Where's the app !?!

  • @MushookieMan
    @MushookieMan 5 years ago

    AI flapping an open mouth isn't going to replace animators and mo-cap just yet.

    • @cagsie3958
      @cagsie3958 4 years ago

      This footage is 3 years old!

  • @Metalefs1
    @Metalefs1 5 years ago

    What a time to be alive

  • @MESHQuality
    @MESHQuality 7 years ago

    This is so awesome!

  • @rajeshgupta1055
    @rajeshgupta1055 7 years ago +1

    Dammit! AI is reaching phenomenal heights

  • @saemranian
    @saemranian 4 years ago

    Great, thanks for your sharing

  • @digitalsoultech
    @digitalsoultech 7 years ago

    So basically in 5 -10 years disney-pixar won't need voice actors or face actors anymore. Upload your script and the AI does the rest.

  • @weishenmejames
    @weishenmejames 7 years ago

    Astounding!

  • @aspie96
    @aspie96 7 years ago

    It should also guess the emotional state from the audio itself.
    That would be really cool.

    • @aspie96
      @aspie96 7 years ago

      Are you one of the researchers?
      If so, what you did is really cool!

    • @aspie96
      @aspie96 7 years ago

      What are you doing?

    • @fleecemaster
      @fleecemaster 7 years ago +1

      He's working on killer robot assassins, so thanks for that...

  • @davidcripps3011
    @davidcripps3011 4 years ago

    The computer generated voices still sounded like computers. A long way to go there....but the face animations were impressive

  • @newgamedk
    @newgamedk 7 years ago

    Incredible. And scary.

  • @kim15742
    @kim15742 7 years ago

    3:15 Hey, that's awesome!

  • @enchanted_swiftie
    @enchanted_swiftie 5 years ago

    HOW TO LEARN ?
    I want to join.

  • @Afzalive
    @Afzalive 7 years ago

    Ah! You should've had a thumbnail that read the audio from this video as a narrator :D

  • @raybbo
    @raybbo 7 years ago +1

    Now play some drum and bass through that AI for some beatboxing

  • @screaminlordbyron7767
    @screaminlordbyron7767 3 years ago

    The mouth movement is still extremely fake looking. I noticed the lips were closed on the n sounds and some other fundamentally inaccurate things going on. This is the best I've seen but still huge room for improvement. Still absolutely amazing we have come this far! Looking forward to seeing it two papers down the line:)

  • @caner19959595
    @caner19959595 7 years ago

    Somewhereee out of my window.. love that track, wasn't expecting to see in an AI-themed video

  • @bytler4518
    @bytler4518 7 years ago

    Wow, amazing!

  • @Earzone63
    @Earzone63 7 years ago

    I literally had this idea in the bath this morning (but for Ali G not Obama lol), cheers.

  • @Joyce-he9vm
    @Joyce-he9vm 7 years ago

    amazing, unbelievable

  • @Argoon1981
    @Argoon1981 6 years ago

    The remedy engine show case proves that this still doesn't achieve the desired effect every time but for games you don't need to be perfect and for indie developers without money to pay actors this kind of AI would be a god send.

  • @ronex1710
    @ronex1710 7 years ago +2

    The technology is remarkable, but on the dark side it has HUGE dangerous implications

    • @Felhek
      @Felhek 4 years ago

      Just like a knife
      You can prepare meals with it, but also you can kill someone with it.

  • @Sutanreyu
    @Sutanreyu 7 years ago

    It's good considering it's just audio, but video-captured animation produces more accurate mouth movements... You can nearly lip read.

  • @sasaha8389
    @sasaha8389 4 years ago

    yes this makes animating waaaaay easier :D

  • @MrChandre93
    @MrChandre93 7 years ago +1

    Amazing work! Looks like a fake news dream though.

  • @YouLoveMrFriendly
    @YouLoveMrFriendly 7 years ago

    Happy to give you 1k 👍

  • @WhiteDragon103
    @WhiteDragon103 6 years ago

    They really needed to include the tongue in the animation, because the AI seems compelled to explain everything it hears through the lips, making G and N phonemes look like M visemes.
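
To make the phoneme/viseme point in the comment above concrete, here is a small sketch of a lookup table that maps phonemes to mouth shapes. The grouping is a simplified illustration and is not taken from the paper; the names `PHONEME_TO_VISEME`, `to_visemes` and `lips_only` are hypothetical. It shows that M is the only one of the three sounds that calls for closed lips, while N and G are shaped by the tongue, so a rig with no tongue would more plausibly fall back to an open mouth than to a lip closure.

```python
# Simplified, illustrative phoneme-to-viseme table (an assumption for this
# sketch, not the mapping used by the paper or any particular engine).
PHONEME_TO_VISEME = {
    "M": "lips_closed", "B": "lips_closed", "P": "lips_closed",
    "F": "lip_to_teeth", "V": "lip_to_teeth",
    "N": "tongue_to_ridge", "D": "tongue_to_ridge", "T": "tongue_to_ridge",
    "G": "tongue_to_palate", "K": "tongue_to_palate",
}

# Visemes that can only be shown if the rig animates the tongue.
TONGUE_VISEMES = {"tongue_to_ridge", "tongue_to_palate"}


def to_visemes(phonemes):
    """Map each phoneme to a viseme; unknown phonemes default to 'open'."""
    return [PHONEME_TO_VISEME.get(p, "open") for p in phonemes]


def lips_only(visemes):
    """What a rig without a tongue can show: tongue shapes collapse to 'open'."""
    return ["open" if v in TONGUE_VISEMES else v for v in visemes]
```

For example, `lips_only(to_visemes(["M", "N", "G"]))` keeps the lip closure for M but leaves the mouth open for N and G, which is the behaviour the commenter expected to see.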

  • @bucklogos
    @bucklogos 5 years ago

    3:35. The bottom right one is based on Stallone isn't it?

  • @adamjones6473
    @adamjones6473 5 years ago

    I think the tongue is part of what sells the effect

  • @dan339dan
    @dan339dan 4 years ago

    I think the AI has trouble dealing with nasal consonants. It closes its mouth confidently when pronouncing /n/, but that would produce /m/

  • @RichardServello
    @RichardServello 7 years ago

    This is slightly better than what is already used in video games. It's a start. Nothing really that incredible tho.

  • @alialtaf3412
    @alialtaf3412 7 years ago

    This can be helpful for video game development in so many ways.

  • @smugfrog1041
    @smugfrog1041 5 years ago

    YouTubers who don't want to show their faces can use this, and if a full-body animation can be synchronized with a video of a real person, we can have easy-to-make vtubers

  • @bojanstricevic5461
    @bojanstricevic5461 4 years ago

    A long time ago I thought that they would make a concert of Elvis, with him doing whatever song you wanted.

  • @androkon6920
    @androkon6920 4 years ago

    Painful cowboy chaps. My favorite.

  • @walkertai7220
    @walkertai7220 5 years ago

    Hi,
    I am looking forward for this solution for a use case.
    Can you help me connect with the right people who can build the whole system?
    Thank you

  • @TeaJayOne
    @TeaJayOne 5 years ago

    in the Game-Engine example the Video-based performance capture is out of sync with the speech.

  • @antonyharnist8673
    @antonyharnist8673 7 years ago

    Awesome!

  • @grahamthomas9319
    @grahamthomas9319 2 years ago

    What happened to this algorithm? Is it available to use?

  • @janehoyken
    @janehoyken 7 years ago +1

    Bioware should buy this AI and let it work on Mass Effect Andromeda Dialogues

  • @morbid1.
    @morbid1. 7 years ago

    As a 3D animator, I should feel threatened by this tech, but I can't wait for it. Facial animation is insanely hard, and any help with this tedious process is welcome :F

    • @alialtaf3412
      @alialtaf3412 7 years ago

      Neurotyczny Kot Yes, this will make a video game 3D animator's job easier. It will not completely take over the 3D animator's job, but it will do 90% of the work for them. For example, the facial animations in Mass Effect Andromeda were the worst. With this tech those facial movements would be watchable.

  • @DorothyJeanThompson
    @DorothyJeanThompson 6 years ago

    Ok... I would love to get my hands on this software.

  • @konstantinboets
    @konstantinboets 7 years ago

    So the overall goal is for humans to judge the virtual as real, right?

  • @stopfidgetting
    @stopfidgetting 7 years ago +58

    I'm scrolling through the comments and I am honestly surprised by the number of people who seem to think this is a revolutionary new technology like the world has never seen before. Automatic facial animation technology has been around for years, used in games like Mass Effect and The Witcher. There are even systems for sale on the Unity Asset Store. What these people have done is not "Teach a machine to make realistic facial animations." What they've done is "teach a machine to make realistic facial animations better than it could before."

    • @sergusster
      @sergusster 7 years ago +6

      I know, right?
      And on top of that, the developers themselves implicitly showed us a major flaw of this technique in the last example: it only tries to mimic the actor's performance while there's audio input (naturally). But an actor's performance is not just lines of speech. There's no way to simulate all the subtle emotions and expressions between the lines.
      But I agree, this technique can replace traditional audio lip-sync for game characters in cutscenes and dialogs with NPCs.

    • @YuFanLou
      @YuFanLou 7 years ago +5

      Caudex We just saw the speech system crash in the latest Mass Effect installment. This stuff is hard, and has been the signature feature of the two games you have mentioned. But with this tech it might become a commodity. That's the novelty. Also, the model receives an emotion parameter, so emotion is not a problem.

    • @anselmschueler
      @anselmschueler 7 years ago +7

      Actually, artificial Facial Animation in Video Games is mostly achieved by stitching hand-made templates together.

    • @XCSme
      @XCSme 7 years ago +1

      Caudex yes, but I don't think those were implemented using neural networks, probably they were just manually-created audio to video animations played in sequence for each sound.

    • @57z
      @57z 7 years ago +1

      The difference here is that this is achieved completely by AI, on the fly, whereas in years past human intervention was needed to make the effect look polished

  • @HAWXLEADER
    @HAWXLEADER 7 years ago

    Think about the funny glitches in games!!!! This is gold!
    Just like when ragdolls started and we had a good entertainment!

  • @dawkinshater101
    @dawkinshater101 5 years ago

    imagine this combined with google duplex combined with photorealistic graphics.

  • @Hyvitetty
    @Hyvitetty 7 years ago

    I don't think that video-based performance capture is usually as bad as in the example?

  • @AdamSchelenbergCom
    @AdamSchelenbergCom 7 years ago

    I've been dreaming about that technology.

  • @abhijeet800
    @abhijeet800 4 years ago

    Can't sleep after watching this 2:23.too exciting and scared.

  • @vladark138
    @vladark138 7 years ago

    I hope Star Citizen will have something even better.

  • @awaken69
    @awaken69 7 years ago

    the problem is, the tongue is still not visible/animated..

  • @johannesschroter8984
    @johannesschroter8984 7 years ago

    AI can show more emotion than Kristen Stewart.

  • @ronpaulrevered
    @ronpaulrevered 6 years ago

    I thought the facial structure of the actual speaker was going to be constructed by A.I. by only listening to the voice. That would have been cooler/scarier.

  • @Shadow62x
    @Shadow62x 7 years ago

    In order to Synthesize a realistic sounding voice... couldn't you just feed a machine with speeches?
    (I'm not familiar with how they work.)

  • @SinanAkkoyun
    @SinanAkkoyun 7 years ago +8

    I really want to implement this in my game xD

    • @Aku6Soku1Zan
      @Aku6Soku1Zan 7 years ago

      Sinan Akkoyun are you working on something?

  • @romeop.1245
    @romeop.1245 7 years ago

    I really hope Bethesda can use this for Elder Scrolls VI

  • @kirangouds
    @kirangouds 6 years ago +1

    3:36 5th model looks like Jeff Bezos :-P

  • @Insectula
    @Insectula 4 years ago +1

    OK. So it's very cool. But here we are 3 1/2 years later and what do we see? Squat. There are a handful of audio-to-lip-sync and text-to-lip-sync offerings, but they are nowhere near this quality. Now I know it takes a long time to make it from the lab to the marketplace, but for God's sake, this is Nvidia, not some university. I love Two Minute Papers, but sometimes it can be really depressing seeing all these wonderful things that we never see in the wild.

  • @Maouww
    @Maouww 7 years ago

    Whoa!!! Animation Studios are going to be able to pump out movies SO MUCH FASTER now!! They can skip all the lip syncing!!

  • @ItsDragonsAllTheWayDown
    @ItsDragonsAllTheWayDown 4 years ago

    It would be amazing if the process could be sped up enough to provide accurate lip-sync for people talking to each other in vr, like vr chat. There's a little bit of lip-synced vr right now, but it's not exactly realistic.

    • @CoLovecraft
      @CoLovecraft 4 years ago

      There might be a better solution if VR headsets came with face pointing cameras for motion tracking.
      Star Citizen uses something like it and it's amazing.
      I think I can definitely see this in an Elder Scrolls or Mass Effect though... Oblivion did something really interesting by trying to get the AI to talk to each other (which failed hilariously) but I think they should try again with methods like this to lighten the load.