The demonstration of emotional nuance in synthesized speech is a game changer. Imagine the potential for storytelling and immersive experiences, AI is truly blurring boundaries here.
I think humans are better storytellers. That would be the real game changer when AI cleans your house and cooks for you while you can do all the fun like creating stories
@@hagis23 nah, I've been using small models like Mistral to have them run text-based adventures. Even with all their current limitations, they are at the same level if not better than most experienced humans at storytelling.
@@OnigoroshiZero honestly I don't trust your ability to determine if that's the case - most people don't understand what makes a great story vs what makes a passable one
I'd guess that both Nvidia and Meta are going sit on these things until they see how the copyright lawsuits against Suno, etc pan out. Patents typically only stifle innovation for the benefit of the few "inventors" (who rarely make significant novel changes that would not have eventually been devised by others). In fact, the "first to file" is proof that is the case, as a number of inventors may be working on the same invention with the same ideas, but only the first to file the patent is credited and* "owns" the idea despite the efforts of others who may have been days away from, or had already arrived at, the same solution. As much as I understand protecting artists somewhat, copyright isn't significantly better, especially with how much their artistry tends to appropriated as industry profits by corporations.
1:35 This is the most impressive part to me. An angry voice making a declarative statement like that with those emphasis pauses is super realistic and I'm impressed the AI replicated not only the sound, but the pacing as well. What a time to be alive! 3:54 I think the correct approach here is for Dr. Zsolnai-Fehér to release a hit single song of his own so he can use his song whenever he wants :D
I hate that AI research is getting wasted on replacing jobs that never needed replacing in the first place. So much for that future where mundane work is automated and people can focus on art and other things. Guess we're getting the complete opposite reality, just fuckin depressing
@@CrispyMuffin2it’s simply a matter of complexity unfortunately. The most readily consumable information here is speech, text, art, and the likes. Information that doesn’t require rigorous logic or complex relationships, nor a connection to the physical world like robotic, autonomous vehicles, etc. If somehow it were easier to develop machine learning models for complex scientific research than art, speech, and text, it would’ve happened that way
@@crazyfrogmixSo instead of finding an actual thinking, feeling human being who can be a equal partner to you, someone you admire and share interests with, you’d rather download a computer program and instruct it to pretend to love you? You want a make believe girlfriend who looks and talks like a doe eyed anime character and whose entire existence revolves exclusively around pleasing you? Instead of dating an independent person with her own life, goals and inner world, you want software that superficially simulates the facade of your dream girl? a pet wife who wears what you tell her to wear, likes what you tell her to like, thinks what you tell her to think and never speaks out of turn? who has no material needs, no other friends or family, no dreams you can’t delete if you feel they compete with her devotion to you. Rather than your partner making a choice to date you over everyone else because she values you as an individual, you want the shallow, on-demand validation of a glorified tamagotchi whose existence you can entirely remake or erase the moment you become bored with her? How utterly romantic! It’s just like the classic love story ‘I Have No Mouth and I Must Scream’.
It has already begun. (I've seen at least one YT channel posting AI music without saying it is, and somehow noone notices.) I do like that this system has the ability to add to and edit already created music though. I don't like the fact that people enjoy music made from people just telling AI to do everything; I enjoy the human aspect to music and art. Without that I feel like it's pointless in a way. Using AI as a tool to help develop ideas is good but I don't think it should be the idea generator in creative fields. Set AI to solve problems. Art isn't a problem to be solved.
@@smugsenko Oh yeah no doubt. it has begun awhile ago actually, just with the better tech it might get a bit harder to notice if it's legit, since some people will genuinely like it and it will get pushed into recommended feeds
It doesn't. Just because someone gave the AI a quirky commission doesn't mean they're the one who created the output "art". Prompt "artists" tweak & twiddle their digital knobs until something comes out they agree with. That said, the use of AI in audio can really be a game-changer. Imagine being a blind person and having your surroundings narrated to you in real time, or having it read text chats or in-game written dialogue with "emotion."
It doesn't but most people today are lazy lowlifes who love to pat themselves on the back. Besides, it's not like AI started it. Having ghost writers, auto tune etc is all cheating and has been the standard for a while now.
Yea, im suprised how many people are excited about AI today not thinking about cosequences, just the amount of people who will loose their jobs. Also much how it will destroy human crativity... Why be talented and learn anything when some guy who doesnt know thing about music can generate it with AI
@@punkskaska Yeah, music means nothing today, from the moment we "crafted" music instruments. Creativity was really better when we just had sticks and stones ! Technology kills art, bla bla bla
@@RavioliFr i think you look at it too shallow. Learning instruments and playing music is much more important process than you think, it affects neuroplastocity. You have generations of human brain evolving with music just so someone can give it to AI not even knowing how it works to make music. We should support this process and not kill it with AI. Quote "Playing a musical instrument is the brain equivalent of a full-body workout. Unlike other brain-training activities like chess and sudoku, playing an instrument recruits almost every part of the brain, including regions that process vision, sound, movement, and memory" You can read more online, there are many more benefits to music. I think human race would benefit more from it than some computer chip. Also dont ignore the fact that a lot of musician already struggle with money. Last thing i would want is some lazy guy who never invested time to learn anything to benefit more than actual musicians.
Just to be clear I'm not scared that the AI will gain sentience, I know it's not capable of that right know, what I am fearful for is the the way that we communicate with each other, and the way we know what's right and what's wrong ,what's true and what's false could be altered.
Ive tried and paid for Adobe ai audio products. Not only are they far worse than free alternatives, Adobe's upload and processing speeds are atrocious. It aint cheap either.
The funny thing is. You definitely could replace yourself with audio generation (if your not already), and even if got foind out because it generates something funky, you can be like "woah, you found it out! Good work, I got away with skiving off on a beach for 6 months before you guys got suspicious! Isn't that incredible!?". No backlash from AI usage.
I think people are short sighting it. You can more than likely get your family generational wealth via Art, Voice, and likeness even tho you have been dead for years.
I love thinking of the fun uses for this.... but that gets dashed thinking of how much more AI slop is going to be taking over TH-cam and other platforms 😭
@@clerothsun3933 "Better" is very subjective. I'd love to hear some of the songs that are your favorite, because sadly a lot of the AI ones feel really repetitive and then either super simple or way too complicated. Definitely willing to learn more! I also like the backstory to artists music journey, the inspiration behind the piece, and the passion that went into making it! AI has none of that (and never can) and will be taking so many opportunities away from budding artist that already have a hard time growing.
The next needed step is for audio AI systems to be able to handle spatial localization of sound. To be able to generate stereo or multichannel audio that is spatially coherent. So far all of them generate only mono audio.
Many years ago I recall seeing a video taken at Skywalker Sound (or ILM) and someone there was demonstrating the sound of a flute mixed with a voice. It was magical. And this was easily more than 20 years ago.
Yeah and now you can do 100 different versions of this in seconds, and if you change your mind doing it with 100 different instruments in the same time
It shouldn’t be too surprising that it might outperform specialist models on their specialty. In general I think the “scale is all you need” crowd has some blind spots, but I’d agree with them here. When it comes to instrumentation, music, isolation and denoising there are many ways to generate annotated synthetic audio data. I’m sure it cost an enormous amount to train this, but in terms of having a model grounded in music theory we’ve only just begun to scratch the surface.
imagine watching a movie, but each time, it is a little bit different, with how the AI picks the camera Angles, and how the actors contribute to the scene, and the specific performance of stunts being different.. but the AI follows a script it is given.
Showing an example being played and then not letting us hear it and instead having you just continue talking is a massive annoyance. Either let us hear it or don’t show the footage at all.
@@armanrozika specifically the clip he doesn't show is showcasing the extraction of lyrics from a song. We can intuitively guess that the song utilised in the demo is copyrighted on TH-cam
Hello Bot, give me the sound of a joyous nerd like me, who knows more than I do about things, is amazing at teaching, has a deep accent who is super excited about the time to be alive!
An Olympic swimmer winning in wrestling may be less unlikely than you think if they've had wrestling as a hobby. Swimmers are insanely strong. But to riff on that, there have been TV entertainment programs that pit athletes of different disciplines against each other in a series of challenges, and occasionally their sport comes up, and everyone would expect them to win, but fun stuff can ensue like a top cross country skier beating a top cyclist at bicycling. Both are endurance sports, and skiers may bicycle for cardio during the summer.
Yes, you are indeed right! Thank you and apologies - fixed it in the description. Posting the link here too: th-cam.com/video/qj1Sp8He6e4/w-d-xo.htmlsi=ZtSesU1e7jeoN55U&t=63
2:37 Musicians play instruments in a musical context. You all can become producers or songwriters, but until the thinker can execute what the AI generated, they aren’t musicians. Music producers, or songwriters; sure. I accept that.
Just writing down my thoughts: In the future, people will mostly listen to AI generated music, there will be all-in-one music apps that function similar to YT music, but it generates the songs instead. The apps would have a pause/play, skip, upvote and downvote, and save and randomize, plus a text field. The app will play songs with randomized prompts, people will constantly upvote and downvote or save songs, and this is info which can be used to narrow down the users preference for future song generation. The app will create a profile of your tastes, and eventually it can be trained to know which songs you like to automatically save them to a playlist. So in the future, the listerners are the composers and it's their tastes which determine the composition of new songs. I reckon this might lead to a new music revolution.
the issue with this is that none of the music will be very good. Sure it might sound nice to listen to but all ai generated content is incredibly soulless and uninspired.
@@kingtasazRight now, yes. You talk as if this technology is stagnating. I am 90% sure in 10 years, AI music will be just as good if not better than human music.
@theonewhoslost What a nothingburger response. It can and will get better soon, if you follow AI you know how quickly its still advancing, even in the music department. Saying "nuh uh" in response just makes you look dumb.
My favorite day dream about AI involves the future of mmorpg. We are not too far away from a very small group of people being able to create an incredible high quality game in a very short period of time. The game could be modified by customers changing the genre from one style to another fairly easily with the same tools.
I wish I was younger to see another 25-30 years of development and progress... However, it's still a good time to be alive. 😉 In the next couple of years, I'm hoping for AI vector files THAT uses AI for optimal node placement and node reduction, proper curve smoothing and control, and accurate angles, and zero (stray node) artifacts. It should be doable right now, no!?
[SUPERINTENDENT CHALMERS] GOOD LORD! What is happening in there? [PRINCIPAL SKINNER] Aurora Borealis. [SUPERINTENDENT CHALMERS] A---Aurora Borealis? At this time of year! At this time of day! In this part of the country! Localized entirely within your kitchen?!? [PRINCIPAL SKINNER] Yes. [SUPERINTENDENT CHALMERS] May I see it? [PRINCIPAL SKINNER] No.
Based on analogous papers and projects: “Were RNNs all we needed?” “Training Diffusion Models on a Micro Budget” “Cramming: Training LLMs in a single day” (might have gotten the title a bit wrong) llm.c Would all suggest that the model described here could probably be trained on a roughly $1,000-$10,000 budget, given sufficiently optimized training loops. A bit of a problem is that all of the ones I described above had the benefit of existing implementations to work off of, but a lot of the lessons do carry over. A naive implementation (such as was probably done by the researchers) was probably quite a bit more expensive, and could have been anywhere from $15,000 - $100,000 if I had to guess. I also think a “free” training might be possible with recent advances in distributed training, if it were to be done as a volunteer effort (ie: DiLoCo, DisTrO, etc). I guess with around 100-200 people willing to chip in compute it could be done in a reasonable time frame.
I'm still waiting for a proper audio translator. It looks promising if it would be possible to remove the original voices without drowning out other sounds. And overlay the voiced text from Whisper.
Does it have to have a 'point'? Research can just be done for the sake of research. This doesn't make sense to me: usually people complain about how everything has to be monetized, but when it isn't, people complain too?
@@LJay205 Yes but it’s not about the research having a point, it’s about the public release of that research being pointless if nothing else becomes public after it. Monetisation has nothing to do with their comment. All this does is say “Here’s a look at the shiny new toys you’ll never get to play with.”
@@jduk1818 So would you rather have them keep their research a secret? Of course I'd love to play with this too. But to me it seems silly to declare their work entirely pointless simply because it isn't publicly accessible.
@@LJay205 Sigh. I didn't say their work was entirely pointless, I said releasing a paper on it is pointless unless the actual technology is going to be public at some point. That could be for companies to use or anyone. Until then anything they say is pointless. It's grandstanding, bragging and it's just claims on paper with no way of testing the validity of those claims. I could write a paper about having created a new energy source, one that is cheap and will solve the world's energy problems and then let everybody know. All that is meaningless and unproven until it can be tested independently. So yes, they may as well keep it secret because only telling people is useless. Let me ask you this, what good is research if nobody can use that research?
@@jduk1818 I was referring to the original comment, whose wording seems to suggest that they declare the research itself pointless unless the model would be made public, which is the stance i argued against. Other than that, I understand your point, but I do think that they are going to release this in the future. I do agree that publishing this paper and then never following up on it would be somewhat strange and induce skepticism, but it still doesn't make the paper meaningless. I think that announcing progress like this still has some value inherently, even if it poses no scientific significance since the model isn't public. To me it seems silly to proclaim that they shouldn't have announced this and kept quiet simply because no one else can derive value from their work at this stage, which isn't even something you can say with certainty.
As technology advances like this, people tend to worry, but sound designers will probably welcome it. It can drastically improve their workflow. For example, they could use AI to pre-generate sounds, present them to clients, and if approved, synthesize them with actual sounds and voices. Of course, designers who are less skilled than AI will likely be left behind. But this is simply history repeating itself.
You know what I think this sort of generative AI is extremely valuable for? Tabletop RPGs. One of the most time and energy (and sometimes money) consuming activities for a game master is sprucing things up with artwork and sound effects. With generative image models I'm already seeing people taking advantage of them. Sound would be a phenomenal addition.
A video about a video showcasing 3 minutes of supposedly generating audio, without any actual showcase. The Nvidia video seems completely fake. At least we have a paper explaining the model.
I truly am glad to live through this era when a lot of sci-fi came out to be true even in our daily, ordinary lives! Or I could say.. WHAT A TIME TO BE ALIVE!
The example of an Olympic swimmer winning a medal in wrestling would map to a specialist AI beating another specialist AI on a completely different task, not a generalist AI beating a specialist. Or have I misunderstood the example 🤔?
He's been putting the voice intro near the middle of his videos for a little while now. I think he's experimenting with trying to split the video into discrete parts like an attention getter, then intro, then details. The new voice intro location has been jarring and confusing in most of his videos since then, especially when he's talking about a voice synth tool. I always expect him to say "And that's how good this tool is!" immediately after lol. IMO, It would make a little more sense if the attention getter portion of the video was way way shorter and contained even fewer details. As it is right now, it's pretty jarring.
More like an experiment with ai generated voice 😅? It makes me assume this was just simply dragging an audio to a wrong place but it happened consistently
Musicians will have to evolve though. Musicians a hundred years ago didn't have the kind of instruments and sound libraries like you have today, right?
@Nvidia-Lover Oh, I agree. I use AI when I need to, but I do see that most of my smaller gigs(smaller ads and the like) are now being done by the clients themselves, using AI. Most of the times, with mediocre and generic music, but still... But I don't know how to live without making music, so I'm not giving up any time soon. 😉
@@Nvidia-Lover Same with artists (the painting and drawing kind). They evolved when Photoshop style software and graphics tablets came along. It wasn’t so long ago, relatively speaking, they were making their own canvases and mixing their paints.
@@jduk1818 yeah but someone had to actually use a digital software to draw, with ai its not the same, you put a prompt and you have a picture, when we evolved from 2d paper drawing animation to digital drawing animations using adobe flash or photoshop, someone had to put the exact same effort to draw and animate just digitally, its not the same
@@bluerangergr7466 That's techniques and methods use and you are correct, they are not the same, which is why I didn't claim they were. My comment wasn't about that aspect, it was about artists of all types (and people in general) adapting and evolving along with the tech. Those are also two very different things.
Yeah, during the train to orchestra example I was thinking I would much rather use two real recordings and blend between them in an audio editing software. I think generative AI like this is good to get a quick and dirty example of what you're looking for, to use as a placeholder while you create the real thing later.
Yeah, the video title calls it "Stunning", I call it "meh". Yes, two papers down the line it might be useful. In its current implementation I call it useless. Just like most current generative AI, it's amazing that it can do what it can do, but what it can do isn't actually to a quality level that is useful currently. Also, I feel like there are a dozen AI tools already released that can do all of this (AI voice tools, AI SFX tools, AI music tools, all producing similar meh results currently). In order for any of these tools to actually be useful, the quality still needs to improve by a huge amount and we need tools that allow for iterative modification to results (one-shot is almost never going to give me the final result I want, I need to be able to fine tune the results to get exactly what I want).
This is great and everything, but I'm concerned for voice-over artists, sound effects artists, music producers, singers, etc. We could handle a gradual displacement of jobs, but wiping out a wide range of jobs with a single piece of software could be a problematic trend. When it comes to amazing AI breakthroughs, I'm most interested in hearing about how people are going to continue to be able to make a living and be safe. Only then, will I be able to fully embrace these revolutionary technologies. If there's no plan in place to avoid mass job loss, maybe we should slow down this development with regulations until there is a plan.
It's so absurd that you have to worry about a copystrike when playing a 5 second audio clip, demonstrating the extraction of a voice from a song. Fair use is dead on youtube. 😢
What a time to be alive!
Truly one of the most amazing times in history.
someone has to post this comment every video
My knuckles are white from holding onto my papers
IM SO SICK OF THESE NPC COMMENTS. UNDER EVERY VIDEO I SEE MENTIONING AI THERES ALWAYS “WHaT A tImE To bE AlIvE” ITS SO STUPID AND MEANS NOTHING.
@@DoubleRainbowXT Shitposting is not a crime
Create a sound of 100s of fellow scholars holding on to their papers.
I can't even begin to fathom how beautiful such a sound would be.
the sound of a stack of papers, followed by a 100 scholars gasp, followed by deafening silence. LongHall reverb
Hahaha awesome
Follow by another 100s of fellow scholars squishing on their papers
The true Turing test. When your videos will be 100% AI generated and your audience don't notice.
Most people don't notice. They're usually very distracted by other things
The demonstration of emotional nuance in synthesized speech is a game changer. Imagine the potential for storytelling and immersive experiences, AI is truly blurring boundaries here.
Or thousands of incels and gooners create an artificial girlfriend/lover and go deeper into delusional mindscape.
I think humans are better storytellers. That would be the real game changer when AI cleans your house and cooks for you while you can do all the fun like creating stories
@@hagis23 nah, I've been using small models like Mistral to have them run text-based adventures. Even with all their current limitations, they are at the same level if not better than most experienced humans at storytelling.
@@OnigoroshiZero most people want to hear real people telling stories, not from robots its uncanny
@@OnigoroshiZero honestly I don't trust your ability to determine if that's the case - most people don't understand what makes a great story vs what makes a passable one
Let's just hope that Nvidia has the balls to actually release this
open source it*/ release the weights
@@jpgallegoarexactly this, open source alllllll the way
I'd guess that both Nvidia and Meta are going sit on these things until they see how the copyright lawsuits against Suno, etc pan out. Patents typically only stifle innovation for the benefit of the few "inventors" (who rarely make significant novel changes that would not have eventually been devised by others). In fact, the "first to file" is proof that is the case, as a number of inventors may be working on the same invention with the same ideas, but only the first to file the patent is credited and* "owns" the idea despite the efforts of others who may have been days away from, or had already arrived at, the same solution. As much as I understand protecting artists somewhat, copyright isn't significantly better, especially with how much their artistry tends to appropriated as industry profits by corporations.
This isnt up to invidia at all
@@eugeneputin1858 how so? it's their model
1:35 This is the most impressive part to me. An angry voice making a declarative statement like that with those emphasis pauses is super realistic and I'm impressed the AI replicated not only the sound, but the pacing as well. What a time to be alive!
3:54 I think the correct approach here is for Dr. Zsolnai-Fehér to release a hit single song of his own so he can use his song whenever he wants :D
to be honest it starts to get also scary when it comes to jobs, dating etc.
I hate that AI research is getting wasted on replacing jobs that never needed replacing in the first place. So much for that future where mundane work is automated and people can focus on art and other things. Guess we're getting the complete opposite reality, just fuckin depressing
@@CrispyMuffin2lmao exactly
nothing scary about it. I can't wait for an AI waifu that isn't an annoying as f narcissist that wastes money and has tantrums constantly
@@CrispyMuffin2it’s simply a matter of complexity unfortunately. The most readily consumable information here is speech, text, art, and the likes. Information that doesn’t require rigorous logic or complex relationships, nor a connection to the physical world like robotic, autonomous vehicles, etc. If somehow it were easier to develop machine learning models for complex scientific research than art, speech, and text, it would’ve happened that way
@@crazyfrogmixSo instead of finding an actual thinking, feeling human being who can be a equal partner to you, someone you admire and share interests with, you’d rather download a computer program and instruct it to pretend to love you?
You want a make believe girlfriend who looks and talks like a doe eyed anime character and whose entire existence revolves exclusively around pleasing you?
Instead of dating an independent person with her own life, goals and inner world, you want software that superficially simulates the facade of your dream girl? a pet wife who wears what you tell her to wear, likes what you tell her to like, thinks what you tell her to think and never speaks out of turn? who has no material needs, no other friends or family, no dreams you can’t delete if you feel they compete with her devotion to you.
Rather than your partner making a choice to date you over everyone else because she values you as an individual, you want the shallow, on-demand validation of a glorified tamagotchi whose existence you can entirely remake or erase the moment you become bored with her?
How utterly romantic! It’s just like the classic love story ‘I Have No Mouth and I Must Scream’.
That train one was absolutely beautiful omg
I can't wait for 1,000's of AI-Generated Song Slop in my feed!!!
It has already begun. (I've seen at least one YT channel posting AI music without saying it is, and somehow noone notices.)
I do like that this system has the ability to add to and edit already created music though.
I don't like the fact that people enjoy music made from people just telling AI to do everything; I enjoy the human aspect to music and art. Without that I feel like it's pointless in a way.
Using AI as a tool to help develop ideas is good but I don't think it should be the idea generator in creative fields. Set AI to solve problems. Art isn't a problem to be solved.
@@smugsenko Oh yeah no doubt. it has begun awhile ago actually, just with the better tech it might get a bit harder to notice if it's legit, since some people will genuinely like it and it will get pushed into recommended feeds
You know this video is AI right?
How does the A.I. generating music, pictures, videos, etc make anyone an artist?
It doesn't. Just because someone gave the AI a quirky commission doesn't mean they're the one who created the output "art". Prompt "artists" tweak & twiddle their digital knobs until something comes out they agree with.
That said, the use of AI in audio can really be a game-changer. Imagine being a blind person and having your surroundings narrated to you in real time, or having it read text chats or in-game written dialogue with "emotion."
It doesn't but most people today are lazy lowlifes who love to pat themselves on the back. Besides, it's not like AI started it. Having ghost writers, auto tune etc is all cheating and has been the standard for a while now.
I don't know what you guys are excited about, I'm terrified!
Yea, im suprised how many people are excited about AI today not thinking about cosequences, just the amount of people who will loose their jobs. Also much how it will destroy human crativity... Why be talented and learn anything when some guy who doesnt know thing about music can generate it with AI
@@punkskaska Yeah, music means nothing today, from the moment we "crafted" music instruments. Creativity was really better when we just had sticks and stones ! Technology kills art, bla bla bla
@@RavioliFr i think you look at it too shallow. Learning instruments and playing music is much more important process than you think, it affects neuroplastocity. You have generations of human brain evolving with music just so someone can give it to AI not even knowing how it works to make music. We should support this process and not kill it with AI.
Quote "Playing a musical instrument is the brain equivalent of a full-body workout. Unlike other brain-training activities like chess and sudoku, playing an instrument recruits almost every part of the brain, including regions that process vision, sound, movement, and memory"
You can read more online, there are many more benefits to music. I think human race would benefit more from it than some computer chip.
Also dont ignore the fact that a lot of musician already struggle with money. Last thing i would want is some lazy guy who never invested time to learn anything to benefit more than actual musicians.
That's fair. I'm personally intrigued by the technology, and a bit impressed, but also wary of it
Just to be clear I'm not scared that the AI will gain sentience, I know it's not capable of that right know, what I am fearful for is the the way that we communicate with each other, and the way we know what's right and what's wrong ,what's true and what's false could be altered.
Basically beating the Adobe audio to audio paper
Ive tried and paid for Adobe ai audio products. Not only are they far worse than free alternatives, Adobe's upload and processing speeds are atrocious. It aint cheap either.
The funny thing is. You definitely could replace yourself with audio generation (if your not already), and even if got foind out because it generates something funky, you can be like "woah, you found it out! Good work, I got away with skiving off on a beach for 6 months before you guys got suspicious! Isn't that incredible!?". No backlash from AI usage.
I think people are short sighting it. You can more than likely get your family generational wealth via Art, Voice, and likeness even tho you have been dead for years.
What is this psychopathic mindset lmao
@@xviii5780 So true
I love thinking of the fun uses for this.... but that gets dashed thinking of how much more AI slop is going to be taking over TH-cam and other platforms 😭
Y'all realise AIs that do better music than this have been around for over a year
@@clerothsun3933 "Better" is very subjective. I'd love to hear some of the songs that are your favorite, because sadly a lot of the AI ones feel really repetitive and then either super simple or way too complicated. Definitely willing to learn more! I also like the backstory to artists music journey, the inspiration behind the piece, and the passion that went into making it! AI has none of that (and never can) and will be taking so many opportunities away from budding artist that already have a hard time growing.
@@SaigeSauceSure be a fan of the hip hop genre a lot of AI have much better music and even revive some genres for me like post hardcore.
ai will never be better than Death Grips @@clerothsun3933
The next needed step is for audio AI systems to be able to handle spatial localization of sound. To be able to generate stereo or multichannel audio that is spatially coherent. So far all of them generate only mono audio.
Many years ago I recall seeing a video taken at Skywalker Sound (or ILM) and someone there was demonstrating the sound of a flute mixed with a voice. It was magical. And this was easily more than 20 years ago.
Yeah and now you can do 100 different versions of this in seconds, and if you change your mind doing it with 100 different instruments in the same time
It shouldn’t be too surprising that it might outperform specialist models on their specialty. In general I think the “scale is all you need” crowd has some blind spots, but I’d agree with them here. When it comes to instrumentation, music, isolation and denoising there are many ways to generate annotated synthetic audio data. I’m sure it cost an enormous amount to train this, but in terms of having a model grounded in music theory we’ve only just begun to scratch the surface.
As a video editor and sound designer.. What a time to be alive
imagine watching a movie, but each time, it is a little bit different, with how the AI picks the camera Angles, and how the actors contribute to the scene, and the specific performance of stunts being different.. but the AI follows a script it is given.
When this is released to the public, I am going to try..speak in Dr. Károly Zsolnai-Fehér's style "What a time to be alive"
Showing an example being played and then not letting us hear it and instead having you just continue talking is a massive annoyance. Either let us hear it or don’t show the footage at all.
Archetype of negativity, aha
He mentions in the video that he couldn't use much of the audio because it would be flagged by youtube.
@@willfrank961 why? isnt it AI generated? if youtube can detect where the audio coming from/generated from, then it's bad AI model
@@armanrozika specifically the clip he doesn't show is showcasing the extraction of lyrics from a song.
We can intuitively guess that the song utilised in the demo is copyrighted on TH-cam
01:06 its tenet soundtrack at the opera scene
😮😮😮
Generalist being specialist! What a time to be A I
I would love to use it right away. It would be even better if it was free to try.
Does any of this ever gets released?
Hello Bot, give me the sound of a joyous nerd like me, who knows more than I do about things, is amazing at teaching, has a deep accent who is super excited about the time to be alive!
The introduction in the middle of the the video always feels so out of place, I keep thinking the video reset or auto played the next one by mistake.
@@Gcrowan I think he does this so it's harder for other people to re upload his content
It was weird hearing a honking train pass by without the doppler effekt
0:23 I don't know any of those models, which one correspond to udio or suno ?
An Olympic swimmer winning in wrestling may be less unlikely than you think if they've had wrestling as a hobby. Swimmers are insanely strong. But to riff on that, there have been TV entertainment programs that pit athletes of different disciplines against each other in a series of challenges, and occasionally their sport comes up, and everyone would expect them to win, but fun stuff can ensue like a top cross country skier beating a top cyclist at bicycling. Both are endurance sports, and skiers may bicycle for cardio during the summer.
Did I miss the link in the video description he is talking about at 4 minute mark? The one about the voice Isolation...
Yes, you are indeed right! Thank you and apologies - fixed it in the description. Posting the link here too: th-cam.com/video/qj1Sp8He6e4/w-d-xo.htmlsi=ZtSesU1e7jeoN55U&t=63
The voice isolation was very underwhelming. The voice sounds OK, but it missed the lyrics.
Making sound effects for games would be amazing
at this rate we're gonna get an open source version of 4o's advanced voice mode within 12 months
I don't know if I am more scared or more amazed
2:37 Musicians play instruments in a musical context. You all can become producers or songwriters, but until the thinker can execute what the AI generated, they aren’t musicians.
Music producers, or songwriters; sure. I accept that.
Just writing down my thoughts:
In the future, people will mostly listen to AI generated music, there will be all-in-one music apps that function similar to YT music, but it generates the songs instead.
The apps would have a pause/play, skip, upvote and downvote, and save and randomize, plus a text field.
The app will play songs with randomized prompts, people will constantly upvote and downvote or save songs, and this is info which can be used to narrow down the users preference for future song generation.
The app will create a profile of your tastes, and eventually it can be trained to know which songs you like to automatically save them to a playlist.
So in the future, the listerners are the composers and it's their tastes which determine the composition of new songs. I reckon this might lead to a new music revolution.
the issue with this is that none of the music will be very good. Sure it might sound nice to listen to but all ai generated content is incredibly soulless and uninspired.
@@kingtasazRight now, yes. You talk as if this technology is stagnating. I am 90% sure in 10 years, AI music will be just as good if not better than human music.
I'm terrified that art is being taken out of the hands of the creators and being replaced with expensive algorithms owned by big companies.
@@BlackoutGootraxian AI simply cannot replace game music and thats that. What is the point of spending time on something no one made
@theonewhoslost What a nothingburger response. It can and will get better soon, if you follow AI you know how quickly its still advancing, even in the music department. Saying "nuh uh" in response just makes you look dumb.
what would I use this for? uhm.... adding sound to the AI generated porn of course, what else?
"A person who thinks all the time has nothing to think about except thoughts"
You are too stupid to think about the potential of it
That will be integrated, imagine infine Doom but with porn; Just 3D glasses and starvation
@@Adrian-ep4qmcringe
Replacing actors, influencing politics, memes and so on
1:32 - cool that they're getting Ashton Kutcher to do their voicing.
My favorite day dream about AI involves the future of mmorpg. We are not too far away from a very small group of people being able to create an incredible high quality game in a very short period of time. The game could be modified by customers changing the genre from one style to another fairly easily with the same tools.
Just waiting for the time when I can name a character in a game and have all the people in that world actually say the name.
And the characters can actually reason for themselves and talk to you as themselves
I wish I was younger to see another 25-30 years of development and progress... However, it's still a good time to be alive. 😉
In the next couple of years, I'm hoping for AI vector files THAT uses AI for optimal node placement and node reduction, proper curve smoothing and control, and accurate angles, and zero (stray node) artifacts. It should be doable right now, no!?
If the calculations are correct you'll see 75 years of advancement in a few years.
depending on you age you might just be able to escape to heaven before the singularity hits...
[SUPERINTENDENT CHALMERS]
GOOD LORD! What is happening in there?
[PRINCIPAL SKINNER]
Aurora Borealis.
[SUPERINTENDENT CHALMERS]
A---Aurora Borealis? At this time of year! At this time of day! In this part of the country! Localized entirely within your kitchen?!?
[PRINCIPAL SKINNER]
Yes.
[SUPERINTENDENT CHALMERS]
May I see it?
[PRINCIPAL SKINNER]
No.
Google may have to release 2-minute paper AI podcast generator
Hot Take: NVIDIA does not open source (Apache, MIT, GPL) enough of their models.
Can we use it already for longer form speech and music?
And how expensive is it if it is such a small model?
Based on analogous papers and projects:
“Were RNNs all we needed?”
“Training Diffusion Models on a Micro Budget”
“Cramming: Training LLMs in a single day” (might have gotten the title a bit wrong)
llm.c
Would all suggest that the model described here could probably be trained on a roughly $1,000-$10,000 budget, given sufficiently optimized training loops. A bit of a problem is that all of the ones I described above had the benefit of existing implementations to work off of, but a lot of the lessons do carry over. A naive implementation (such as was probably done by the researchers) was probably quite a bit more expensive, and could have been anywhere from $15,000 - $100,000 if I had to guess.
I also think a “free” training might be possible with recent advances in distributed training, if it were to be done as a volunteer effort (ie: DiLoCo, DisTrO, etc). I guess with around 100-200 people willing to chip in compute it could be done in a reasonable time frame.
I'm still waiting for a proper audio translator. It looks promising if it would be possible to remove the original voices without drowning out other sounds. And overlay the voiced text from Whisper.
Looks like we're headed for a paperless society pretty quickly.
we making brainrot music with this one 🗣🗣🗣🗣🗣🗣 🔥🔥🔥🔥🔥🔥🔥🔥
This seems cool but without it being publicly accessible, I don't really see the point.
Does it have to have a 'point'? Research can just be done for the sake of research. This doesn't make sense to me: usually people complain about how everything has to be monetized, but when it isn't, people complain too?
@@LJay205 Yes but it’s not about the research having a point, it’s about the public release of that research being pointless if nothing else becomes public after it. Monetisation has nothing to do with their comment. All this does is say “Here’s a look at the shiny new toys you’ll never get to play with.”
@@jduk1818 So would you rather have them keep their research a secret? Of course I'd love to play with this too. But to me it seems silly to declare their work entirely pointless simply because it isn't publicly accessible.
@@LJay205 Sigh. I didn't say their work was entirely pointless, I said releasing a paper on it is pointless unless the actual technology is going to be public at some point. That could be for companies to use or anyone. Until then anything they say is pointless. It's grandstanding, bragging and it's just claims on paper with no way of testing the validity of those claims.
I could write a paper about having created a new energy source, one that is cheap and will solve the world's energy problems and then let everybody know. All that is meaningless and unproven until it can be tested independently.
So yes, they may as well keep it secret because only telling people is useless. Let me ask you this, what good is research if nobody can use that research?
@@jduk1818 I was referring to the original comment, whose wording seems to suggest that they declare the research itself pointless unless the model would be made public, which is the stance i argued against. Other than that, I understand your point, but I do think that they are going to release this in the future. I do agree that publishing this paper and then never following up on it would be somewhat strange and induce skepticism, but it still doesn't make the paper meaningless. I think that announcing progress like this still has some value inherently, even if it poses no scientific significance since the model isn't public. To me it seems silly to proclaim that they shouldn't have announced this and kept quiet simply because no one else can derive value from their work at this stage, which isn't even something you can say with certainty.
Is there a link to a site where we can test Fugatto?
my input: scream geometrically
As technology advances like this, people tend to worry, but sound designers will probably welcome it. It can drastically improve their workflow. For example, they could use AI to pre-generate sounds, present them to clients, and if approved, synthesize them with actual sounds and voices.
Of course, designers who are less skilled than AI will likely be left behind. But this is simply history repeating itself.
You know what I think this sort of generative AI is extremely valuable for? Tabletop RPGs. One of the most time and energy (and sometimes money) consuming activities for a game master is sprucing things up with artwork and sound effects. With generative image models I'm already seeing people taking advantage of them. Sound would be a phenomenal addition.
The Bitter Lesson is that the generalist usually wins in AI.
This and soundhound A.i would be perfect match! SOUN & NVDA🔥
A video about a video showcasing 3 minutes of supposedly generating audio, without any actual showcase. The Nvidia video seems completely fake. At least we have a paper explaining the model.
best implementation of voice AI is still that one classic world of warcraft addon. VoiceOver is the name i think.
Adlibs and vocals without needing to do them?? nice
I truly am glad to live through this era when a lot of sci-fi came out to be true even in our daily, ordinary lives! Or I could say..
WHAT A TIME TO BE ALIVE!
As a singer songwriter - I am curious about where human artists go from here. Trying not to despair - trying to look ahead...
The example of an Olympic swimmer winning a medal in wrestling would map to a specialist AI beating another specialist AI on a completely different task, not a generalist AI beating a specialist. Or have I misunderstood the example 🤔?
2:10 Was that an error?
he really tries to make us believe his voice isn't AI lmao
He's been putting the voice intro near the middle of his videos for a little while now. I think he's experimenting with trying to split the video into discrete parts like an attention getter, then intro, then details. The new voice intro location has been jarring and confusing in most of his videos since then, especially when he's talking about a voice synth tool. I always expect him to say "And that's how good this tool is!" immediately after lol.
IMO, It would make a little more sense if the attention getter portion of the video was way way shorter and contained even fewer details. As it is right now, it's pretty jarring.
More like an experiment with ai generated voice 😅?
It makes me assume this was just simply dragging an audio to a wrong place but it happened consistently
@@dzxtricks That's what I thought, clicked and dragged the audio on the timeline by mistake. It doesn't fit in at all there.
When will this be released?
1:29 i used to think your name was Caro Jhon Aaifa as in "Dr. Caro jhon aaifa here"
How and when can I use this?
That name reminds me too much of The Sopranos though 😂 “FUGAZI”
Me faltó la imagen de dotcsv arriba de un pangolin 😉... Muy bueno que se pongan en herramientas prácticas todo lo que venimos viendo
Great video, thank you!
There are a lot of people whose job it is to create sound effects in movies and TV shows that will lose their job because of this
and there are some complete idiots under this video saying "what a time to be alive"
Rumor is that Michael Saylor invested in Alemio Network and will push this project to the moon!
This is absolutely incredible! 😎🤖
It's great, but ... where is it?
Yeah, this AI thing is going to end in tears. Haven't any of you ever seen Terminator for gods sake?
Lol, as the generative AI is being trained on human products, what you are saying may influence the IA to be more terminator-ish in the future ! Aha !
As a musician I can say "what a time to find another job!" 🤣 Seriously though, this is very impressive, how complete it is and lightweight.
Musicians will have to evolve though. Musicians a hundred years ago didn't have the kind of instruments and sound libraries like you have today, right?
@Nvidia-Lover Oh, I agree. I use AI when I need to, but I do see that most of my smaller gigs(smaller ads and the like) are now being done by the clients themselves, using AI. Most of the times, with mediocre and generic music, but still... But I don't know how to live without making music, so I'm not giving up any time soon. 😉
@@Nvidia-Lover Same with artists (the painting and drawing kind). They evolved when Photoshop style software and graphics tablets came along. It wasn’t so long ago, relatively speaking, they were making their own canvases and mixing their paints.
@@jduk1818 yeah but someone had to actually use a digital software to draw, with ai its not the same, you put a prompt and you have a picture, when we evolved from 2d paper drawing animation to digital drawing animations using adobe flash or photoshop, someone had to put the exact same effort to draw and animate just digitally, its not the same
@@bluerangergr7466 That's techniques and methods use and you are correct, they are not the same, which is why I didn't claim they were. My comment wasn't about that aspect, it was about artists of all types (and people in general) adapting and evolving along with the tech. Those are also two very different things.
you sound more ai generated than fugatto haha
Audio engineers are about to have a pay cut
Does AI generated audio sound slightly noisy or echoey to anyone else? It just doesn’t sound “clean” to my ears
I want sound made during seggs 😂
Yeah, during the train to orchestra example I was thinking I would much rather use two real recordings and blend between them in an audio editing software.
I think generative AI like this is good to get a quick and dirty example of what you're looking for, to use as a placeholder while you create the real thing later.
It'll improve rapidly.
Just two more papers down the line!
Yeah, the video title calls it "Stunning", I call it "meh". Yes, two papers down the line it might be useful. In its current implementation I call it useless. Just like most current generative AI, it's amazing that it can do what it can do, but what it can do isn't actually to a quality level that is useful currently. Also, I feel like there are a dozen AI tools already released that can do all of this (AI voice tools, AI SFX tools, AI music tools, all producing similar meh results currently).
In order for any of these tools to actually be useful, the quality still needs to improve by a huge amount and we need tools that allow for iterative modification to results (one-shot is almost never going to give me the final result I want, I need to be able to fine tune the results to get exactly what I want).
is there a way to be able to play with Fugatto?
Not yet.
Is this being narrated by Ren? Tell Stimpy I said hi!
Waiting for the day when Felecia reveals that it's an AI
This is great and everything, but I'm concerned for voice-over artists, sound effects artists, music producers, singers, etc. We could handle a gradual displacement of jobs, but wiping out a wide range of jobs with a single piece of software could be a problematic trend. When it comes to amazing AI breakthroughs, I'm most interested in hearing about how people are going to continue to be able to make a living and be safe. Only then, will I be able to fully embrace these revolutionary technologies. If there's no plan in place to avoid mass job loss, maybe we should slow down this development with regulations until there is a plan.
Where/when can we use Fugatto?
new meme. what a time to be alive!
Could it isolate music from voice in case of a mono track song .
Meaning isolate the singer voice from an 1930's mono song record.
Where is that "AI Enhance" Gradio code? I want it
I wanna join Nvidia fr and make such cool contributions to AI
how to get this up and running?
How can we use Fugatto?
Is there a way to get early access?
Ai gen narrator talking about Ai gen content
What a time to have ears!
Where I can get it?
It's a wrap for pretty much all creatives at this point, it just depends on what your standards are.
Professori, you must prepare yourself for becoming redundant. I guess you only have 2 more papers before that happens.
It's so absurd that you have to worry about a copystrike when playing a 5 second audio clip, demonstrating the extraction of a voice from a song. Fair use is dead on youtube. 😢
Can I get you to write a paper on Cascade please? Love, Greg
What a time indeed
I want an ai midi file to wav converter that makes the midi file sound realistic.
Bro, actually, maybe artlist will not exist in 3 years
Okay, how to run it locally?