10 - Understanding audio data for deep learning

Valerio Velardo - The Sound of AI

Views: 61,054

Published on 16 Dec 2024

Comments • 180

@ValerioVelardoTheSoundofAI 4 years ago ⁺⁴⁷
I now have a full series called "Audio Signal Processing for Machine Learning", which develops the concept introduced here in greater detail. You can check it out at th-cam.com/video/iCwMQJnKk2c/w-d-xo.html
@josealbertoarangosanchez4694 3 years ago
Thanks :)
@leumas8688 1 year ago ⁺¹
Brazilian CS student here, thank you for your dedication, this is exactly what I needed for my personal project.
@ValerioVelardoTheSoundofAI 1 year ago ⁺¹
Obrigado! (Thank you!)
@ramangarg881 4 years ago ⁺²¹
It is a great series. And I would love to learn about the digital processing stuff you were talking about in the video. Please do a series on it too. Thanks again.
@ValerioVelardoTheSoundofAI 4 years ago ⁺³
Thank you - stay tuned for more :)
@hannabohlin9410 3 years ago
I see you have made the course, looking forward to watching that after I finish this one!
@gautamj7450 4 years ago ⁺¹⁸
I've been following along, seeing your videos from the day I saw your Reddit post. I gotta say, you are doing great work explaining the theory behind Deep Learning. Keep doing the work! Cheers :)
@ValerioVelardoTheSoundofAI 4 years ago
Thank you for the kind words :)
@JulioCesar-dd2ge 3 years ago ⁺²
I'm new to machine learning for audio and I've been following along with your videos, taking some notes, and I feel that I'm learning a lot.
Thanks Mr. Valerio!
@anishbhanushali 3 years ago ⁺⁶
Dude, you've made my life so much easier. I'm going for DL in speech processing and frankly, the task of converting analog waves to DL features has been a mystery until now!! If you're ever launching a descriptive audio/signal processing series, I would love to watch it.
@rajatkeshri9392 4 years ago ⁺³
I feel his tutorials should get more recognition. Thanks for the series
@ValerioVelardoTheSoundofAI 4 years ago
Thank you Rajat!
@mariantalbert8201 4 years ago ⁺¹
I really appreciate how clearly you explain these concepts.
@ValerioVelardoTheSoundofAI 4 years ago
Thank you for the feedback :)
@pankajkumarchoudhary3845 3 years ago ⁺⁵
Really, man, you are doing a great job. This is the best series for Audio Deep Learning. It is far ahead of any other course. Hats off to you. I usually never comment, but this content forced me to. Thanks, buddy, for your efforts and for sharing knowledge with us.
@ValerioVelardoTheSoundofAI 3 years ago ⁺¹
Thank you!
@ХАЗЯЕВАССЫРОМ 3 years ago ⁺¹
So cool. I am from Belarus and am starting work on my startup; these videos are so useful for my work.
@ValerioVelardoTheSoundofAI 3 years ago
Fantastic Giuliano!
@croftyprojects 3 years ago ⁺¹
Taught me more than my uni lecturer, by far. You're the boss my dude
@ValerioVelardoTheSoundofAI 3 years ago
Thanks!
@ahmedanwer6899 2 years ago
Ugh, you are a legend for putting this info out for free. This is something I always wanted to learn and I didn't know where to start, and now I know I can just consume your content to learn more about this exciting field!!
god bless you
@lzdddd 2 years ago
I was curious about what the input data format for deep learning is. Now I understand. Very clear! Thank you.
@malinthasandamal 4 years ago ⁺²
Thanks for the great series. I am working on TTS and STT for my local language and this channel might be very helpful. Thanks and waiting for the next one. Kudos from Munich and Sri Lanka
@sathyanarayananvittal7832 11 months ago
Classic! Enjoyed how you explain the use cases of MFCCs with DL networks. Thanks
@Rishi-nv7bp 4 years ago ⁺¹
YES would love a more in depth video about this topic
@alvinkariuki236 3 years ago
Yes, would be very interested to see videos on audio processing and MFCCs
@PhilipTheDuke 3 years ago ⁺¹
I would watch more audio processing videos for sure!
@chaithanyakumara5828 4 years ago ⁺⁸
Dude, this was a really great lecture. Can you please do a video on the mathematical aspects of the Fourier Transform and MFCCs?
@ValerioVelardoTheSoundofAI 4 years ago ⁺⁴
Thank you! I'm planning to create a whole series on audio DSP for music over the next months, where I'll delve into the mathematical details. Stay tuned :)
@Απο-ρ6σ 3 years ago
You helped me a lot with my undergraduate thesis. Many thanks!
@Sawaedo 3 years ago
Thank you, this content is far better than what I could find in some books. I hope you keep doing it!
@ValerioVelardoTheSoundofAI 3 years ago
Thank you! Of course I will... stay tuned ;)
@raj-nq8ke 2 years ago
Please release the series on audio digital signal processing. You're the best.
@ValerioVelardoTheSoundofAI 2 years ago
Thanks! I already have a series called "Audio Processing for ML". Check it out!
@_Saucypasta_ 4 years ago
Awesome series! You have the clearest deep learning videos I've seen so far.
@ValerioVelardoTheSoundofAI 4 years ago
Thank you Omar!
@souvikpaul1153 4 years ago
Excellent series. In the first video you said this is not for beginners, but I am able to follow along perfectly. Excellent explanations.
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Thank you!
@jennifer6278 4 years ago
I'm studying for an Automatic Speech Recognition seminar right now and this was really helpful. Thank you!
@ValerioVelardoTheSoundofAI 4 years ago
It's great you're finding this useful!
@Moonwalkerrabhi 3 years ago ⁺¹
Glad I completed the audio signal processing playlist first; this video is a quick revision for me.
@mishachandar3965 4 years ago
Can't find a better series for audio processing with DL like this! Great content as always. It would be really helpful if you can touch on concepts like audio augmentation techniques and transfer learning in the future in this series. Thanks Valerio!!!
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Thank you! I'm now producing a new series "Audio Processing for ML", where I'll probably get into data augmentation for audio.
@ManontheBroadcast 4 years ago ⁺¹
Thumbs up for digital audio signal processing videos ...
@maddonotcare 4 years ago ⁺¹
Amazing job! I've been working on a speech-related AI project and had very little knowledge about sound and everything related to it. You are a very good, cut-to-the-point teacher, thanks!
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
I'm glad I could help! Stick around for more :)
@aymentlili9109 2 years ago
This has been really informative. The spaghetti I made with your recipe was off the charts! Love how you simplify the addition of the instances of the algebraic space of linear functions, and specifically Fourier transforms, so neatly. Thanks for making things accessible to everyone.
@dishkakrauch 3 years ago
What an amazing explanation! Thank you. All the audio things have become clearer to me now.
@vikasnair8447 4 years ago
I wish I had a teacher like you in school! Thank you so much :)
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Thank you Vikas!
@viga_hardjanto 8 months ago
Hi Valerio, thanks for the great explanation. At 25:05 you explain that the ZCR feature can be fed into the ML algorithm. But when I use the librosa zero_crossing_rate function I get quite a long array, so how do I summarize this array? Is it by taking the average value? I'd be grateful if you could answer my question, thank you.
@harleensingh2531 2 years ago
What a great video! Very easy to follow, thank you!
@tentyluaysari3393 3 years ago
I just started this lesson and the way you explain it is really simple and helps me a lot with my research paper, thank you!
@prasanthantonyraj8876 4 years ago
It is really an amazing series and I am happy that I found it. I wish to thank you a lot for your time. Keep up the good work. I am also curious to learn more about MFCCs. It would be really helpful if you made another series about Audio DSP, as you mentioned earlier.
Thanks again!
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Really glad you like this! I'm thinking about making an Audio DSP series in the future.
@SehamMohammed2020 4 years ago
This is a really helpful series. I hope you can make more series about audio data for DL with more details. I really like your way of explaining things.
@sambhavgulla6730 4 years ago ⁺¹
Nicely explained concepts.
Can you also make a video on how MFCCs are extracted?
I mean the use of pre-emphasis filters and Hamming windows.
@flaminglotus11 3 years ago
Thank you. My work brought me here and you helped me a lot.
@ValerioVelardoTheSoundofAI 3 years ago ⁺¹
Glad I could help!
@nikotuba 4 years ago ⁺²
Awesome! Will you post a link to the GitHub repo with the Python implementations?
@ValerioVelardoTheSoundofAI 4 years ago
Thanks :) I always include a link to the Python implementations (when there is one!) in the description section. Stay tuned for the next video, where I'll implement some of the topics I've covered in this video :)
@rajansaharaju1427 4 years ago
I'm working with TTS. Glad to see the series.
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
I'm happy I can help!
@sunilshah300 4 years ago ⁺¹
Thank you man for the series.
@ValerioVelardoTheSoundofAI 4 years ago
You're welcome!
@sampadamatkar6027 4 years ago
You made this topic so easy, you are amazing sir, thank you sir 🙏
@orhanors1800 4 years ago
20:28 How audio is transformed into a spectrogram using the STFT... Spectrogram: signal in the frequency domain
@akshaya3086 2 years ago
Can you explain DWT feature extraction? And can you please explain what an MFCC coefficient is in particular?
@user-hi2hb2ny2p 1 year ago
Thanks, very enlightening and useful explanation
@islamic1007 4 years ago ⁺¹
Thanks for such great videos. Please make videos on signal processing for sound waves.
@IstiakAhammed 2 years ago
It is a really interesting topic. Could you please make videos related to speech enhancement systems? How can we create a neural network model or CNN for speech enhancement? How can we remove the noise signal from human speech using deep learning or a specific model (CNN, ANN, RNN, LSTM)? Thanks for making amazing videos for us.
@sbraun27 4 years ago
Thanks for all the videos!!!! I would be very interested in learning more about this topic and in other resources to supplement it.
@ValerioVelardoTheSoundofAI 4 years ago
Thank you for the feedback! I'm considering creating a series on audio DSP / music processing over the next months. If you're interested in the topic, you should take a look at "Fundamentals of Music Processing" www.springer.com/gp/book/9783319219448 This book is quite dense, but it'll give you a strong background in all of these topics, and way more. I'll probably use this book as the main reference for my series on music processing.
@amitbenhur3722 3 years ago
Hi, amazing videos... Just a question,
When creating subtitles for a video with DL, do we create a spectrogram from the video's audio and use a network with CNN layers?
@prachichitnis 4 years ago
This is a great series! I want to know how I can extract features about periodicity of audio data. The frequency, timbre and other MFCC features would tell me about the note or pitch at a point in time. But, to extract the rhythm signature, I would need to look at the repeating patterns over a time period.
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
I suggest you check out my new series (still in production) "Audio Signal Processing for Machine Learning"
@movieimpact12 3 years ago ⁺¹
best series ever!! thanks brother
@ekoteguhw 4 years ago
A great explanation for understanding audio data for deep learning. It's really "new" for me.
I just want to ask: does all audio analysis use spectrogram data as its basis?
Thank you
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Thank you! Not all analysis uses power spectrograms. If you're using traditional ML / audio DSP techniques, you would also use other features (e.g., chromagrams, zero-crossing rate). Spectrograms and similar features are usually used in end-to-end DL approaches. I'm planning to create a series on audio/music processing where I dig deeper into the topics I only touched on in these couple of videos.
@chryszification 4 years ago ⁺¹
So when we pass the spectrogram as input to the NN, we represent it as a 2-D input (meaning we have to get rid of either time or magnitude) or as a 3-D input? Thanks!
@ValerioVelardoTheSoundofAI 4 years ago
The dimension depends on the type of network you're using. However, the basic idea is that you'll be able to package time, frequency, and magnitude in a 2D array, in the same way we visualise the spectrogram. The shape of the array is (# time steps, # frequency bands). The values in the array are the magnitudes for each frequency band at each time step. In the case of a CNN, however, you'll have to pass a 3D array, where the 3rd dimension indicates the depth. For audio data, depth is 1, just like in greyscale pictures. For RGB images, depth is 3. I cover this and more in the following videos. So, stay tuned :)
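A minimal sketch of the shapes described in the reply above, in plain Python. The sample count, frame size, and hop length are assumed values chosen only for illustration, and the helper names are hypothetical. The frame count assumes no padding at the signal edges; libraries such as librosa pad the signal by default, which yields a different number of frames.

```python
def spectrogram_shape(num_samples, frame_size, hop_length):
    """Return (# time steps, # frequency bands) for a magnitude spectrogram.

    Assumes a real-valued signal and no padding at the edges.
    """
    if num_samples < frame_size:
        return (0, frame_size // 2 + 1)
    num_frames = 1 + (num_samples - frame_size) // hop_length
    # A real signal yields frame_size // 2 + 1 bins, from 0 Hz up to Nyquist.
    num_bins = frame_size // 2 + 1
    return (num_frames, num_bins)


def cnn_input_shape(spec_shape, depth=1):
    """Append the depth (channel) dimension a CNN expects; 1 for audio."""
    return spec_shape + (depth,)


# Assumed example: 1 second of audio at 22050 Hz.
shape_2d = spectrogram_shape(num_samples=22050, frame_size=2048, hop_length=512)
shape_3d = cnn_input_shape(shape_2d)

print(shape_2d)  # 2D array fed to e.g. an MLP or RNN
print(shape_3d)  # 3D array fed to a CNN, depth = 1 like a greyscale image
```

Setting depth=3 in cnn_input_shape would give the shape used for an RGB image of the same size.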
@raktimbarua6601 4 years ago ⁺¹
Absolutely brilliant, I just want to implement this on the Million Song Dataset
@ValerioVelardoTheSoundofAI 4 years ago
This sounds like a great idea -- and it'll enable you to learn a lot in the process!
@raktimbarua6601 4 years ago
@@ValerioVelardoTheSoundofAI I may ask for some help from you!
@ValerioVelardoTheSoundofAI 4 years ago
@@raktimbarua6601 I'm here to help... if I can ;)
@raktimbarua6601 4 years ago
@@ValerioVelardoTheSoundofAI Would you mind checking my work and giving me feedback, please? I can share my GitHub link. Many thanks
@aigen-journey 4 years ago ⁺¹
This might be a stupid question, but do you use something like a sliding window on that MFCC? I thought most of the sequential data is processed with RNNs/LSTMs, but then I would guess only a value from a single time-step is processed from that MFCC
@ValerioVelardoTheSoundofAI 4 years ago ⁺²
It's actually a great question. Deriving MFCCs is an elaborate process with several steps. The first is to perform an STFT, which uses a sliding window. The sliding window is characterised by two values: the frame size (i.e., the window length expressed in number of samples) and the hop length (also expressed in number of samples). When you perform the FFT, you consider a time interval equal to the frame size. Then, you shift the window to the right by a number of samples equal to the hop length. The hop length is smaller than the frame size, so that consecutive FFTs overlap, which preserves information about the edges of the intervals. Since MFCCs rely on the STFT, you can state that extracting MFCCs uses a sliding window.
As for the second part of your comment, you can definitely use an RNN to process MFCCs, passing the MFCC vector for a single window at a time. However, you can also process MFCCs using basic MLP or CNN architectures, treating the MFCCs as 2D data, similar to images. We'll take a look at this in the following videos. Stay tuned!
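The sliding window described in the reply above can be sketched in a few lines of plain Python. The function name frame_signal and the tiny toy signal are assumptions made for illustration; a real STFT would also apply a window function and an FFT to each frame.

```python
def frame_signal(signal, frame_size, hop_length):
    """Split a signal into overlapping frames.

    Each frame holds frame_size samples; consecutive frames start
    hop_length samples apart. No padding is applied at the edges.
    """
    last_start = len(signal) - frame_size
    return [
        signal[start:start + frame_size]
        for start in range(0, last_start + 1, hop_length)
    ]


# Toy signal: sample values equal their own index, so overlap is easy to see.
signal = list(range(10))
frames = frame_signal(signal, frame_size=4, hop_length=2)

for frame in frames:
    print(frame)
```

Because the hop length (2) is smaller than the frame size (4), each frame shares its last two samples with the start of the next one.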
@aigen-journey 4 years ago
@@ValerioVelardoTheSoundofAI Thank you for the explanation. I've never worked with audio (I do graphics stuff mostly), so this new domain is pretty fascinating to me. I would guess that some architecture similar to video processing would also work as you have a series of 2D time-dependent inputs. Looking forward to the next video!
@ValerioVelardoTheSoundofAI 4 years ago
@@aigen-journey your guess is right :)
@AshwaniKumar04 4 years ago
Hello Valerio:
Thanks for this awesome channel. Thanks a lot.
I do have a doubt. When should we use spectrogram vs mfcc for a deep neural problem?
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Spectrograms are state-of-the-art features in DL now. MFCCs are rarely used in DL.
@AshwaniKumar04 4 years ago
@@ValerioVelardoTheSoundofAI So, in that case, we should always use spectrograms in DL?
@javadmahdavi1151 3 years ago
This is sooo cool, thanks for that Valerio... high-quality content ❤❤❤❤😍😍😍
@yannickpezeu3419 3 years ago
Hi, do you have a video explaining the role of the phase in the Fourier representation?
@ValerioVelardoTheSoundofAI 3 years ago
Yes, I have a detailed explanation of the FFT in the "Audio Signal Processing for ML" series.
@oroneki 4 years ago
That content is amazing! Very very clear for me! I am just a programmer interested in extracting audio from video and then transforming it into a podcast (without the advertising intervals) to listen to while I am doing the dishes :) I did the first version with just random forests and it was ok, but it's time to do some deep learning now... this series is gooold! I made a small Flask app to divide the audio into small 1-second parts and serve them as content to an app where I can easily classify my audio and use the results as labels in the DL project...
@ValerioVelardoTheSoundofAI 4 years ago
I'm really glad you find this useful! Stay tuned for more interesting stuff to come ;)
@vincentouwendijk3746 3 years ago
Interesting stuff Valerio!
@vincentouwendijk3746 3 years ago
By the way, can I ask what you are using to automatically give suggestions for your code? I use VS Code, but am looking for something as effective as what you use :)
@Zuke22 3 years ago
Why does the MFCC graph just look like a blocky version of the spectrogram?
The intuition of looking at it is that the frequencies are just split into 13 groups?
@123siip 4 years ago
Such a good video for someone like me who isn't much into data like this, even though it's still hard for me to understand clearly what it all means.
@yannickpezeu3419 3 years ago
Question:
In the Fourier transform, if we compute the full Fourier transform (meaning the phase + the amplitude instead of just the amplitude), we can actually recompose the entire signal without any loss of time information. The original signal is just the inverse Fourier transform of the Fourier transform of the signal: f(n) -> F(w) -> f(n)
Why don't we just do that? Why do we need the short-time Fourier transform? Is it more efficient this way? Am I missing something? Thanks for your great work!
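The round trip f(n) -> F(w) -> f(n) mentioned in the question can be checked numerically. This is a minimal sketch using a naive discrete Fourier transform written with the standard library; the function names dft and idft and the sample values are assumptions for illustration, and real code would use an FFT routine instead.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform: complex output keeps amplitude and phase."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Arbitrary toy signal (placeholder values).
signal = [0.0, 1.0, 0.5, -0.5, -1.0, 0.25, 0.0, 0.75]

spectrum = dft(signal)                       # f(n) -> F(w)
recovered = [v.real for v in idft(spectrum)]  # F(w) -> f(n)

# Largest difference between the original and the reconstruction.
max_error = max(abs(a - b) for a, b in zip(signal, recovered))
print(max_error < 1e-9)
```

The reconstruction only works because the complex spectrum keeps the phase; taking abs() of each coefficient first would discard it.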
@ValerioVelardoTheSoundofAI 3 years ago
I suggest you check out my series on Audio Signal Processing for ML. There I spend 4+ videos on these topics ;)
@ValerioVelardoTheSoundofAI 4 years ago
Given many of you have requested it, I've started a new in-depth series 🔥🔥 on Audio Processing for Machine Learning 🎼🤖. Check it out at: th-cam.com/video/iCwMQJnKk2c/w-d-xo.html
@Waffano 2 years ago
I never really understood why we don't just use the raw waveform as input to the neural network, as a 1D array or something, where the index represents time and the values represent amplitude. Shouldn't it have all the information we need? Any help in understanding this would be much appreciated.
@caiovillela3708 3 years ago
Great video, thank you for the work!!
@thrishulh9834 4 months ago
20:39. Why is 4000 Hz blue? It should be bright red, right, since it is the frequency with the highest amplitude?
@ManusiaSetengahChiKuadrat 4 years ago
The explanation is great and this is a cool video 😁👍
@ValerioVelardoTheSoundofAI 4 years ago
Thanks!
@pushkarpadmnav 3 years ago
I am a bit confused about the meaning of "magnitude" in the frequency-domain graph generated after doing an FFT on the time-domain signal. Can you please explain which of the two explanations below is correct?
1) The magnitude corresponding to a particular frequency after the FFT shows the number of times that particular frequency has occurred.
2) The magnitude corresponding to a particular frequency after the FFT shows the amplitude of the sine wave having that particular frequency.
Thanks for this wonderful video 🤩
@juliangermek4843 3 years ago
Exactly what I was looking for, thank you!
One follow-up question: Is there always exactly one possible result from a Fourier transform? Or (1) can it be impossible to decompose the sound, or (2) can there be more than one possible decomposition?
@ayoadeadeyemi01 4 years ago
Thanks for this amazing video. I gained a lot. Can you explain more about harmonics and chromagrams?
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Glad you liked this! I'll definitely cover more of that in the future :)
@harvajourdieyosua9529 3 years ago
Hello sir, I'm very grateful I found your videos. I'm currently preparing my thesis on musical genre classification, but I'm having problems understanding the feature extraction part.
So my question is: in music genre classification, do we only need to use MFCCs? Is that enough? Thanks!
@ValerioVelardoTheSoundofAI 3 years ago
MFCCs do a pretty good job. Mel Spectrograms are state of the art. You don't need to mix these features with others.
@matthewk6522 1 year ago
Did you find that using MFCCs works better than using a spectrogram? I just stumbled upon your video and I have been using the same dataset, but I extract the spectrogram and feed that into my network. I am constantly running into overfitting, and even when I use the same CNN as you do (in your later video) I only get about 50% validation accuracy, while getting 99% training accuracy. Does using MFCCs reduce overfitting?
@meetgandhi8782 4 years ago
This video was very helpful; I would definitely like more videos on digital signal processing. Additionally, could you also make a video on feature engineering for ML algorithms?
@ValerioVelardoTheSoundofAI 4 years ago ⁺²
Thank you! When I make the audio/music DSP series, I'll definitely cover feature engineering for ML.
@sundeeparandara 4 years ago ⁺¹
Great job!!!
@ValerioVelardoTheSoundofAI 4 years ago
Thank you!
@javadmahdavi1151 3 years ago
Hey Valerio, where can I talk to you directly?
Do you have a chat room on Discord or Telegram or...?
@arnabmukherjee9939 4 years ago
Really good video 👌👌 Keep posting such videos.
@ValerioVelardoTheSoundofAI 4 years ago
Thank you!
@ShortVine 4 years ago
Quality content, God bless you.
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Glad you like it!
@achmadarifmunaji3320 4 years ago
Which features extracted from voice are suitable for speaker recognition (recognizing who is speaking, not the words)?
@ValerioVelardoTheSoundofAI 4 years ago
You can try with MFCCs and spectrograms.
@achmadarifmunaji3320 4 years ago
@@ValerioVelardoTheSoundofAI Thank you for answering my question.
Continue your amazing work!!
@nishantbarsainyan5700 3 years ago
Can you please prepare a series on emotion recognition from speech?
@alymohamed2936 4 years ago
I have a project to create a model that performs acoustic source localization using deep learning. Are there any videos to help in this area?
@ValerioVelardoTheSoundofAI 4 years ago
I haven't tackled that topic. Unfortunately, I don't know any other videos.
@sifftube7537 4 years ago
It was very helpful, but can you please show us how to prepare a corpus from scratch for another language, such as an under-resourced language, from broadcast news data?
@jokkerBANG 3 years ago
Amazing! This helps me tremendously
@hussain_sh2763 4 years ago
If the program has many MFCCs for an audio file, will it average those MFCCs to get just one MFCC? If you understand what I mean, please explain your answer in more detail.
@zeldisuryady1541 4 years ago
Hi Valerio, nice and very informative series on understanding audio for machine learning.
One question about the MFCC spectrogram: it shows that the first MFCC coefficient always has the lowest value, represented by the blue color.
Why is that so? Thanks for your response.
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
The first MFCC value is the least representative for an audio file, and is often dropped for audio characterisation. That's because it carries information mainly connected to loudness.
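Dropping the first coefficient, as mentioned in the reply above, amounts to slicing it off every frame. The sketch below uses a hypothetical MFCC matrix with placeholder values (not real measurements) and assumes 13 coefficients per frame with one row per frame; libraries may store the matrix transposed.

```python
NUM_COEFFICIENTS = 13  # assumed number of MFCCs per frame

# Hypothetical MFCC matrix: one row per frame, one column per coefficient.
# All values are placeholders for illustration only.
mfccs = [
    [-310.0, 95.2, -12.4] + [0.0] * (NUM_COEFFICIENTS - 3),
    [-295.5, 90.1, -10.8] + [0.0] * (NUM_COEFFICIENTS - 3),
]

# Drop coefficient 0 (mainly loudness) from every frame.
mfccs_without_first = [frame[1:] for frame in mfccs]

print(len(mfccs[0]), len(mfccs_without_first[0]))
```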
@zeldisuryady1541 4 years ago
@@ValerioVelardoTheSoundofAI Thanks, Valerio, for your response.
@raghavgupta6186 3 years ago
Really great content 🙏
@isaacmwanza9162 3 years ago
Would like to know how to classify sound
@133sjassson8 3 years ago
Dear Mr. Velardo, I am currently working on a hearing aid app, which does real-time deep learning speech enhancement on Android. I am planning to deploy a TensorFlow Lite model. I am stuck on both theoretical and technical problems. Can you please give me some advice? Which videos on your channel should I watch? Thank you.
@ValerioVelardoTheSoundofAI 3 years ago ⁺¹
I would suggest checking the "AI audio application from scratch" playlist. There, you'll see how to build an AI audio app in TensorFlow and deploy it via Docker on AWS.
@133sjassson8 3 years ago ⁺¹
@@ValerioVelardoTheSoundofAI I believe you are referring to the playlist "Deep Learning (Audio) Application: From Design to Deployment".
Thank you for replying.
@MrSushantsingh 4 years ago
Can you reference some recent papers on Deep Learning for audio applications?
Is it a hot area for research right now, and does it have a promising future alongside NLP, etc.?
@ValerioVelardoTheSoundofAI 4 years ago
DL for audio/music is a hot area at the moment. There's a lot of research in the space. I suggest you take a look at the proceedings of the ISMIR conference for a few applications of DL in the music space: ismir.net/conferences/ismir2018.html I hope this helps :)
@mariameboukrim657 8 months ago
Great video, thank you.
@AM-jx3zf 4 years ago
great stuff, man... props
@ValerioVelardoTheSoundofAI 4 years ago
Thanks!
@bgsouayboss199 4 years ago
This is a really good video. Can you make a video about x-vectors and i-vectors? That would be really cool.
@thiebesleeuwaert9930 4 years ago
Hi, I found this video really informative
@pravinyadav8372 4 years ago
This was incredible
@ValerioVelardoTheSoundofAI 4 years ago
Thanks!
@dudusash 4 years ago
How can I learn more about MFCCs?
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
Wait for my video on MFCCs in my new Audio Processing for Machine Learning series ;)
@artyomgevorgyan7167 4 years ago
Very good resource for newcomers like me.
@a2sirmotivationdoses782 3 years ago
Sir, can you please teach how to build a speech-to-text machine learning model from scratch? Please reply, sir.
@ValerioVelardoTheSoundofAI 3 years ago
This is something I'll tackle in the future. Stay tuned!
@bangladeshisingaporevlog9273 4 years ago ⁺¹
I'd like a detailed video on MFCCs.
@ValerioVelardoTheSoundofAI 4 years ago ⁺¹
I'll publish a video on MFCCs as part of my "Audio Signal Processing for ML" series over the next few weeks. Stay tuned!
@Magistrado1914 4 years ago
Excellent course
14/11/2020
