Speech features intro 3: Mel-scale spectrogram

Herman Kamper

9,428 views

Published 7 Nov 2024

Comments • 23

@na50r24 days ago
17:35
What confuses me about this is: can we do the comparison to figure out whether the same word is in the signal, or whether both signals came from the same speaker? (IIRC the algorithm used for this is called DTW, which is very similar to the edit distance algorithm.)
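The edit-distance analogy in the comment above can be made concrete. Below is a minimal dynamic time warping (DTW) sketch for two sequences of feature vectors (for example, frames of a log mel spectrogram). It assumes Euclidean distance between frames and the three standard steps, and it is a generic textbook formulation, not code from the video. As in edit distance, each table cell is filled from its three neighbours, but the substitution cost is replaced by the distance between two frames, so a word spoken slowly can still align with the same word spoken quickly.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Dynamic time warping cost between two feature sequences.

    x: array of shape (N, D), N frames of D-dimensional features.
    y: array of shape (M, D).
    """
    n, m = len(x), len(y)
    # cost[i, j] = lowest accumulated cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local distance between two frames
            cost[i, j] = d + min(
                cost[i - 1, j],      # advance in x only (like a deletion)
                cost[i, j - 1],      # advance in y only (like an insertion)
                cost[i - 1, j - 1],  # advance in both (like a substitution)
            )
    return float(cost[n, m])


a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])  # same "shape", stretched in time
print(dtw_distance(a, b))  # → 0.0
```

Repeating frames costs nothing under this recursion, which is why the stretched sequence aligns perfectly. The sketch only measures how well two sequences align; how useful that is for comparing speakers rather than words depends on the features fed into it.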
@doranoon10 a year ago ⁺¹
hey Herman!! just wanted to say thanks for your videos! it's helped me a bunch in my dissertation section on timbre similarity analysis, and it's clear enough that I, a musician, can understand it!
@OussemaGuerriche 4 months ago
Your way of explaining is very good.
@santiagoguisasola1834 11 months ago
Really amazing set of videos, thank you Herman! You have a great presentation style.
Could another way to think about the Mel scale involve harmonics? Since each frequency when doubled (or halved) is the same underlying note (e.g. A is 440Hz --- it is also 220Hz and 880Hz), the space between the same note in an octave gets bigger and bigger as frequency increases. For example, a low A is 27.5 Hz. If we double it, we get the next A at 55Hz. The difference is only 27.5Hz. Going higher, we have an A at 3520Hz. The next A is all the way up at 7040Hz. If we add only 27.5Hz to 3520Hz, we go up to only 3547.5Hz, which isn't even A#!!!! (which is at 3729.310Hz). So the Mel scale adjusts for the growing space between the same notes as frequency goes up.
If so, I wonder why the Mel scale isn't rooted in harmonics and equal temperament (instead of experimental data).
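One way to probe this question is to count how many semitones and how many mels each octave spans. This is a sketch, assuming the widely used formula m = 2595 log10(1 + f/700). Several mel variants exist, and the video may use a different one. Every octave is exactly 12 semitones, but the number of mels per octave grows with frequency: roughly 42 mel for 27.5 → 55 Hz, versus roughly 684 mel for 3520 → 7040 Hz. It levels off towards 2595 log10 2 ≈ 781 mel, because the formula is close to linear well below 700 Hz and close to logarithmic well above it. So this mel formula does not track musical (octave-based) pitch at low frequencies.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # Common HTK-style mel formula (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_semitones(f_hz: float, ref_hz: float = 27.5) -> float:
    # Equal temperament: 12 semitones per doubling of frequency
    return 12.0 * math.log2(f_hz / ref_hz)


def octave_steps(notes):
    """(semitone step, mel step) for each consecutive pair of notes."""
    return [
        (hz_to_semitones(hi) - hz_to_semitones(lo), hz_to_mel(hi) - hz_to_mel(lo))
        for lo, hi in zip(notes, notes[1:])
    ]


# A notes from 27.5 Hz up to 7040 Hz, each one octave above the previous
a_notes = [27.5 * 2**k for k in range(9)]
for (lo, hi), (d_semi, d_mel) in zip(zip(a_notes, a_notes[1:]), octave_steps(a_notes)):
    print(f"{lo:7.1f} -> {hi:7.1f} Hz: {d_semi:4.1f} semitones, {d_mel:6.1f} mel")
```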
@alfredoalarconyanez4896 3 years ago ⁺¹
Thank you very much for this awesome video, very well explained !
@kamperh 3 years ago
Very happy you enjoyed it! :)
@Kotpaz a year ago
You are awesome! Thank you so much, you were extremely helpful in my project.
@nedzadhadziosmanovic3785 3 years ago ⁺⁴
I simply cannot believe that you have so few views and likes. To be honest, your video on this topic is the best there is on the internet. I hope you make videos as a side thing, and make a lot of money in the meantime, because man, you know your stuff. All the best and cheers :D
@kamperh 3 years ago ⁺¹
Very happy you found this so helpful! It's very encouraging! :)
@entertain8768 2 years ago
@15:28 The shape of log_mel_spec is (40, 161), but the plot of it doesn't seem to have the same dimensions. Why?
@waisyousofi9139 2 years ago
Thanks Herman!
Can you share the GitHub link for this playlist's code?
@eastchun2635 2 years ago
Where can I download your example audio files (siren.wav, dress_start.wav and where_were_you.wav)?
@mohamadhamoudy8232 3 years ago
Dear Professor Herman, could you please post some videos on wavelets and scalograms in speech signal processing? Thanks.
@kamperh 3 years ago ⁺¹
I wish I had more time, Mohamad!!
@emrekulkul4784 a year ago
Hey man, hope you're doing well :) One part still eludes me: when we obtain the vector for each STFT frame, what exactly are the values inside the vectors? I don't understand what people mean by “features”. What type of features do these values represent? Also, why are the filters shaped as triangles? What is the reasoning behind that? Thanks a lot in advance, love your channel :)
@kamperh a year ago ⁺¹
Thanks for good questions!
The features inside each STFT window are typically a modification of the values coming from a discrete Fourier transform. Without getting into all the details here, you can think of the first value in this vector as telling you something about the lowest-frequency content in that little snippet of audio; the last value in the vector, at the highest dimension, tells you something about the highest-frequency content.
Your question about the triangular window is also good, especially since I don't actually know the answer! There are some good reasons to want the window to taper off at the sides, but I don't know why we specifically use a triangular window. It might be that this is one of the easiest tapered windows to actually implement. Hope that helps!
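The pipeline discussed in the reply above (DFT values per frame, then triangular filters) can be sketched in a few lines of NumPy. This sketch assumes 16 kHz audio, 400-sample (25 ms) Hamming windows, a 160-sample (10 ms) hop, 40 mel bands, the HTK-style mel formula, triangles with a peak height of 1, and no padding at the signal edges. These are common textbook choices, not necessarily the settings used in the video. Each row of `power` is the per-frame vector described in the reply, running from the lowest frequency bin to the highest. The triangular filters then pool neighbouring bins into bands spaced evenly on the mel scale.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters with peak value 1, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge points, equally spaced in mel, converted back to Hz
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bin_hz = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # frequency of each FFT bin
    fbank = np.zeros((n_mels, len(bin_hz)))
    for i in range(n_mels):
        left, centre, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - left) / (centre - left)     # 0 at left edge, 1 at centre
        falling = (right - bin_hz) / (right - centre)  # 1 at centre, 0 at right edge
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Log mel energies, shape (n_mels, n_frames); signal needs >= n_fft samples."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop  # full frames only, no edge padding
    window = np.hamming(n_fft)
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel_energies = power @ mel_filterbank(n_mels, n_fft, sr).T  # (n_frames, n_mels)
    return np.log(mel_energies + 1e-10).T  # rows = mel bands, columns = frames


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)  # one second of a 1 kHz sine
print(log_mel_spectrogram(tone, sr=sr).shape)  # → (40, 98)
```

The output is laid out as (mel bands, frames), so a shape such as (40, 161) reads as 40 bands by 161 frames. Libraries such as librosa pad the signal by default (`center=True`) and use a different default mel formula, so their frame counts and values can differ from this sketch. With this construction, neighbouring triangles overlap so that their weights sum to 1 between the first and last centre frequencies. That is a property of the sketch, not an explanation of why triangles became the convention.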
@entertain8768 2 years ago
Great explanation. Please share the notebook
@SH-ee2hs 3 years ago
Hi sir, can you also add linear predictive coding, GMM and EM topics for speech?
@kamperh 3 years ago ⁺¹
I wish I had 60 hours every day to just make videos... :( But hopefully in the future!!
@SH-ee2hs 3 years ago
@@kamperh Can I have your email contact?
@kamperh 3 years ago
@@SH-ee2hs I don't want to post it here on YouTube, but you should be able to find it on my home page. Hope that helps!
@kumar707ful 2 years ago
Where can I find the Python code?
@mrstanton81 10 months ago
Note.
