Mel Spectrograms Explained Easily

  • Published on 16 Dec 2024

Comments • 134

  • @ValerioVelardoTheSoundofAI
    @ValerioVelardoTheSoundofAI  17 days ago +1

    I noticed I made a mistake in the formula at 7:36. In the log, I wrote "f / 500". But it's supposed to be "f / 700". So, the value of the constant is wrong in the slides. Sorry about that ;)
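
    For reference, the corrected conversion can be sketched in a few lines of Python. This is only an illustration and not code from the video: the function names hz_to_mel and mel_to_hz are made up here, and the constants 2595 and 700 are the ones quoted in this thread.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (corrected constant: 700, not 500)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse conversion, from mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Round trip for a few frequencies: Hz -> mel -> Hz.
for f in (100.0, 1000.0, 8000.0):
    m = hz_to_mel(f)
    print(f"{f:7.1f} Hz -> {m:7.1f} mel -> {mel_to_hz(m):7.1f} Hz")
```

    With the constant 700, the two functions are exact inverses of each other, and 1000 Hz maps to roughly 1000 mel.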

  • @zhouzhou3785
    @zhouzhou3785 4 years ago +14

    thank god your videos just make my learning curve of speech processing much flatter just like mel scale does.

  • @romainpattyn4528
    @romainpattyn4528 3 years ago +47

    Really nice video thank you, i like the way you explain things. Just wanted to mention that there is an error at 13:47, in the formula to go from Hz to Mel, the frequency should be divided by 700, not by 500. 😉

  • @jennifer6278
    @jennifer6278 4 years ago +24

    I was struggling so much trying to understand this for my speech recognition class, I can't believe I understood everything within only 30 minutes! Thank you so much! :) This is incredibly well explained. Now on to MFCCs ...

    • @aoliveira_
      @aoliveira_ 2 years ago +2

      Don't forget that when you came here you already had previous knowledge. I also consider that these videos are very good in explaining things that I've struggled to understand in other places. But I didn't begin here. Most likely you are complementing these explanations with your previous knowledge.

    • @free_thinker4958
      @free_thinker4958 a month ago

      @@aoliveira_ exactly

  • @dn54321
    @dn54321 a year ago +2

    At 2:20, you mention that the higher frequencies sound similar, but I hear the opposite. The lower frequencies I can't distinguish; the higher ones, I can.
    Edit: had to wear headphones to hear the difference x_x

  • @StefaanHimpe
    @StefaanHimpe 4 years ago +21

    8:15 is it 500 or 700 ?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  4 years ago +13

      Great catch Stefan! It's supposed to be '700' not '500'. Thank you for pointing this out!

    • @kazmzengin5176
      @kazmzengin5176 3 years ago

      @@ValerioVelardoTheSoundofAI Hi Valerio, I would have asked the same question if I hadn't read your correction. Maybe you should correct it in the video too. Thank you very much for your video series.

  • @nedzadhadziosmanovic3785
    @nedzadhadziosmanovic3785 3 years ago +5

    In this video and the next one, "Extracting Mel Spectrograms with Python", you explain what a mel band, the mel scale, a mel filter bank, etc. are, but in my opinion a single step is missing for understanding what is really done when mel filter banks are used to construct a mel spectrogram.
    The process you are referring to:
    1. Find the smallest and biggest frequencies expressed in Hz, which we get from the output of the STFT
    2. Convert these two values from Hz to the mel scale
    3. Choose the number of mel bands we want to use
    4. According to the chosen number of mel bands, construct a mel filter bank
    And now comes the part which is not clear to me: the use of mel filter banks on the outputs of the STFT to get an output of some other kind, which will be used to construct a mel spectrogram.
    At this point let's go back and look at a single output of the STFT (which is equivalent to performing a DFT on one frame of an audio wave). As a result we get a set of complex numbers, and by finding their magnitudes we are able to construct an amplitude-vs-frequency graph (also called a "frequency domain graph") by simply plotting the magnitudes as the amplitude for a certain frequency. In other words, each of the magnitudes of the complex numbers (to be clear, one magnitude per complex number) determines the height of one bin in the amplitude-vs-frequency graph.
    Now we have this single amplitude-vs-frequency graph, and we want to use it in combination with mel filter banks to construct an output of some kind. The first question is how to apply a mel filter bank to a single output of the STFT (i.e. to one amplitude-vs-frequency graph). In other words, how do we combine these two to get an output of some kind? (I know that it is basically a multiplication of two vectors, but how would you represent this visually, using a mel filter bank and a single amplitude-vs-frequency graph?) Secondly, what does this output represent, the amplitude for a single mel band? Lastly, I think it would be much clearer if we used mel bands on the y-axis and mel as the measuring unit (though I don't know whether this would be correct); in my opinion, putting frequency in Hz on the y-axis of a mel spectrogram is completely misleading (and is making me think I didn't understand anything).
    I wanted to ask whether you would be so kind as to make a single graph showing the output of combining a single amplitude-vs-frequency graph (which we got from the STFT) with a mel filter bank, also expressed visually as a graph (I suppose, but I am not sure, that it would again be an amplitude-vs-frequency graph, but this time with mel frequencies on the x-axis), as I think that it could help both me and a lot of your viewers.
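
    The step asked about above (applying a mel filter bank to one STFT frame) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the video's or any library's implementation: the helper mel_filter_bank is hypothetical, the filters are plain unnormalised triangles whose corner frequencies are equally spaced in mel, and the frame is a synthetic 440 Hz sine. Libraries often add normalisation and operate on power rather than magnitude.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_mels, n_fft, sr):
    """Hypothetical helper: (n_mels, n_fft // 2 + 1) matrix of triangular weights."""
    bin_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)   # Hz of each STFT bin
    # n_mels + 2 corner points, equally spaced on the mel axis, mapped back to Hz.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

sr, n_fft, n_mels = 22050, 2048, 10
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 440.0 * t) * np.hanning(n_fft)   # one windowed frame
magnitudes = np.abs(np.fft.rfft(frame))                      # one magnitude per frequency bin
bank = mel_filter_bank(n_mels, n_fft, sr)
mel_frame = bank @ magnitudes                                # one value per mel band
print(bank.shape, magnitudes.shape, mel_frame.shape)
```

    Each row of the matrix is one triangle, so each output value is a weighted sum of the frame's magnitudes under that triangle. Stacking these vectors over all frames gives the mel spectrogram, whose vertical axis is the mel band index; Hz labels on that axis only indicate where each band sits in frequency.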

  • @aussieronnied
    @aussieronnied 4 years ago +3

    Thanks Valerio! The triangular filter bank visualisation helped me connect the dots in understanding what is happening behind the scenes. Keep up the great work :)

  • @MathStatsMe
    @MathStatsMe 4 months ago

    Citing you in my master's thesis. Thank you for these videos!

  • @magnuspierrau2466
    @magnuspierrau2466 3 years ago +3

    This was just awesome! Thank you so much for explaining this concept so clearly, intuitively and passionately! Great stuff! :)

  • @andres-ab
    @andres-ab 4 years ago +3

    I have one question. Given the desire for the NN to learn or catch a pattern that the human ear may not recognize (e.g. classifying coughs of different diseases, or positive/negative cases of one disease), what's the need to feed the NN a spectrogram with "humanly perceived coherence"? Could it be possible to avoid the frequency and amplitude correction? Does it make sense to do so?
    Thanks a lot. I really love this series.

    • @avidreader100
      @avidreader100 3 years ago

      I guess there can be any number of features suitably defined based on our objective and current insight. Mel would be one such based on human perception. It could have a great fit for applications where the human perception is relevant. There is no compulsion to use it for classifying cough. I would imagine a differently defined scale can very well be used.

  • @oscarwjy5084
    @oscarwjy5084 3 years ago

    Man you really helped me a lot for my thesis related to auditory filter bank

  • @SonGoku-rl9qf
    @SonGoku-rl9qf 10 months ago

    At 27:40 the Mel spectrogram has Hz on its coordinate axis. I thought it should be Mel?

  • @Saitomar
    @Saitomar 3 years ago +2

    Hi Valerio. How is a mel spectrogram better than a vanilla spectrogram in terms of deep learning? I understand that it is better in terms of how we perceive audio as humans. But for deep learning, the models pick up the features that are more relevant to the model, like how for images we just provide the image as a 3D array and the model identifies the underlying pattern. Is there any paper that compares mel spectrograms and vanilla spectrograms in terms of deep learning?
    Thank you for the video

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  3 years ago +1

      In general, people tend to use MelSpecs over vanilla specs. I'm not aware of any paper that compares the two across the board. Performance of the 2 representations depends on each task. The empirical approach is the best way to check which is best for you. Try both representations for your use case on the same architecture.

    • @Saitomar
      @Saitomar 3 years ago

      @@ValerioVelardoTheSoundofAI how does it depend on the given task? I am assuming the time-domain representations performance in DL to be task agnostic

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  3 years ago

      @@Saitomar Unfortunately, it's not agnostic and it depends on the task.

    • @Saitomar
      @Saitomar 3 years ago

      Thanks for the reply, I am working on a model which was used in image classification and trying to use it for audio classification, which is why I was curious. Hopefully the results will be good.

  • @superhorstful
    @superhorstful 3 years ago +4

    Isn't there an error in the formula for the mel frequency? I mean it should be f/700 and not f/500?

    • @tiagobeltraolacerda5034
      @tiagobeltraolacerda5034 3 years ago

      I noticed that too. Using 700 we got that 1000 Mel = 1kHz, but using 500, 1238.1 Mel = 1kHz. I didn't understand why.
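
      The two numbers quoted above can be checked by hand. The arithmetic below assumes the base-10 logarithm and the constant 2595 mentioned elsewhere in this thread:

```latex
\begin{align*}
\text{with } 700:\quad m(1000\,\text{Hz}) &= 2595 \log_{10}\!\left(1 + \tfrac{1000}{700}\right)
  \approx 2595 \times 0.38535 \approx 1000.0 \text{ mel} \\
\text{with } 500:\quad m(1000\,\text{Hz}) &= 2595 \log_{10}\!\left(1 + \tfrac{1000}{500}\right)
  = 2595 \log_{10} 3 \approx 2595 \times 0.47712 \approx 1238.1 \text{ mel}
\end{align*}
```

      The constant 2595 is the one that makes 1000 Hz land on 1000 mel when the divisor is 700, so replacing 700 with 500 breaks that calibration point. That explains the 1238.1.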

  • @kirdiekirdie
    @kirdiekirdie 8 months ago

    Fantastic explanation! Needed this as a prerequisite to understand the OpenAI Whisper paper.

  • @rprantoine
    @rprantoine 3 years ago +2

    Hi Valerio,
    Thank you for your content, first of all!
    One thing I struggle to understand though is the need to have bands for Mel, and then the use of filters.
    Intuitively, to convert frequencies to Mels, I would have just applied the given Mel=f(frequency) formula to my discrete frequency vector and used the resultant discrete Mel vector as my y-axis.
    How is that not correct?
    Why do we need bands?
    Thanks in advance
    Antoine

    • @antonselitskii8351
      @antonselitskii8351 3 years ago

      Don't forget, we work with discretized data. Notice that the number of Mels (0, m_1, ..., m_63, 64 in total) is smaller than the number of frequencies (0, f_1, ..., f_512, in total 513 = 1024/2+1). The intervals [0, f_1), [f_1, f_2), ..., [f_511, f_512), ..., [f_1023, f_1024) are called linear frequency bins; each interval is associated with its left boundary. Because of the symmetry of the STFT, we use only half of them: 0, f_1, ..., f_512. We want to have Mel frequency bins [0, m_1), [m_1, m_2), ..., [m_63, m_64). Obviously, some linear frequency bins will collapse into one Mel bin, which is why we need a convolution with filters.
      In TorchAudio, this is done by a 64x513 matrix. Use ms = torchaudio.transforms.MelScale(n_mels=64, sample_rate=sr, n_stft=1024//2+1); the matrix is saved in the ms.fb variable (stored transposed, with shape 513x64).

    • @Waffano
      @Waffano 2 years ago

      @@antonselitskii8351 Great answer. Made me wonder: why do we not have # mel frequency bins = # frequency bins? Then we could just apply the mel function to all the frequency bins like @Antoine suggests, right?

    • @antonselitskii8351
      @antonselitskii8351 2 years ago +1

      @@Waffano You can think about this as a dimension reduction: you have vector f (say 1024) and m (say 80 mels) and transformation matrix T of size 80x1024. Then m = Tf. Yes, it will transform all linear frequencies. It's clear that we can do the inverse transformation, but it will not be precise, because we'll go from vector of size 80 to a vector of size 1024.
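
      The dimension-reduction view (m = Tf, with an imprecise inverse) can be illustrated with a deliberately tiny example. Everything here is a toy assumption: the 3x8 matrix T is hand-made rather than a real mel filter bank, and the pseudo-inverse is just one possible way to map back.

```python
import numpy as np

# Hypothetical toy "filter bank" T: 3 mel bands over 8 linear-frequency bins.
T = np.array([
    [0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 0.5],
])

f = np.array([0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 3.0, 0.0])   # linear-frequency magnitudes
m = T @ f                                                # reduced to 3 mel values

# Going back: the pseudo-inverse gives *a* vector consistent with m, not f itself.
f_approx = np.linalg.pinv(T) @ m

print("mel values:      ", m)
print("reconstruction:  ", np.round(f_approx, 2))
print("maps to same m:  ", np.allclose(T @ f_approx, m))
print("equals original: ", np.allclose(f_approx, f))
```

      The reconstruction reproduces the same mel values but not the original vector, because many different 8-bin vectors collapse onto the same 3 mel values.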

  • @nezardasan5015
    @nezardasan5015 4 years ago

    DANKE Valerio, always shining

  • @yuu_808
    @yuu_808 8 months ago

    What a good explanation. It helped me to understand mel specs. Thank you so much.

  • @ash3844
    @ash3844 3 years ago

    Amazing!!! Loving all the series of your videos. Thanks a ton!!!

  • @antonnaumov4889
    @antonnaumov4889 3 years ago +1

    Hi, Valerio!
    Thanks a lot for your videos! Can you please explain why we are still using Hz units on the mel spectrogram (at 26:47)?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  3 years ago

      That's just a convention to indicate how the different Mel bands are mapped to in terms of frequency.

  • @muntazirmehdi503
    @muntazirmehdi503 3 years ago

    You mentioned that for the piano we can use 40 mel banks as the notes are similar, but if we are working on speech data with the voices of different people, how can we determine the mel banks?
    TIA

  • @sailfromsurigao
    @sailfromsurigao a year ago

    I greatly appreciate the content you've been sharing on audio processing for machine learning; it's incredibly insightful. I am particularly interested in the intersection of audio and image data. Would it be possible to discuss methods for transforming an image into a Mel spectrogram or a standard spectrogram?

  • @lenam317
    @lenam317 7 months ago

    Thanks for the great video. I am also trying to implement a kind of ASR for my project, but I am unable to find any C/C++ libraries that support MFCC features from a live audio source. It'd be great if you could give me some pointers here.

  • @Moonwalkerrabhi
    @Moonwalkerrabhi 3 years ago

    at 18:55 , i think the x axis Freq is in KHZ not HZ, coz 1000 Khz = 1000 mel, m not sure though, but i think it is

  • @Underscore_1234
    @Underscore_1234 7 months ago

    Hi, nice stuff (didn't know any about mels), but I wonder, I guess you apply triangular filters in the mel-domain, if so, the filter is not triangular in the (linear) frequency domain right? I believe the shape shouldn't be a triangle anymore in the linear frequency domain (in other words you apply the mel transformation before applying a filter right?)

  • @markusbuchholz3518
    @markusbuchholz3518 4 years ago +3

    Perfect! Thanks Valerio for this interesting video. I wouldn't be myself if I didn't ask ... There is a "long pipeline" in signal processing for deep learning. We "lose" info while sampling, quantising, performing the STFT, and now using triangular filters. Afterwards we perform convolutions and again some important info is lost. Do you think that this process is "smart" enough and energy efficient? I assume that the question is directly related to how we want to apply deep learning, I mean what we want to do with the signals: classification, generation, filtering, prediction and so on.
    Great channel and community!

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  4 years ago +1

      You're spot on! The pre-processing audio pipeline can be quite convoluted. That's why some researchers are experimenting with raw audio signals. The problem with this is that audio is highly dimensional. The preprocessing steps we usually take with spectrograms trade "perfect" information with lowered data dimensionality.

    • @markusbuchholz3518
      @markusbuchholz3518 4 years ago +1

      @@ValerioVelardoTheSoundofAI Thanks for feedback and clarification. Anyway, in order to improve something it is great to be familiar with principles. Thanks!

  • @kenand330
    @kenand330 3 years ago +3

    Sir, there is something I don't understand here. We do not perceive the pitch difference between the first two notes you play. We can perceive the pitch difference between the second pair of notes. But shouldn't it be the other way around? Am I the only one hearing this?

  • @andreeamadalina8509
    @andreeamadalina8509 3 years ago +3

    Is it just me, or do I hear only one sound for the first pair, while for the second one I hear two sounds? Shouldn't it have been the other way around? Lol

  • @canernm
    @canernm 3 years ago +1

    Hi Valerio, thanks for the videos. I have one question: in the previous video of the playlist, we took a vanilla spectrogram and transformed it to be both a log-amplitude and log-frequency spectrogram. Is the difference between the Mel spectrogram and the transformed one simply that in the latter we use a simple log2 scale?

  • @tetlleyplus
    @tetlleyplus a year ago

    Is filtering using the mel banks just (algebraically) multiplied because convolution in the time domain is equivalent to multiplication in the frequency domain?

  • @mahathibodela
    @mahathibodela a year ago

    As usual, it's a really informative, easy to understand video. But I have a doubt. The spectrogram you showed in the last video had log ranges for frequency, and this mel spectrogram has the same. Why can't we just do it the way you described in the last video??

  • @armanz.9182
    @armanz.9182 a year ago

    How well would rhythm be represented in mel spectrograms? I can imagine 'pure' rhythm information being stored in the low frequencies, but these are compromised in these spectrograms, right? I had the idea that maybe rhythm information can be found between 0.55 Hz (33 bpm, lowest perceivable tempo) and 20 Hz (lowest perceivable tone). I have no idea though as to how valid this is.
    I would love to hear if anyone knows a valid way to analyze just rhythm, thanks!

  • @Mattews1119
    @Mattews1119 4 years ago +2

    Thank you Valerio for the amazing content! I'm really grateful for the time and work you're spending on these videos. The way you teach is very clear and simple, I like that a lot :D
    Also, if you don't mind, I have a question. I was wondering if extracting frequency features (Spectral Centroid, Rolloff, ...) from a mel spectrogram, instead of a regular spectrogram, would be more beneficial for a MIR application?

  • @Beatitat
    @Beatitat a year ago

    Could we go about using these features with identifying Keys and the chords? Watching your videos so I can learn a way to make a simple program that does chord progression detections of songs. Thanks for the videos!

  • @erkangjing2124
    @erkangjing2124 4 years ago

    Thank you for sharing. It's really useful for my learning on audio signal processing. Other things, such as mel bands, mel filter bands, frequency resolution, and the frequency range that can be perceived by human beings, are sometimes hard to distinguish and determine. I hope that I can find the answer in the discussion board or in other videos of yours. Finally, thanks again for sharing.

  • @shahnaz1981fat
    @shahnaz1981fat 3 years ago

    Hi Valerio. Nice explanation of Mel spectrograms. But I could not understand the triangular filter banks.
    They give a visualization of the transformation from Hz to mels. But as the triangles are overlapping, is it a one-to-many transformation? I am preparing for a PhD interview, and unless this is clear to me I cannot be confident. Please clarify…

  • @김성빈-g1w
    @김성빈-g1w 3 years ago +1

    Thank you for the excellent explanation. One quick question: is the mel spectrogram always good for deep learning? What I mean is, regardless of the sound class (speech, ambient sound ...), is the mel spectrogram always better than the plain spectrogram?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  3 years ago +1

      That will depend on the particular problem. For that reason, it's always advisable to try out different audio representations.

  • @DavidKalinex
    @DavidKalinex 4 years ago

    Very useful video! No doubt I will be revisiting for the rest of the year to finish my thesis

  • @Sam-jk5dw
    @Sam-jk5dw 3 years ago

    I wish there was a frequency conversion example for the Mel filter bank. Like just one example where you take a frequency (which doesn't have a weight of 0 or 1) and convert it to Mels. I felt like I didn't quite know what you were trying to say.

  • @disturbedeyebrow5977
    @disturbedeyebrow5977 4 years ago +1

    Thanks dude, you didn't mention the optimal number of MFCCs to use for image processing. In one of your previous videos you said that 13 MFCCs is the best choice for audio processing, why 13 ? and how to determine the optimal number ?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  4 years ago +2

      I'll post a couple of videos (theory + implementation) about MFCCs in the coming weeks. (Stay tuned for those!)
      The short answer to your question is that 13 is a number traditionally used in earlier AI music research. Sometimes this number goes up to 48 or even 90.
      As I mentioned for the number of Mel bands in this video, these numbers are somewhat arbitrary and must be treated as hyperparameters, which should be optimised.

    • @disturbedeyebrow5977
      @disturbedeyebrow5977 4 years ago +1

      Thank you for answering so fast ! I'll be patient for incoming vids !

    • @iioiggtrt9085
      @iioiggtrt9085 4 years ago

      How do I save it as a CSV file for ML?

  • @melverys
    @melverys a year ago

    This is how I found your video: I recently got into learning the Japanese language and I thought it would be cool to see the spelling of my name in Japanese. Seems like Mel translates to Meru, and the definition of my name in Japanese is a logarithmic transformation of a signal's frequency. Kind of an interesting rabbit hole to go down since I'm a math geek and a musician too lol

  • @Deathlydave
    @Deathlydave 3 years ago +1

    Great video and great series. I really learned a lot from watching these videos. One thing that I am a little unclear about is why is the shape of the mel filter band (# bands , frame size / 2 + 1)? Are the values of the mel filter band simply the weights for the triangle filters? If so, since the triangle filters cover an increasing range of frequencies in Hz, how do we maintain the fixed frame size / 2 + 1 size?

    • @ehtashamulhaque5002
      @ehtashamulhaque5002 2 years ago

      Edit: Okay, I also had this confusion, but remember we are doing an STFT? The frame_size is what dictates how many bins we produce in the spectrogram. It is easy to get confused when there is so much stuff to look out for.
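
      The shape question in this thread can be checked numerically. In this small sketch the frame size and the number of mel bands (n_mels = 90) are arbitrary example values, not settings taken from the video:

```python
import numpy as np

frame_size = 2048
n_mels = 90                      # arbitrary example value

# One frame of audio (random noise is enough to inspect shapes).
frame = np.random.default_rng(0).standard_normal(frame_size)

# A real-input FFT keeps only the non-redundant half of the spectrum.
spectrum = np.fft.rfft(frame)
n_bins = spectrum.shape[0]
print(n_bins, frame_size // 2 + 1)

# Every mel filter is stored as a full-length row of weights over all bins,
# so the bank has this shape regardless of how wide each triangle is.
bank_shape = (n_mels, n_bins)
print(bank_shape)
```

      A wider triangle just means more non-zero weights in that filter's row; the row length stays frame_size / 2 + 1 because it always spans every STFT bin, with zeros outside the triangle.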

  • @andreiplatonov7689
    @andreiplatonov7689 3 years ago +2

    Thank you for your videos!
    However, if you place f=1000 in the formula of 'frequency to mel' conversion, you do not get 1000 mel..

    • @pjmmccann
      @pjmmccann 2 years ago

      * It should be 700, not 500 in the formula (see the inverse function, for example)

    • @zzhou4621
      @zzhou4621 2 years ago

      Formulation: mel = 1/log(2) * (log(1 + (Hz/1000))) * 1000 [Reference: Traunmueller, H. (1990) "Analytical expressions for the tonotopic sensory scale" J. Acoust. Soc. Am. 88: 97-100]

  • @matthewsmalatji5994
    @matthewsmalatji5994 4 years ago +1

    Hey man. I love the series. I need some help. I want to obtain AUDIO FRAMES and generate SPECTROGRAMS for each frame, SO I CAN FEED a CNN the spectrograms to do Music Transcription. Please help. I am able to generate spectrograms using VQT; the issue comes with generating frames and spectrograms for each frame.

  • @aayushchheda8689
    @aayushchheda8689 a year ago

    Don't really understand the psychoacoustic experiment ? Can you explain it here ?, I do not perceive the pitch difference between the first two notes you play. I can perceive the pitch difference between the second pair of notes. So shouldn't it be the other way around or am i getting something wrong..

  • @maddai1764
    @maddai1764 4 years ago +2

    Me again. Why not just use the frequency-to-mel equation to convert hertz to mel, just as you did in the previous videos to convert hertz to log (log frequency)? Why go through all this hassle? I know there should be a reason, but I don't grasp it.

    • @zzhou4621
      @zzhou4621 2 years ago

      me toooo!

  • @sarathanurahiyarehewage4642
    @sarathanurahiyarehewage4642 2 years ago

    I have a question. If m = 2595·log(1+f/500), then f should be equal to 500(10^(m/2595) - 1). Where does the 700 in f = 700(10^(m/2595) - 1) come from? Is it a mistake? In your video, 700 appears in two places. Or am I missing something?

    • @Waffano
      @Waffano 2 years ago

      Valerio wrote in a comment above that the first formula had a typo. It should be 700 instead of 500.
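
      With the corrected constant, the inverse formula follows directly from the forward one. Here is a short derivation, assuming the base-10 logarithm:

```latex
\begin{align*}
m &= 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \\
\frac{m}{2595} &= \log_{10}\!\left(1 + \frac{f}{700}\right) \\
10^{\,m/2595} &= 1 + \frac{f}{700} \\
f &= 700\left(10^{\,m/2595} - 1\right)
\end{align*}
```

      The same constant has to appear in both directions, which is why a forward formula written with 500 cannot be the inverse of the mel-to-Hz formula with 700.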

  • @mukundsrinivas8426
    @mukundsrinivas8426 3 years ago

    Amazing series of videos. Did u cover how to deal with audio of varying lengths in any video?

  • @alfredoalarconyanez4896
    @alfredoalarconyanez4896 3 years ago

    Thank you Valerio for this super nice video

  • @zzhou4621
    @zzhou4621 2 years ago

    Oh, why do we need to use the triangular filters? It seems we could also get the Mel spectrogram by using the formula directly. Does anybody know?

  • @ebrukeklek3237
    @ebrukeklek3237 3 years ago

    Incredibly good work Mr.
    Sometimes it was hard to understand you because of you talking really fast and with a dialect 🤣🙈 but your devotion is fantastic ❤️

  • @jamalseyedmohammadi6681
    @jamalseyedmohammadi6681 3 years ago

    Hi. Great video. I have one question. What is the difference between log frequency spectrogram and mel spectrogram? Thanks

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  3 years ago +1

      I suggest you to check out the previous videos on STFT, where I introduce the concept of (Log) Spectrogram. In a nutshell, the Mel Spectrogram is a normal spectrogram where we apply Mel filterbanks.

    • @pranavsingh1081
      @pranavsingh1081 3 years ago

      @@ValerioVelardoTheSoundofAI It is not clear. Please explain the difference between a log spectrogram and a mel spectrogram.

  • @IamAayam-rz8md
    @IamAayam-rz8md 6 months ago

    In the formula for mel, there should be f/700 right?

  • @mangomonkey7830
    @mangomonkey7830 4 years ago

    Hi, What if my audio files are an hour long. When I use librosa to load them, I only obtain the first 3 mins. What's the standard practice to generate mel spectrograms for hour-long audio recordings?

  • @burak4799
    @burak4799 a year ago

    You are a life saver! Thank you very much for the detailed lecture :)

  • @qin7280
    @qin7280 4 years ago

    Hi Valerio, thanks so much for your effort making these videos! I keep learning by watching all your videos.
    May I ask a simple question about Mel spectrograms? Are they also useful if I want to detect the sound of a heartbeat?
    Actually that's what I have been doing recently, but I am a total beginner.
    I would really appreciate it if you could share your ideas or any other good materials on heartbeat detection!!

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  4 years ago

      Yes, Mel spectrograms (usually!) work well with most audio classification problems.

  • @jaydeepchauhan2737
    @jaydeepchauhan2737 3 years ago

    What is difference between filter bank feature and Mel-spectrogram feature? Are both same?

  • @arvindramanathan329
    @arvindramanathan329 4 years ago

    clear and intuitive explanation, thanks!

  • @damdidum2601
    @damdidum2601 3 years ago

    Excellent video, you are really good at explaining this stuff!

  • @luandesouzasilva565
    @luandesouzasilva565 3 years ago

    Thank you so much for these videos!

  • @minired4611
    @minired4611 3 years ago

    Thanks for your clear explanation. It helped me a lot.

  • @preethamgali3023
    @preethamgali3023 3 years ago

    Great explanation. 🔥🔥

  • @kirdiekirdie
    @kirdiekirdie 8 months ago

    Tried to listen to the C2 note several times until I figured out that my Lenovo laptop speakers apparently don't go that low, but my cheap headphones do :-)

  • @pranavsingh1081
    @pranavsingh1081 3 years ago

    could u please tell us the difference between log spectrogram and mel spectrogram ?

    • @chrischang1980
      @chrischang1980 3 years ago +2

      I think the difference is that the mel spectrogram applies the mel filters, so the result for a specific mel frequency is a weighted sum of the original frequencies. The log spectrogram only changes the scale from linear to log.

  • @henoknigatu7121
    @henoknigatu7121 a year ago

    can you show us how to convert melspectrogram to audio using python like vocoder

  • @uthsingi
    @uthsingi 10 months ago

    I'd like to politely confirm: at 2:20, it seems like the note played as C2 might actually be C1. I'm not very familiar with musical notes, but the C2 played in your video sounds lower.

  • @manjulakumari953
    @manjulakumari953 3 years ago

    great video. Must watch

  • @deepikasingh3122
    @deepikasingh3122 a year ago

    but what are filter banks?

  • @lwolstanholme
    @lwolstanholme 3 years ago +1

    your formula for working out frequency to mel (m = ...) is wrong. your formula for mel to hz however is correct (f = ...)

  • @ian-atg
    @ian-atg 17 days ago

    the mel freq formula is incorrect. the graph on 7:50 is kinda deceiving as well. a 20,000 hz freq would be around 9000 mels. in your example its around 1500. i just wonder how many other mistakes are in your series.

  • @ratfuk9340
    @ratfuk9340 2 years ago

    Why is f=700(10^(m/2595) -1)? Shouldnt it be f=500(10^(m/2595) -1) if m=2595*log(1+f/500)

  • @husamahmed9251
    @husamahmed9251 a month ago

    why did you changed 500 to 700?

  • @shreyaskulkarni5823
    @shreyaskulkarni5823 2 years ago

    It should have been 2052 actually to get 263 difference constant.When you showed the graph of mel and freq.

  • @pranavsingh1081
    @pranavsingh1081 3 years ago

    what is this vanilla spectrogram?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  3 years ago +1

      It's just the "basic" spectrogram without any manipulation (e.g., applying log, transforming amplitude to dBs).

    • @pranavsingh1081
      @pranavsingh1081 3 years ago

      @@ValerioVelardoTheSoundofAI thank u so much

  • @ashinkajay
    @ashinkajay 2 years ago

    Thank you so much !

  • @razvandumitrugrecea9388
    @razvandumitrugrecea9388 3 years ago

    nice one :)
    somebody who shares :)

  • @김하준-j4h
    @김하준-j4h 3 years ago

    Thanks, you are such a genius and everyone can understand the concept of the Mel spectrogram by watching your video. However, it actually takes too long to understand a single concept because you seem to repeat certain words or sentences several times and offer too much extra information from time to time. If you can deal with that, I am sure that you will get way more subscribers. Anyway, thank you so much.

  • @seohopa
    @seohopa a year ago +1

    th-cam.com/video/3HzgUx9jdy8/w-d-xo.html
    This is about making a spectrogram with the ChatGPT interpreter.

  • @harshitjuneja9462
    @harshitjuneja9462 a year ago

    If we use a CNN model (let's say), shouldn't they automatically learn any such mathematical transformations?

  • @Paplu-i5t
    @Paplu-i5t 6 months ago

    There were supposed to be a pair of notes C2, C4, there was only one. Bad editing ?

  • @laithswais7172
    @laithswais7172 5 months ago

    ❤❤❤

  • @barbara-su
    @barbara-su 7 months ago

    Very good video, love from China

  • @berankilic
    @berankilic 3 years ago

    You are like watching chess videos. And I like chess xd

  • @oguzynx
    @oguzynx 2 years ago

    what da f is mel bands..... dude do not comfuse us..

  • @mdevelde
    @mdevelde 4 years ago

    Wrong explanation with many errors. You clearly have no real understanding of what you're talking about.
    First of all. Everybody knows since ancient times we perceive frequency mostly logarithmic. For instance octaves / musical intervals / musical instrument tuning etc are based on this.
    So the question is not how the Mel scale (a recent invention) differs from linear frequency but how it differs from logarithmic frequency. So your whole video is nonsense and fails to explain the actual difference between the Mel scale and the logarithmic scale.
    And many other errors in explaining things and choice of filterbank type etc etc.

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  4 years ago +4

      I'll wait for your explanation to learn more.

    • @mdevelde
      @mdevelde 4 years ago +1

      @@ValerioVelardoTheSoundofAI Too large a list to respond to here. But a simple look at the Mel scale wikipedia page should inform you.
      As for musical intervals they are based on a division of octaves. Octaves are 2/1 ratio, so 100Hz - 200Hz - 400Hz - 800Hz - etc. A logarithmic scale. Again, as I already said, one should compare the Mel scale to a logarithmic scale not to a linear scale.
      And further, number of filterbands are not just randomly chosen they have good reason. It has to do with ringing of the filters or in other words you cannot zoom in on a narrow frequency band without introducing errors in other ways namely amplitude and time. It always works like this it is the law of nature there's no getting around it. And the choice of triangular filters is a particularly poor and naïve one but understandable as many examples have been written using them.
      One more thing about the Mel scale. It's likely not a great model for equidistance hearing. Errors were made in the studies when inventing it over 50 years ago. But again, understandable to use it.
      And apologies for the unfriendly tone of my previous message. I just read it back and could have written it in another way. I was a bit tired and grumpy.

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI  4 years ago +11

      @@mdevelde I'll avoid commenting on your smug attitude. It speaks volumes by itself.
      I don't see how the "arguments" you raise clash with the content of the video. What superior power ordered that we should "compare the Mel scale to a logarithmic scale not to a linear scale"? Also, what does this mean? The Mel scale IS a logarithmic scale. Or, do you think that applying a few scaling factors to a logarithm (as in the case of the Mel scale) modifies the nature of the logarithm? If you're referring to the difference between the Mel scale and a log2 function, of course I could have shown that. However, people are usually familiar with linear scales, and they probably have an easier time appreciating the difference between a linear scale and the Mel scale, than they have between the latter and a log2 function. BTW, thank you very much for letting me know about the 2/1 octave ratio. In my 25+ years of study in music and my PhD I never encountered this information. Have you thought of publishing this revolutionary result? Oh wait... I mentioned this revolutionary property in a previous video in the series.
      Your comment regarding the number of filter bands makes little sense in the context of this video. I'm not sure what's your background, but in AI audio we use a wide array of filter bands (from as little as 40, to as much as 128+), depending on what works best for the problem at hand.
      I've read papers that suggest that errors were made while working on the experiments for the Mel scale. I'm also aware that triangular filters are not ideal. Nonetheless, Mel spectrograms are used in Machine Learning these days and achieve state-of-the art results in several audio classification problems. This is why I introduced this feature in this series (Audio Signal Processing for ML). I'm not sure if this is clear, but this video approaches the Mel scale from the perspective of machine learning and audio processing, not music cognition.
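
      The comparison debated in this thread (Mel scale versus a pure log2 scale) can be checked numerically. In this rough sketch the starting frequency of 100 Hz and the function name hz_to_mel are illustrative choices, and the formula is the corrected one with the constant 700:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Successive octaves starting at 100 Hz: 100, 200, 400, ..., 12800.
octaves = [100.0 * 2 ** k for k in range(8)]

# On a log2 axis every octave is exactly one unit wide.
# On the mel axis the width of an octave depends on where it sits.
steps = [hz_to_mel(hi) - hz_to_mel(lo) for lo, hi in zip(octaves, octaves[1:])]

for lo, hi, step in zip(octaves, octaves[1:], steps):
    print(f"{lo:7.0f} -> {hi:7.0f} Hz: 1 octave on log2, {step:6.1f} mel")

# Upper bound that the octave width approaches at very high frequencies.
print("limit:", round(2595.0 * math.log10(2.0), 1), "mel per octave")
```

      Low octaves span far fewer mels than high ones, so the Mel scale behaves close to linearly in Hz at low frequencies and only approaches a true logarithmic scale well above 700 Hz.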
