Comments •

  • @ice_creamu
    @ice_creamu 3 years ago +34

    I wanted to know what mel spectrograms are but then I watched the first video of the series and now I'm learning a far better-stepped approach to Audio Signal Processing and I'm loving it!

  • @bugveyronFTW
    @bugveyronFTW 4 years ago +11

    Fantastic video. Much more helpful than any of the other STFT videos on youtube. Thanks a lot!

  • @LFSDR
    @LFSDR 2 years ago +6

    Just thought I would let you know that I am about to finish my thesis on dolphin vocalization feature extraction and distinction using ML classifiers, and I just found your videos. They make it very easy to understand and visualize concepts that took me more time than I wished to understand. Keep up the good work.

  • @pritamroy770
    @pritamroy770 3 years ago

    I can't believe you have only 9k views with this level and clarity of teaching!

  • @MrOpossumx3
    @MrOpossumx3 3 years ago +1

    The concept of the time/frequency trade-off in the STFT is introduced really well!

  • @islandsounds9357
    @islandsounds9357 9 months ago

    Thank you so much for this. It's not only the content and the educational approach; it's also your style that keeps the interest high.

  • @janavanrooyen3798
    @janavanrooyen3798 2 months ago

    Fantastic video! It's engaging all the way through and a wonderfully clear explanation of everything.

  • @ramkumarganapathy7795
    @ramkumarganapathy7795 9 months ago

    Thank you so much for sharing all this knowledge. This is one of the best video series I have watched on YouTube!

  • @WahranRai
    @WahranRai 4 years ago +4

    To avoid confusion, it would be better to choose another variable (for example, L for the STFT) instead of N (already used for the FFT).

  • @SB-rp8sn
    @SB-rp8sn 9 months ago

    Great job on simplifying such a complex topic. Thanks!

  • @ahnafsamin7464
    @ahnafsamin7464 1 year ago

    This tutorial is really helpful! Please keep making content for us.

  • @rangiding99
    @rangiding99 3 years ago +1

    I have said this a couple of times already, but thank you again for this passionate series!

  • @hernanvaltierra6912
    @hernanvaltierra6912 3 years ago

    You were born to explain, thank you

  • @shanmukhasaratponugupati6308
    @shanmukhasaratponugupati6308 3 years ago

    Give this guy a Nobel Prize

  • @thoaiphan5725
    @thoaiphan5725 2 years ago

    Very comprehensive. Your body language is also awesome! Thanks so much, Prof.

  • @dionsetiawan8798
    @dionsetiawan8798 1 year ago

    Cool! This video really helped me for my signal processing subject

  • @radhikasece2374
    @radhikasece2374 3 years ago

    Thanks a lot for the wonderful explanation. I first started out learning about MFCCs; now I am interested in watching all the videos in the audio signal processing series.

  • @canyoupleaserunfast
    @canyoupleaserunfast 6 months ago

    I wish there was a Sound of AI Discord community! Thanks a lot for these videos!

  • @shaidhasan6895
    @shaidhasan6895 4 years ago +9

    Could you please make a video on MFCC?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI 4 years ago +4

      That is planned over the next few weeks!

    • @shaidhasan6895
      @shaidhasan6895 4 years ago

      @ValerioVelardoTheSoundofAI Waiting for it. And thank you for the whole series!

  • @MichaelSievers
    @MichaelSievers 3 years ago

    Wow, what a great explanation, this has really answered a lot of questions I had about how FFT would work on longer samples. Thank you very much, it was a pleasure to watch the video.

  • @malikamalika7960
    @malikamalika7960 3 years ago +1

    Amazing! Love your work and engagement, thank you!

  • @JIAmitdemwesen
    @JIAmitdemwesen 1 year ago

    Very informative and well-presented. Thank you!

  • @cloudhuang700
    @cloudhuang700 3 years ago +2

    Thanks for the fantastic video. One quick question: what are the advantages and disadvantages of setting frame_size > window_size? What is the use case for this parameter choice?

  • @mohammadrahimpoor513
    @mohammadrahimpoor513 1 year ago

    Thank you for your great video. It really helped me understand the STFT. I subscribed to your channel and will be eagerly waiting for your videos.

  • @joshmiller3712
    @joshmiller3712 3 years ago +2

    Hey man! Your Fourier stuff is great! I've been playing with audio a bit and found that TensorFlow has an awesome method for getting real-valued spectrograms using a transform called MDCT (modified discrete cosine transform). Have you ever considered making a video about that? I'm curious to know how that's different from the STFT.

  • @Afflictionability
    @Afflictionability 1 year ago

    lovin your videos man keep up the good work :)

  • @chinedueleh3045
    @chinedueleh3045 3 years ago +1

    Wow! I love this series!

  • @Birdsneverfly
    @Birdsneverfly 12 days ago

    3 years late, but I love the video

  • @juleswombat5309
    @juleswombat5309 2 years ago

    Yes, this is the dream scenario emerging! So I am a bit slow, but I think I could feed spectrograms directly as the input features into convolutional networks or LSTM networks.

  • @egegoksu9557
    @egegoksu9557 5 months ago

    Thank you for this beautiful video

  • @proteus5
    @proteus5 7 months ago

    When you multiply your signal with the window function in the time domain, you are convolving the frequency response of the window function with your signal in the frequency domain. The frequency response of most window functions is some form of a sinc function. Sinc functions are long and ringy, so the result of the convolution is to smear out the frequency response of your output. This reduces the accuracy of the output of the STFT. There are other spectral decomposition algorithms that produce more accurate results. The STFT is popular because of its ease of computation, not because of its accuracy.
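The smearing described above can be seen in a small experiment. This is a minimal sketch, not taken from the video: it uses a naive DFT from the standard library, a 64-sample frame, and a sine placed halfway between two bins (all illustrative assumptions). It compares how much energy ends up far from the true frequency with a rectangular window (no tapering) and with a Hann window, which trades a wider main lobe for much lower side lobes.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def hann(n):
    """Symmetric Hann window with n points."""
    return [0.5 * (1 - math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]


def far_leakage(mags, peak=10.5, guard=4):
    """Fraction of spectral energy in bins more than `guard` bins from `peak`."""
    total = sum(m * m for m in mags)
    far = sum(m * m for k, m in enumerate(mags) if abs(k - peak) > guard)
    return far / total


N = 64  # frame size (illustrative)
# A sine with 10.5 cycles per frame: its frequency falls between bins 10 and 11.
signal = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]

rect_mag = dft_magnitudes(signal)  # rectangular window = plain truncation
hann_mag = dft_magnitudes([s * w for s, w in zip(signal, hann(N))])

print("far leakage, rectangular:", far_leakage(rect_mag))
print("far leakage, Hann:       ", far_leakage(hann_mag))
```

With the Hann window, far less energy lands in distant bins, at the cost of a broader peak around the true frequency.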

  • @mutalasuragemohammed6954
    @mutalasuragemohammed6954 1 year ago

    I love the bit; "the k-th frequency at the end temporal uh! bin or n-th frame." 12:18

  • @lumpi806
    @lumpi806 8 months ago

    Thank you for your great work!
    At 23:00 there is something I don't understand: you divide the frame size by two because of the Nyquist rule. You obtain 501. But the 501 frequency bins are for the interval (0, sampling rate/2).
    So, in the end, you divide by 4! The sampling rate is divided by 2, THEN the frame size is also divided by 2.
    Could you explain this? Thank you.
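One way to look at the question above: a frame of N real samples gives N DFT coefficients spaced sampling_rate / N apart, and for real input only the first N/2 + 1 of them are unique. Those bins cover 0 to sampling_rate / 2 at the same spacing, so nothing is halved twice. A minimal sketch of that arithmetic follows; the function name and the 22050 Hz sampling rate are assumptions chosen for illustration, only the frame size of 1000 comes from the comment.

```python
def bin_frequencies(frame_size, sample_rate):
    """Centre frequency (Hz) of each unique STFT bin for a real-valued frame."""
    num_bins = frame_size // 2 + 1           # bins from 0 Hz up to Nyquist, inclusive
    spacing = sample_rate / frame_size       # distance between neighbouring bins
    return [k * spacing for k in range(num_bins)]


freqs = bin_frequencies(frame_size=1000, sample_rate=22050)
print(len(freqs))            # number of frequency bins: 501
print(freqs[1] - freqs[0])   # bin spacing: 22.05 Hz
print(freqs[-1])             # last bin sits at Nyquist: 11025.0 Hz
```

The extra "+1" is the bin at 0 Hz: counting both endpoints, 0 and sampling_rate / 2, gives N/2 + 1 bins.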

  • @metehanyurt
    @metehanyurt 2 years ago

    Fantastic explanation!

  • @adityalesmana2134
    @adityalesmana2134 2 years ago

    Awesome explanation, thanks !!!

  • @burnspeed
    @burnspeed 2 years ago

    Hi @Valerio, this is great stuff. Do you have any recommendations for a book to soak all this in? It needs intense focus and going back and forth on the videos multiple times. Thanks.

  • @DOMINIK32110
    @DOMINIK32110 3 years ago

    You should write your own book, I'd definitely buy it

  • @stefanhopman9176
    @stefanhopman9176 2 years ago

    Thank you for the video!

  • @DíazRamírezManuel
    @DíazRamírezManuel 10 months ago

    Great work. Thank you so much

  • @MichelHabib
    @MichelHabib 2 years ago

    great Video, thank you again

  • @scienceshiritai5604
    @scienceshiritai5604 4 years ago

    Thank you for the video! If you set 'frame size' bigger than the 'window size', does that increase frequency resolution while keeping the time resolution the same (but at more computational cost)?

  • @zookaroo2132
    @zookaroo2132 3 years ago

    Very cool lecture !!

  • @klaimouad740
    @klaimouad740 4 years ago

    A wonderful video. We need to know more about speech processing, and especially the mirror process of inverting spectrograms and the STFT. Could you please suggest where I can find an explanation of the Griffin & Lim algorithm?

  • @Waffano
    @Waffano 1 year ago

    @20:00 How come we don't use a similar definition of frequency bins for the DFT (where the frame size is the size of the whole signal, of course)?

  • @edsonjunior9267
    @edsonjunior9267 2 years ago

    Great job!

  • @karennino6639
    @karennino6639 2 years ago

    Thank you for sharing!!!

  • @MrDari88
    @MrDari88 4 years ago

    Just brilliant once again

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI 4 years ago

      Thanks Dario!

    • @MrDari88
      @MrDari88 4 years ago

      @ValerioVelardoTheSoundofAI I hope you can help me with this doubt. Is the Hann window applied after the STFT to each frequency bin, or before the STFT to each sample within the frame? I got a bit confused, since in the STFT formula you used w(n) and in the Hann window formula w(k). Could you please clarify this? Thanks once again for your amazing videos.

  • @malahatmehraban4340
    @malahatmehraban4340 2 years ago

    The video was great, thank you. Do you have any instructional videos explaining zero padding?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI 2 years ago +1

      I've used zero-padding here and there in some videos, but never dedicated a full video to it only.

    • @malahatmehraban4340
      @malahatmehraban4340 2 years ago

      @ValerioVelardoTheSoundofAI Thank you for your quick response. Actually, I'm a master's student in audio signal processing and your videos helped me a lot with my master's project. Thanks an ocean :)

  • @MarineroAndroid
    @MarineroAndroid 2 years ago

    Isn't the "temporal" information contained in the DFT phase? Because the DFT is an invertible linear transform, there is no loss of information: by applying the inverse DFT, we get back our signal with its temporal distribution.

  • @huukhangnguyen3497
    @huukhangnguyen3497 3 years ago

    Wonderful video!!

  • @jessicachen9236
    @jessicachen9236 3 years ago

    Does the number of frequency bins in the STFT (501 bins in the example) apply to each frame in the signal?

  • @rekreator9481
    @rekreator9481 3 years ago +2

    Do you know why, when using the librosa stft function in Python, the resulting number of temporal bins does not equal math.ceil of (num_of_samples - frame_size) / hop_length + 1, but rather math.ceil of (num_of_samples) / hop_length? I am processing some audio files and calculating the STFT for 66150 samples of signal as input to the function, using a 2048-sample window and a hop size of 512 samples. So in theory, I should get (66150 - 2048) / 512 + 1 = 126.199... ~ 127 temporal bins. But instead, the temporal output shape of stft has 130 elements. How is it calculating the last few windows, for which the function should not actually have enough samples available, as they are out of the provided signal range?

    • @mohammadrezapourtorkan8595
      @mohammadrezapourtorkan8595 2 years ago

      I encountered the same thing

    • @rushrukhrayan1082
      @rushrukhrayan1082 2 years ago

      I encountered the same thing using Librosa.
      ```
      import librosa

      number_of_samples = 661500
      FRAME_SIZE = 2048
      HOP_LENGTH = 512
      # debussy: audio time series loaded earlier (e.g. with librosa.load)
      debussy_spec = librosa.stft(debussy, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)
      debussy_spec.shape
      ```
      This gives: (1025, 1292) ~ (#frequency bins, #frames)
      1. [According to the given formula] #frames = ((number_of_samples - frame_size) / hop_length) + 1 = ((661500 - 2048) / 512) + 1 = 1288.9921875, rounded up to 1289
      2. [In reality] #frames = number_of_samples / hop_length = 661500 / 512 = 1291.9921875, rounded up to 1292
      I feel like 2 is more intuitive too. We have X samples. In each iteration of the calculation we move 512 samples to the right. How many times do we get to do that? Samples/512. Does anyone know where I am going wrong?
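A likely explanation for the mismatch reported in this thread: by default `librosa.stft` is called with `center=True`, which pads the signal by `n_fft // 2` samples on each side before framing, so each frame is centred on a multiple of the hop length and the frame count becomes `1 + num_samples // hop_length`. The sketch below only reproduces that arithmetic in plain Python. The function names are hypothetical, and counting whole frames with floor division is an assumption (the comments above round up instead).

```python
def frames_without_padding(num_samples, frame_size, hop_length):
    """Whole frames that fit when the first frame starts at sample 0."""
    return (num_samples - frame_size) // hop_length + 1


def frames_with_center_padding(num_samples, frame_size, hop_length):
    """Frame count when frame_size // 2 samples of padding are added on each side."""
    padded = num_samples + 2 * (frame_size // 2)
    return (padded - frame_size) // hop_length + 1


for num_samples in (66150, 661500):  # the two signal lengths quoted in this thread
    print(
        num_samples,
        frames_without_padding(num_samples, 2048, 512),
        frames_with_center_padding(num_samples, 2048, 512),
    )
# prints:
# 66150 126 130
# 661500 1288 1292
```

The padded counts, 130 and 1292, match the shapes reported above. Passing `center=False` to `librosa.stft` should give frame counts in line with the unpadded formula.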

  • @petrosgw5928
    @petrosgw5928 3 years ago

    That was a great video, sir. I have just one question: what happens if we use a very small hop length, for instance 2 or 4?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI 3 years ago

      You would have a lot of redundancy in the data and a greater memory footprint.

  • @mazmaxman1
    @mazmaxman1 3 years ago

    Hello, could you please walk us through the inverse short-time Fourier transform for overlapping frames, in order to recover the original time-domain signal?

  • @zeuspolancosalgado4762
    @zeuspolancosalgado4762 2 years ago

    You are awesome! I love you!

  • @ahaditab6364
    @ahaditab6364 2 years ago

    you are a legend!!!!!!!!

  • @snippletrap
    @snippletrap 3 years ago

    Hop size and frame size are like stride and kernel size in CNNs.

  • @jennas5039
    @jennas5039 2 years ago

    Hi Valerio, may I point out that the calculation for #frames at 22:22 should return 39 and not 19?
    #frames = (10000 - 1000)/500 + 1 = 39

    • @ganmohim4273
      @ganmohim4273 2 years ago

      His calculation is correct: (10000 - 1000)/500 + 1 = 19 .🙂

    • @jennas5039
      @jennas5039 2 years ago

      @ganmohim4273 Ah, yes. Don't know what I was thinking. Apologies

  • @simenhex1
    @simenhex1 3 years ago

    Thanks for a great series of videos!
    However, I have a question regarding the resolution trade-off between time and frequency.
    The time part makes sense, but I do not understand why the frequency resolution depends on the frame size.
    Obviously something I am missing here, but in my head the frequency range we can represent does not rely on the number of samples, but the sampling rate, ref. the Nyquist sampling theorem.
    Let's say we have a signal with a sampling rate of 10 samples/sec and we choose a frame size corresponding to 1 second of signal, i.e. 10 samples. Then we can represent frequencies up to a maximum of 5 Hz. If we double the frame size to 2 seconds of signal, we now have 20 samples instead of 10. However, the sampling rate is still fixed at 10 samples/sec, and hence we can still only represent frequencies up to 5 Hz...?
    I would appreciate it if you (or anyone else) could explain this.

    • @JaskaranSingh-hp3zy
      @JaskaranSingh-hp3zy 2 years ago

      I have the same doubt!
      I think @22:00 he explained that the frequency bins give information about the frequencies present in the (0, Sr/2) range, equidistant from each other.
      The bigger the frame size, the more frequency bins there are -> more detailed information about frequency.
      When the frame size becomes the whole wave, it becomes the DFT and we have N/2 + 1 bins in the range (0, Sr/2).
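The numbers in the question above can be worked through directly. The highest representable frequency depends only on the sampling rate, while the spacing between frequency bins is sampling_rate / frame_size, so a longer frame packs more bins into the same 0 to 5 Hz range. A minimal sketch, with a hypothetical helper name:

```python
def describe_bins(sample_rate, frame_size):
    """Return (bin spacing in Hz, number of unique bins, highest bin frequency in Hz)."""
    spacing = sample_rate / frame_size
    num_bins = frame_size // 2 + 1
    max_freq = (num_bins - 1) * spacing
    return spacing, num_bins, max_freq


print(describe_bins(sample_rate=10, frame_size=10))  # (1.0, 6, 5.0)
print(describe_bins(sample_rate=10, frame_size=20))  # (0.5, 11, 5.0)
```

Doubling the frame from 1 s to 2 s leaves the maximum at 5 Hz but halves the bin spacing from 1 Hz to 0.5 Hz, which is the finer frequency resolution; the price is that each frame now covers twice as much time.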

  • @goku-np5bk
    @goku-np5bk 2 years ago

    beautiful!

  • @abdouazizdiop8279
    @abdouazizdiop8279 4 years ago +1

    Thanks Master

  • @smithflores6968
    @smithflores6968 1 year ago

    wonderful!!

  • @siddharthsharma2248
    @siddharthsharma2248 2 years ago

    You used the phrase 'pure term' at 17:03. What do you mean by that?

    • @Waffano
      @Waffano 2 years ago

      He said "pure tone"

  • @sarvagyagupta1744
    @sarvagyagupta1744 3 years ago

    Hey, this is an amazing video. Thanks. I have a question though. Through spectrogram, we know the magnitude and phase of the signal at a given time. So is it possible to reconstruct the signal from that domain?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI 3 years ago

      Yes, for reconstruction from a complex spectrogram, you would use the inverse short-time Fourier transform.

    • @sarvagyagupta1744
      @sarvagyagupta1744 3 years ago

      @ValerioVelardoTheSoundofAI But the spectrogram is the abs of the STFT, right? So will we need the STFT for the reconstruction, or will just the spectrogram plot be enough?

    • @ValerioVelardoTheSoundofAI
      @ValerioVelardoTheSoundofAI 3 years ago

      @sarvagyagupta1744 No, the abs of the STFT is the magnitude spectrogram, which discards the phase. The STFT itself is complex.

    • @sarvagyagupta1744
      @sarvagyagupta1744 3 years ago

      @ValerioVelardoTheSoundofAI But if we take the spectrogram of the STFT, then certain frequencies won't show up in the spectrogram plot until our window reaches them, right? So we kind of get some information about the phase of the signal.

  • @jeremyuzan1169
    @jeremyuzan1169 3 years ago

    thks valerio

  • @artyomgevorgyan7167
    @artyomgevorgyan7167 4 years ago

    Generally, what is the reason for introducing 2 parameters such that one of them reduces the need for the other? I am talking about window size and frame size, in case they are not equal. We could achieve any uniform split just by varying the window size, couldn't we? Just from watching the video, I personally see no reason for having frame_size != window_size. Of course, I am missing something, but what?

    • @ericchuhaochan2066
      @ericchuhaochan2066 4 years ago +2

      For me, the 2 parameters are conceptually different. Frame size is a param in the STFT and window size is a param in the window function. Pragmatically, it is pointless to assign frame_size != window_size because the samples in the gap are going to be zero padding anyway.

    • @artyomgevorgyan7167
      @artyomgevorgyan7167 4 years ago

      @ericchuhaochan2066 Agree with you now.

  • @bubblefoil
    @bubblefoil 3 years ago

    Thanks!
    I still don't get the reason for the +1 fft point, though.

  • @SHADABALAM2002
    @SHADABALAM2002 3 years ago

    What are k and K in the Hann window formula?

    • @nhactrutinh6201
      @nhactrutinh6201 3 years ago

      sample number and number of samples in window
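To make the answer above concrete, here is one common form of the Hann window, where `k` is the sample index inside the frame and `K` is the number of samples in the window. This is a sketch under assumptions: it uses the symmetric definition with `k` running from 0 to K - 1, and the exact indexing shown in the video may differ. The window is multiplied sample by sample with each frame in the time domain, before the Fourier transform of that frame is taken.

```python
import math


def hann_window(K):
    """Symmetric Hann window: w(k) = 0.5 * (1 - cos(2*pi*k / (K - 1))), k = 0..K-1."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (K - 1))) for k in range(K)]


def apply_window(frame):
    """Taper one frame of samples before its DFT is computed."""
    window = hann_window(len(frame))
    return [x * w for x, w in zip(frame, window)]


w = hann_window(5)
print(w)  # starts and ends near 0, peaks at 1.0 in the middle
print(apply_window([1.0, 1.0, 1.0, 1.0, 1.0]))  # a constant frame takes the window's shape
```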

  • @jeremyuzan1169
    @jeremyuzan1169 3 years ago

    amazing

  • @aboo1999
    @aboo1999 3 years ago

    Thanks is not enough!

  • @mutalasuragemohammed6954
    @mutalasuragemohammed6954 1 year ago

    beautifully explained. Thank you

  • @Moonwalkerrabhi
    @Moonwalkerrabhi 3 years ago

    Awesome explanation !

  • @MichelHabib
    @MichelHabib 2 years ago

    Great Video, Thank you

  • @saranshgokhale8298
    @saranshgokhale8298 2 years ago

    This is great, thanks a lot!