OpenAI Whisper Demo: Convert Speech to Text in Python

  • Published Aug 25, 2024

Comments • 92

  • @ThorstenMueller
    @ThorstenMueller 1 year ago +8

    Thanks for making this helpful video. I really enjoyed watching it.
    Whisper is a huge step forward to local speech recognition.

    • @robmulla
      @robmulla  1 year ago +2

      Appreciate the feedback. Whisper is pretty impressive.

  • @Ethernick_V2
    @Ethernick_V2 4 months ago

    Such an awesome video. I've been looking for a little while now and this is exactly what I'm looking for. Additionally, the way you presented everything was super quick and easy to understand (which I appreciate since I'm currently running a fever lol). Either way, you're a life saver, and I want to thank you so much for all your hard work.

  • @IntenseRouge
    @IntenseRouge 1 year ago +6

    Great video, thanks Rob! ... I tried the model in German a few times and it worked quite well but not without errors. One time I took an audio example from Hermann Hesse's wonderful book: Narcissus and Goldmund and the model translated 'Narciss' (German for Narcissus) with 'Nazi'. ... so, I will still read and correct the future results before sending them to my boss. ;-)

    • @robmulla
      @robmulla  1 year ago +2

      Haha. Love the story. Hopefully these models will just continue to get better.

  • @davidliu5112
    @davidliu5112 1 year ago +3

    Thanks for this valuable video. You deserve more views and likes

    • @robmulla
      @robmulla  1 year ago

      Really appreciate that. Share the video with a friend to spread the word 😊

  • @AlejandroGonzalez-pz7hl
    @AlejandroGonzalez-pz7hl several months ago

    More content like this please! And thank you for the tutorial.

  • @Chris_zacas
    @Chris_zacas 1 year ago +2

    Hello all! Nice first impression! I ran an 8-minute mp3 file and it worked perfectly. I am pretty surprised. q=)

    • @robmulla
      @robmulla  1 year ago +1

      Great to hear! I've been very impressed by whisper too.

  • @AhsanNawazish
    @AhsanNawazish 1 year ago +3

    Really nice explanation and demonstration. You, sir, have a new subscriber (me).

    • @robmulla
      @robmulla  1 year ago

      Thanks. Glad to have you as a subscriber

  • @bujin5455
    @bujin5455 1 year ago +2

    Seriously, such an awesome project!!!

    • @robmulla
      @robmulla  1 year ago

      Glad you liked it! I appreciate the comment.

  • @Sachin-at
    @Sachin-at 1 year ago +2

    DaVinci Resolve needs to use this to generate subtitles 👌

    • @robmulla
      @robmulla  1 year ago +2

      I use it to add subtitles to my YouTube videos. 😎

  • @registeel2000
    @registeel2000 1 year ago +9

    Cool video! I want to get this working for live speech-to-text since it is fast enough to run in real time, but it seems like, since you can't pass in continuous audio, you would run into issues where the model would not have the previous output as context and could easily get cut off mid-word. Any ideas for how to tackle that issue?

    • @robmulla
      @robmulla  1 year ago +9

      That's a great point. You should check out this repo where someone made whisper work with a microphone input: github.com/mallorbc/whisper_mic
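
      Not from the video or that repo — just a rough sketch of chunking microphone audio into Whisper, assuming the sounddevice package is installed (pip install sounddevice) and a 16 kHz mono stream. It does not carry context across chunks, so words can still get cut at chunk boundaries, which is exactly the issue raised above:

      import sounddevice as sd  # assumed extra dependency
      import whisper

      SAMPLE_RATE = 16000   # Whisper expects 16 kHz mono float32 audio
      CHUNK_SECONDS = 5

      model = whisper.load_model("base")

      while True:
          # Record one chunk from the default microphone (blocking).
          chunk = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                         samplerate=SAMPLE_RATE, channels=1, dtype="float32")
          sd.wait()
          # transcribe() accepts a NumPy array as well as a file path.
          result = model.transcribe(chunk.flatten(), fp16=False)
          print(result["text"])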

  • @dimorischinyui1875
    @dimorischinyui1875 1 year ago +4

    Hey guys, can anyone please help me with this issue? I am trying to run whisper on my machine and I am getting this error in cmd: UserWarning: FP16 is not supported on CPU; using FP32 instead
    warnings.warn("FP16 is not supported on CPU; using FP32 instead").
    I use Windows 10 with an RTX 2060 GPU. It also seems it runs on my CPU instead of the NVIDIA GPU. For more detail: I created a Python virtual environment and pip installed whisper in that virtual environment.

    • @robmulla
      @robmulla  1 year ago +1

      Hey Dimoris, unfortunately I don't have a Windows machine. It does look like you are using the CPU and not the GPU. Are you sure you have CUDA installed?
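
      A quick way to check, sketched rather than taken from the video: if torch.cuda.is_available() prints False, PyTorch was installed without CUDA support and Whisper falls back to the CPU (which is also what triggers the FP16 warning). "audio.mp3" below is a placeholder file name:

      import torch
      import whisper

      print(torch.cuda.is_available())  # False means a CPU-only PyTorch build

      # load_model accepts a device argument; use the GPU when CUDA is available.
      device = "cuda" if torch.cuda.is_available() else "cpu"
      model = whisper.load_model("base", device=device)

      result = model.transcribe("audio.mp3")  # placeholder path
      print(result["text"])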

  • @reubenthomas1033
    @reubenthomas1033 1 year ago +7

    Hey Medallion! What's the best way/library to perform text-to-speech, speech-to-text, and speech-to-speech translation between languages? I'm from India, so a model that's capable of a lot of indigenous languages is necessary. And if possible, could you make a video about this?

    • @robmulla
      @robmulla  1 year ago +5

      Thanks for the comment. There is a text-to-speech library that uses the Google API. This one can be used offline: github.com/nateshmbhat/pyttsx3 - as for the different languages, I think it's going to depend a lot on what is already out there. Are the languages part of the whisper library? If so, that's a good start; it allows for some translation, and maybe in the future they will add TTS.
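
      A minimal pyttsx3 sketch for reference (it works offline; which voices and languages are actually available depends on what is installed on your operating system, so treat the language support as something to verify):

      import pyttsx3

      engine = pyttsx3.init()

      # List the locally installed voices to see which languages are available.
      for voice in engine.getProperty("voices"):
          print(voice.id, voice.languages)

      engine.say("Hello from an offline text-to-speech engine.")
      engine.runAndWait()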

    • @reubenthomas1033
      @reubenthomas1033 1 year ago +1

      Thanks!

  • @randomgrrl
    @randomgrrl 1 year ago +1

    Great video, thanks for sharing!

    • @robmulla
      @robmulla  1 year ago

      Thanks for watching!

  • @leecloud7070
    @leecloud7070 1 year ago +1

    Thanks for your kind, detailed explanation! Could you explain to me how improving a Whisper model works?
    Do I need text, audio, or both? I would like to improve recognition of new words in the specific field I'm targeting.

  • @anirbanc88
    @anirbanc88 1 year ago +1

    best teacher ever!

    • @robmulla
      @robmulla  1 year ago

      Thanks for saying so Anirban!

  • @theHaloFM02
    @theHaloFM02 1 year ago +2

    Do you have any advice for how to fix the 'ModuleNotFoundError: no module named 'torch._C''? I looked around the internet for answers but none of them work; I even tried different Python versions.

    • @robmulla
      @robmulla  1 year ago +1

      Looks like you need to install pytorch. You can do so by running "pip install torch" in your python environment. Good luck!

  • @geoffreybell7223
    @geoffreybell7223 1 year ago +5

    Hi Medallion, Thanks for the video.
    I've followed both of your processes, but when I run I get a FileNotFoundError: [WinError 2] The system cannot find the file specified. I've got my test file in the same folder as my main.py. Any ideas what I need to do to get it to work?

    • @robmulla
      @robmulla  1 year ago +2

      Interesting. You might be referencing it wrong. It needs to be in the same folder as the script. I’d need to see the full stack trace though.

    • @mute888
      @mute888 1 year ago +1

      @@robmulla I get the same error

    • @mute888
      @mute888 1 year ago +1

      The issue was that I hadn't installed FFmpeg properly. Thanks for the great vid!

    • @padholikho24
      @padholikho24 6 months ago

      @Rob I have the same issue.
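
      For anyone hitting the same [WinError 2]: Whisper shells out to the ffmpeg executable to decode audio, so that error usually means ffmpeg is missing from the PATH rather than the audio file itself being missing. A small sketch of the checks (the file name is a placeholder):

      import shutil
      from pathlib import Path

      import whisper

      # Whisper decodes audio by calling the ffmpeg executable, so it must be on PATH.
      if shutil.which("ffmpeg") is None:
          raise RuntimeError("ffmpeg not found - install it and add it to your PATH")

      audio_path = Path("test.mp3")  # placeholder file name
      if not audio_path.exists():
          raise FileNotFoundError(f"{audio_path} not found in {Path.cwd()}")

      model = whisper.load_model("base")
      print(model.transcribe(str(audio_path))["text"])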

  • @spartan112
    @spartan112 1 year ago +1

    Great video 👍. I just wanted to know in detail how to use this, and now that I've seen your video I understand it 100%. By the way, which software or tool are you writing the code in?

    • @robmulla
      @robmulla  1 year ago +1

      Thanks! I'm using JupyterLab. Check my channel for my video on Jupyter.

  • @mattd7828
    @mattd7828 1 year ago +3

    Thanks for this! I'm fairly new to NLP but already amazed by Whisper. Any idea what the *max_initial_timestamp* argument in DecodingOptions() is used for? I'm curious to know the smallest timestamp window it's possible to achieve. Anyone know if it's possible to pull timestamps for each word's onset? I'm seeing ranges of 2-5 seconds by default on my samples (which are kinda verbose).

    • @robmulla
      @robmulla  1 year ago +2

      Great question. I don't know too much about the details but I did find it in the source code: github.com/openai/whisper/blob/main/whisper/decoding.py#L97
      It says "the initial timestamp cannot be later than this"
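
      For reference, max_initial_timestamp is an argument to whisper.DecodingOptions, used with the lower-level whisper.decode API shown in the repo README; newer releases of the package also accept word_timestamps=True in transcribe() for per-word onsets (that flag is not in every version). A sketch of both, with a placeholder file name:

      import whisper

      model = whisper.load_model("base")

      # Lower-level API: one 30-second window with explicit decoding options.
      audio = whisper.load_audio("audio.mp3")            # placeholder path
      audio = whisper.pad_or_trim(audio)                 # pad/trim to 30 s
      mel = whisper.log_mel_spectrogram(audio).to(model.device)
      options = whisper.DecodingOptions(max_initial_timestamp=1.0, fp16=False)
      print(whisper.decode(model, mel, options).text)

      # Newer releases: per-word timestamps from the high-level transcribe().
      result = model.transcribe("audio.mp3", word_timestamps=True)
      for segment in result["segments"]:
          for word in segment.get("words", []):
              print(word["start"], word["end"], word["word"])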

  • @hareeshkumar4492
    @hareeshkumar4492 1 year ago +2

    Thanks for providing details. Does it support live streaming audio? Instead of using a pre-recorded audio clip, can it transcribe live speech?

    • @robmulla
      @robmulla  1 year ago +1

      Great question. I believe there are some packages out there that can do it near real time, but I haven’t used one myself.

  • @user-bb8ke8ib2q
    @user-bb8ke8ib2q 1 year ago

    If I load a video in my Python code and store its audio track in a variable, how can I pass that variable to Whisper's transcribe function without saving the audio to a wav or mp3 file?
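
    One answer, sketched rather than taken from the video: transcribe() also accepts a mono float32 NumPy array sampled at 16 kHz, so audio that is already in memory in that form never has to touch disk. The example below still builds the array from a placeholder video file via whisper.load_audio (which uses ffmpeg to pull out the audio track), but the array could come from any in-memory source:

    import numpy as np
    import whisper

    model = whisper.load_model("base")

    # Any 16 kHz mono float32 array works as input to transcribe().
    audio_array = whisper.load_audio("my_video.mp4")  # placeholder path
    assert audio_array.dtype == np.float32

    result = model.transcribe(audio_array, fp16=False)
    print(result["text"])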

  • @all-in-one-890
    @all-in-one-890 8 months ago

    Sir, I have a question. I want to make a program in Python such that it first recognizes the text from 20 images one by one and stores the last word from each image's text. At the same time it should also recognize the audio from a file (which is playing at its normal pace) through speech recognition, and if it finds that last word from the image text in the audio at 36 seconds from the start, then it should press a specific key on the keyboard. This continues until the audio finishes.
    Can this be done using whisper?

  • @nelsonkayode106
    @nelsonkayode106 1 year ago +2

    Hi Rob, thank you for taking the time to share out of the wealth of your knowledge. I tried running the model, and it keeps telling me NumPy is not available. I ran pip install numpy, and I realized that numpy is available. Please, what could the problem be? I want to use this for qualitative research. Thank you once again, and I hope to hear from you.

    • @robmulla
      @robmulla  1 year ago +1

      That’s strange, check your internet connection because that package definitely should be available. Thanks for watching!!

    • @nelsonkayode106
      @nelsonkayode106 1 year ago +1

      @@robmulla Thank you, Rob. I posted the question on the Q&A page on GitHub. The issue is my Python version: I have 3.10, and PyTorch isn't compatible with any version above 3.9, so I needed to downgrade Python to get PyTorch and NumPy working.
      Thanks once again.

  • @ArpitaRane-rj1gk
    @ArpitaRane-rj1gk 2 months ago

    How is it with respect to data privacy? Does it store our data?

  • @yusufcan1304
    @yusufcan1304 3 months ago

    thanks man

  • @ArunKumar-bp5lo
    @ArunKumar-bp5lo 1 year ago +1

    first video seen and subscribed

    • @robmulla
      @robmulla  1 year ago

      Love it! Thanks for subscribing.

  • @cbara568
    @cbara568 1 year ago +2

    Noob question, but does this work offline, or is it an API call to OpenAI?

    • @robmulla
      @robmulla  1 year ago

      This model is completely open source, so you can download it and run it offline.

  • @vaibhavgirase3021
    @vaibhavgirase3021 8 months ago

    Hello sir, I am trying to transcribe large audio files and it takes a lot of time. I want to transcribe them in as little time as possible; is this possible, sir?

  • @bryede
    @bryede 1 year ago +3

    Can you give it more than 30 seconds of audio or are you forced to break up the source file?

    • @robmulla
      @robmulla  1 year ago +1

      I believe that since the model was trained on 30-second clips, the audio must be split before being processed through the pipeline. However, the built-in transcribe method handles that for you.

    • @Chris_zacas
      @Chris_zacas 1 year ago +1

      TRY!

    • @rafhaelsilva9132
      @rafhaelsilva9132 1 year ago

      I just transcribed an hour-long audio file for work... worked like a charm. It took a long time, but still less time than if I had transcribed it by hand.
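
      A small sketch of that, with a placeholder file name: transcribe() windows long audio internally and returns timestamped segments, so a long recording can be processed in one call and inspected piece by piece:

      import whisper

      model = whisper.load_model("base")
      result = model.transcribe("long_recording.mp3")  # placeholder path

      # Each segment carries start/end times (in seconds) plus its text.
      for segment in result["segments"]:
          print(f'[{segment["start"]:8.2f} -> {segment["end"]:8.2f}] {segment["text"].strip()}')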

  • @WolstonLobo
    @WolstonLobo 1 year ago +1

    Hello! What's the best way to bulk upload mp3 files and convert them to SRT files? I'm assuming whisper does not do srt and does vtt instead.

    • @robmulla
      @robmulla  1 year ago +1

      I recently used a YouTube whisper subtitle maker on a live stream. You can watch it on my channel. It did vtt format, but I think it also had an option for other formats.
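
      One way to batch-convert a folder of mp3 files straight to SRT is to format the segments that transcribe() returns yourself; a sketch assuming the files sit in a local "audio" folder (newer whisper releases also ship an SRT writer, but hand-rolling it avoids depending on a particular version):

      from pathlib import Path

      import whisper


      def to_srt_time(seconds: float) -> str:
          """Format seconds as an SRT timestamp, e.g. 00:01:02,345."""
          ms = int(round(seconds * 1000))
          h, ms = divmod(ms, 3_600_000)
          m, ms = divmod(ms, 60_000)
          s, ms = divmod(ms, 1_000)
          return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


      model = whisper.load_model("base")

      for mp3 in Path("audio").glob("*.mp3"):  # assumed input folder
          result = model.transcribe(str(mp3))
          lines = []
          for i, seg in enumerate(result["segments"], start=1):
              lines += [str(i),
                        f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}",
                        seg["text"].strip(),
                        ""]  # blank line between cues
          mp3.with_suffix(".srt").write_text("\n".join(lines), encoding="utf-8")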

  • @iamabot2667
    @iamabot2667 1 year ago +2

    I am using a MacBook M1 and Visual Studio, and I keep getting "no module named torch". I switched to Jupyter, but then I get "FP16 is not supported on CPU; using FP32 instead".

    • @robmulla
      @robmulla  1 year ago +1

      So you got it working?

  • @Hazar-bt6nf
    @Hazar-bt6nf several months ago

    Can it be run on a Raspberry Pi 5?

  • @imtimjames
    @imtimjames 1 year ago +2

    Can whisper analyze voice? Like screening and scoring dialect, etc.?

    • @robmulla
      @robmulla  1 year ago

      I don't believe so....

    • @imtimjames
      @imtimjames 1 year ago

      @@robmulla just figured it out :)

  • @kevinagbasso4946
    @kevinagbasso4946 1 year ago +2

    Hi Medallion! Thank you very much. I'd been expecting this since your last one on audio data processing in Python. Is there a possibility to add a new language? I am currently working on a large audio dataset in my mother tongue, Fon, from West Africa, and would like some guidance. Best!

    • @robmulla
      @robmulla  1 year ago +2

      These models are trained on extremely large datasets for each language, so if you are looking to have something for a language that isn't in the existing list, it would be really hard to train that yourself. Maybe reach out to OpenAI and request that the language be added in future versions?

  • @vignesh_m_1995
    @vignesh_m_1995 8 months ago

    So, is Whisper used only for speech to text, and only in Python? Any JS support?

  • @manatahir9870
    @manatahir9870 1 year ago

    Hi Rob, thanks for sharing this video.
    I am looking for a library/API that can convert speech to text from a YouTube video, and then I would combine the video with a translation of the text into another language.
    Do you have any idea how I can do it?
    Is Whisper a good library for that?
    PS: the video may last more than an hour. Thanks in advance for your help🙏🏼

  • @user-ix8vg1hq8z
    @user-ix8vg1hq8z 1 year ago

    Hi buddy, can you help with a detailed video on speech-to-text conversion using Python?

  • @bastothemax
    @bastothemax 1 year ago

    Thanks!

  • @BoweryK
    @BoweryK 1 year ago +1

    Thanks for making this video. I was wondering if you could go over the steps to follow to execute these 4 lines of code on a GPU. I installed the CUDA toolkit and Numba (I have a good graphics card, a GTX 3050) and followed some examples online, but I failed. Ty and have a great day!

    • @robmulla
      @robmulla  1 year ago +1

      Hey Hamza. It depends on the operating system you are using. Installing CUDA correctly and having it linked in your global path is usually the hardest part. For me, I followed the instructions on the NVIDIA website. Then I just pip installed the requirements from the whisper repo. Good luck!

  • @michal5869
    @michal5869 1 year ago +1

    Does it work with long files, like 2 hours?

    • @robmulla
      @robmulla  1 year ago +1

      Yes. When it predicts, it splits the long audio into smaller chunks, so it can run on long audio files.

  • @ismaelundahmad
    @ismaelundahmad 1 year ago +2

    Yo, what's up! Can you then translate the result to another language, like print(result), but from English to German or another language?

    • @robmulla
      @robmulla  1 year ago +1

      I don't think whisper can do that type of translation out of the box. Most everything I've seen is translation to english.

    • @ismaelundahmad
      @ismaelundahmad 1 year ago

      @@robmulla ok thank you sir
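
      To confirm the reply above with a sketch: Whisper's built-in task="translate" mode only translates into English; going from English into German (or any other language) needs a separate translation model applied to the transcript. The file name below is a placeholder:

      import whisper

      model = whisper.load_model("base")

      # task="translate" transcribes the speech and translates it into English.
      # There is no built-in mode for translating into other target languages.
      result = model.transcribe("german_speech.mp3", task="translate", fp16=False)
      print(result["text"])  # English text, even though the source audio is German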

  • @billykotsos4642
    @billykotsos4642 1 year ago +2

    how big is that model? It has to be huge right?

    • @robmulla
      @robmulla  1 year ago +1

      It comes in various sizes, from tiny (39M parameters) to large (1.55B parameters). You can find them listed in the repo here: github.com/openai/whisper#available-models-and-languages

    • @billykotsos4642
      @billykotsos4642 1 year ago

      @@robmulla thanks man that helps a lot

  • @olivercarmignani9082
    @olivercarmignani9082 1 year ago +2

    Is this also realisable in real time?

    • @robmulla
      @robmulla  1 year ago

      I believe so. Check this out: huggingface.co/spaces/Amrrs/openai-whisper-live-transcribe

  • @sandstorm973
    @sandstorm973 1 year ago +2

    Github repo?

    • @robmulla
      @robmulla  1 year ago +2

      Whisper is available here github.com/openai/whisper/

  • @legendawaken5527
    @legendawaken5527 9 months ago

    Can it run in real time??

  • @hamzaahmad564
    @hamzaahmad564 1 year ago +1

    Can this be made into .srt files?

    • @robmulla
      @robmulla  1 year ago +1

      Great question! I haven't done it myself, but it looks like others have. Check out this GitHub discussion; someone put together code that might be what you are looking for: github.com/openai/whisper/discussions/98

    • @hamzaahmad564
      @hamzaahmad564 1 year ago

      @@robmulla Cheers brother, I'll check that out very soon

  • @vanshbhati6087
    @vanshbhati6087 1 year ago +2

    Traceback (most recent call last):
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/site-packages/pyttsx3/__init__.py", line 20, in init
    eng = _activeEngines[driverName]
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/weakref.py", line 137, in __getitem__
    o = self.data[key]()
    KeyError: 'sapi5'
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 31, in
    start(fakepyfile,mainpyfile)
    File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 30, in start
    exec(open(mainpyfile).read(), __main__.__dict__)
    File "", line 2, in
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/site-packages/pyttsx3/__init__.py", line 22, in init
    eng = Engine(driverName, debug)
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/site-packages/pyttsx3/engine.py", line 30, in __init__
    self.proxy = driver.DriverProxy(weakref.proxy(self), driverName, debug)
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/site-packages/pyttsx3/driver.py", line 50, in __init__
    self._module = importlib.import_module(name)
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    File "", line 1030, in _gcd_import
    File "", line 1007, in _find_and_load
    File "", line 986, in _find_and_load_unlocked
    File "", line 680, in _load_unlocked
    File "", line 850, in exec_module
    File "", line 228, in _call_with_frames_removed
    File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.9/site-packages/pyttsx3/drivers/sapi5.py", line 1, in
    import comtypes.client # Importing comtypes.client will make the gen subpackage
    ModuleNotFoundError: No module named 'comtypes'
    [Program finished]
    Plz help me with this error

    • @robmulla
      @robmulla  1 year ago +1

      Oh no. Did you figure it out? Might need to pip install that package.

    • @vanshbhati6087
      @vanshbhati6087 1 year ago +1

      @@robmulla I can't figure out the problem, please help.
      (I know that the program I've written is absolutely fine, but I can't understand what the problem is.)