Building an Audio Transcription App with OpenAI Whisper and Streamlit

  • Published on Nov 5, 2022
  • In this video, I will show you how to build a simple yet powerful audio transcription app using the recently released Whisper model from OpenAI and Streamlit.
    If you liked this video, don't forget to like and subscribe! :)
    - Subscribe!: / @automatalearninglab
    - Follow me on Medium: / lucas-soares
    - Join Medium: / membership
    - Tiktok: www.tiktok.com/@enkrateialucc...
    - Twitter: / lucasenkrateia
    - LinkedIn: / lucas-soares-969044167
    Music "Before Chill" by Yomoti on Epidemic Sound
    www.epidemicsound.com/track/v...
  • Science & Technology

Comments • 130

  • @abroniewski a year ago +11

    Thank you for NOT editing out any mixups in your coding. It is REALLY helpful to watch others struggle through and figure things out instead of making everything look perfect from the first go. SUBSCRIBED!

    • @automatalearninglab 11 months ago +3

      Check out the code without THAT MANY mixups here 😂
      github.com/EnkrateiaLucca/openai_whisper
      Thanks for subscribing though! 😊🎉

  • @marcoaerlic2576 a month ago +1

    Thanks for the video. Also, good on you for having the courage to upload an unedited video.

    • @automatalearninglab a month ago +1

      Yeah, I mean I did some edits, but overall I found people appreciate it if you publish your process, which is what I was trying to do here.

  • @gacevedobastias a year ago +3

    Nice work!!! It was all I've been looking for, working as a court reporter!!! :) Thank you so much

  • @tgard007 5 months ago +1

    Love your videos, thank you man. Try working through the code first and bringing up the pain points for us along the way; it's a little disengaging to watch that happen the first time.

    • @automatalearninglab 5 months ago

      OK, got it. Some people like this format, some prefer it like you're saying. I've been trying different ways, but I guess working through the major pain points first should be a no-brainer! Appreciate the feedback. Watch my next video coming out next Sunday and tell me what you think! :) Cheers!

  • @mikiallen7733 a year ago +1

    Thanks, sir, but what extra elements should be added to translate audio from one language to another? Basically, somebody provides me with an audio file in some format; I want the app to take that in and, without any editing, produce output in both audio and text form in another language, let's say French to German.
    Your input is highly appreciated.

    • @automatalearninglab a year ago

      I'm pretty sure Whisper can accept multiple languages. Look up the different models in the Whisper documentation from OpenAI.
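A minimal sketch related to the question above, assuming the open-source `openai-whisper` package and ffmpeg are installed. Whisper's `transcribe` accepts `task="translate"`, which produces English text from speech in another language; it does not translate into other target languages, so a French-to-German workflow would need a separate text translation step and a text-to-speech step that are not shown here. The function names and the default model name are illustrative choices, not taken from the video.

```python
def task_for(to_english):
    """Map a boolean flag to the Whisper task name."""
    return "translate" if to_english else "transcribe"


def transcribe_or_translate(path, to_english=False, model_name="base"):
    """Return (detected_language, text) for an audio file.

    With to_english=True, Whisper translates the speech into English text.
    """
    import whisper  # lazy import: needs `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_name)
    result = model.transcribe(path, task=task_for(to_english))
    return result["language"], result["text"]
```

For example, `transcribe_or_translate("interview_fr.mp3", to_english=True)` (a hypothetical file name) would return the detected language code and an English rendering of the speech.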

  • @marcelogarcia6981 a year ago +1

    Thank you! 👏

  • @hamzahbaagil9828 18 days ago +1

    Is it possible to also do speaker recognition? Do you have a video for it?

  • @swelanauguste6176 9 months ago +1

    I have been trying to find a solution for larger audio files. Can you integrate Celery with Streamlit and have it run in the background?

    • @automatalearninglab 9 months ago +2

      A good solution I found is using pydub, if I am not mistaken, to break the large audio files into chunks, run Whisper on each chunk, and then concatenate the results!
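The chunking idea in the reply above could look roughly like this. It is only a sketch: it assumes `pydub` (which itself needs ffmpeg) and a Whisper model object with a `transcribe` method, and the one-minute chunk length is an arbitrary choice. Cutting at fixed positions can split a word in half, so a real app might prefer to cut on silence.

```python
import os
import tempfile


def chunk_bounds(total_ms, chunk_ms=60_000):
    """Return (start, end) millisecond pairs that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(s, min(s + chunk_ms, total_ms)) for s in range(0, total_ms, chunk_ms)]


def transcribe_in_chunks(path, model, chunk_ms=60_000):
    """Split an audio file with pydub, transcribe each piece, and join the text."""
    from pydub import AudioSegment  # lazy import: needs `pip install pydub` and ffmpeg

    audio = AudioSegment.from_file(path)
    texts = []
    for start, end in chunk_bounds(len(audio), chunk_ms):  # len() is in milliseconds
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            chunk_path = tmp.name
        audio[start:end].export(chunk_path, format="wav")
        try:
            texts.append(model.transcribe(chunk_path)["text"].strip())
        finally:
            os.remove(chunk_path)
    return " ".join(texts)
```

Usage would be along the lines of `transcribe_in_chunks("long_meeting.mp3", whisper.load_model("base"))`, where the file name is hypothetical.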

  • @rushilpatel702 8 months ago +1

    Is there any way we can make the text dynamically highlight each word as it is played through st.audio?

    • @automatalearninglab 8 months ago +1

      Not sure! I haven't tried highlighting each word like that before. Sorry I could not help more! :( Thanks for watching! :)
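One possible starting point for the highlighting question, offered as a sketch rather than a tested solution: recent releases of `openai-whisper` accept `word_timestamps=True` in `transcribe`, which adds a `words` list (with `word`, `start`, and `end` fields) to each segment. The helpers below only cover the lookup side, that is, finding which word is spoken at a given playback time. As far as I know, `st.audio` does not report the playback position back to Python, so the actual highlighting would probably need a custom front-end component.

```python
from bisect import bisect_right


def flatten_words(result):
    """Collect (start, end, word) tuples from a Whisper result with word timestamps."""
    return [
        (w["start"], w["end"], w["word"].strip())
        for seg in result.get("segments", [])
        for w in seg.get("words", [])
    ]


def word_at(words, t):
    """Return the index of the word being spoken at time t (seconds), or None."""
    starts = [w[0] for w in words]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i][1]:
        return i
    return None
```

The input would come from something like `model.transcribe(path, word_timestamps=True)`.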

  • @naturallydope247 a year ago +2

    Do you have a GitHub repo for this, or somewhere else with the code you used to build it?

    • @automatalearninglab a year ago

      Yep, it's here: github.com/EnkrateiaLucca/openai_whisper

  • @awa8766 a year ago +3

    This was awesome! Quick question: is there any way that Streamlit could receive voice input, rather than uploading an audio file? The workflow I'm thinking of is 1) the user presses "record audio" in Streamlit, 2) once finished, the generated audio output is passed to Whisper, 3) Whisper transcribes. I've been researching how to incorporate audio input into Streamlit for a while, to no avail.

    • @automatalearninglab a year ago +1

      I am not sure if Streamlit takes in direct audio input (GPT-4 says no... LoL), but apparently Gradio can! I wrote some boilerplate code for you to test:

      import gradio as gr
      import openai
      import numpy as np

      # Set OpenAI key
      openai.api_key = 'your-openai-api-key'

      def transcribe_audio(audio):
          """
          This function transcribes audio using OpenAI's Whisper API
          """
          # You might need to convert the audio into a suitable format for Whisper
          # Convert to suitable audio format (like .wav)
          # Make the API request
          # Here we assume whisper_asr is a hypothetical function that performs the transcription.
          transcription = whisper_asr(audio)
          # Return the transcription
          return transcription

      iface = gr.Interface(fn=transcribe_audio, inputs=gr.inputs.Audio(source="microphone"), outputs="text")
      iface.launch()

      Let me know if it works! (Replace the whisper_asr stuff with the proper call to Whisper.)
      Cheers!! Thanks for watching!

    • @awa8766 a year ago +1

      @@automatalearninglab Thanks! I tried implementing the following code, and I got a few errors that I would greatly appreciate your input on!
      Code:
      ```
      import gradio as gr
      import openai

      # OPENAI_API_KEY is assumed to be defined elsewhere (not shown in this comment)

      def transcribe(audio):
          # Gradio passes None when nothing was recorded, so check before opening
          if audio is None:
              return ""

          with open(audio, "rb") as f:
              t_text = openai.Audio.transcribe(
                  model="whisper-1",
                  file=f,
                  api_key=OPENAI_API_KEY
              )
          return t_text["text"]

      gr.Interface(
          title='Medical Scriber',
          fn=transcribe,
          inputs=[gr.Audio(source="microphone", type="filepath")],
          outputs=["text"]
      ).launch()
      ```
      I get the following errors:
      1) When I record audio and then pass it to the Whisper API, I get the following message in the terminal, though the transcription works: "UserWarning: Trying to convert audio automatically from int32 to 16-bit int format."
      2) The Whisper API has a 25 MB limit on the file size. I recorded a 5-minute audio snippet (2.5 MB), and I got a "size limit exceeded" error. More info: github.com/openai/whisper/discussions/1385
      Any suggestions?

    • @automatalearninglab a year ago

      @@awa8766 Hey! I think you’ll find everything you need in this more recent video where I did a slight update on this project here: th-cam.com/video/H3s5fx7CsZg/w-d-xo.html
      Thanks for watching! :) Cheers!

  • @stoufa 6 months ago

    Thanks for sharing! ^_^

  • @user-xi8cq9zr4c a year ago +1

    Nice video my friend! Good job and nice relaxing music :)
    I want to ask if you have any idea how we can create a real-time speech recognition app with Whisper.

    • @automatalearninglab a year ago +1

      Thanks! Great question! :) I'm not sure how to reduce the latency of these models to make it work in real time, but Hugging Face seems to have a working demo here: huggingface.co/spaces/anzorq/openai_whisper_stt

    • @user-xi8cq9zr4c a year ago

      @@automatalearninglab Thanks for the answer! I will check it out

  • @uen1857 4 months ago +1

    Interesting and good explanation, thanks. I wish you had also made it for real-time transcription using the microphone, if that's possible.

    • @automatalearninglab 4 months ago +1

      It's been on my mind actually; I did something with whisper.cpp a while back. Will probably take a crack at this real-time audio transcription stuff soon! :) Thanks for watching

  • @fredsakay994 2 months ago +1

    It would be nice to have it as a free, ready-to-use web app online. Not everyone is a programmer.

    • @automatalearninglab 2 months ago +1

      Right but that involves some work. I do want to have something like that running soon!

  • @aldya1532 5 months ago +1

    Thanks for the great tutorial. The problem is that on my laptop it works fine only with the tiny and base models. The larger models run into memory problems, and none of the fixes from the documentation work.

    • @automatalearninglab 5 months ago

      Yes, in this case you can host the app in a cloud with a better machine with more memory, or try to make your use case acceptable with the tiny model!

  • @ravindumihiranga6165 7 months ago +1

    Hey bro, if I need to detect the language of the voice, how should I do it? I mean, what modification should I make to the code?

    • @automatalearninglab 7 months ago

      When you're loading the base model, make sure to add the initials of your target language.
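For detecting the spoken language, the pattern documented in the `openai-whisper` README can be wrapped as below. This is a sketch that assumes the package and ffmpeg are installed; the function names are illustrative, and the default spectrogram settings are only known to match the smaller models such as `base`.

```python
def most_likely_language(probs):
    """Pick the language code with the highest probability from a dict."""
    return max(probs, key=probs.get)


def detect_language(path, model_name="base"):
    """Return Whisper's best guess for the spoken language, e.g. 'en' or 'fr'."""
    import whisper  # lazy import: needs `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_name)
    # Whisper looks at a 30-second window, so pad or trim the audio to fit
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)
```

A simpler route is to read `result["language"]` from the dictionary returned by `model.transcribe(path)`. To force a language instead of detecting it, `transcribe` takes a `language` argument such as `language="en"`.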

  • @Nursultan_karazhigit 6 months ago +1

    Is it possible to perform actions after transcribing, like Siri does?

    • @automatalearninglab 6 months ago

      Yeah, of course, you just have to add that to the workflow in the script.

  • @eduardogamboa7209 a year ago +1

    Very nice video, thanks. It looks like a really cool project 😊. Do you think that, once we have the text, we could pass it to ChatGPT to get some good class notes?

    • @automatalearninglab a year ago +1

      Yeah of course!

    • @eduardogamboa7209 a year ago

      @@automatalearninglab Sorry, I was trying to follow your video and just installed VS Studio, and I can't even run the start. Can you maybe upload something or guide me on how to set up VS Code in order to run the code? I don't get why I can't. Sorry, I know these comments might be frustrating, but I'm just starting to code outside of Replit or Google Colab :(

    • @automatalearninglab a year ago +1

      @@eduardogamboa7209 Check out this article about how to set up VS Code for machine learning: copyassignment.com/machine-learning-in-visual-studio-code/

  • @jakubfronczyk2496 a year ago +1

    Nice work, can you do something like that with whisper-jax?

    • @automatalearninglab a year ago

      I haven't looked into it, but I'll take a look!
      Thanks! :)

  • @incredibleG007 a year ago +1

    Great! Thank you. Where can we find the source code to try it?

    • @automatalearninglab a year ago +1

      Here: github.com/EnkrateiaLucca/openai_whisper
      After a lot of people asked for this, I finally created a proper repo with the code! Enjoy!

  • @tomasdemarcos570 a year ago +1

    Is there a size or length limit on the audio? Are you using an API key?

    • @automatalearninglab a year ago

      Yeah, right now it just supports file sizes of up to 25 MB and audio of up to 30 s

    • @meditatepositivity5111 10 months ago

      @@automatalearninglab How can these limitations be overcome?

  • @joelmartinez7628 a year ago +2

    If x number of individuals are speaking, can it identify them? Speaker 1 up to n?

    • @automatalearninglab a year ago

      I don't think so; I think it will transcribe everything as a single voice in one stream

    • @JoEl-jx7dm 3 months ago

      Same name, same doubt! That process is called speaker diarization, and yes, it is possible with a custom classification model integrated into this workflow!

  • @SportyxChannel a year ago +1

    Hi! Can Whisper transcribe MP3 greater than 30 seconds? If yes, can you share the code? Thanks!

  • @PenguLuna a year ago +1

    How can we add an option to translate the transcribed text? The AI has that capability

    • @automatalearninglab a year ago +2

      No, not this one. This is just for transcription, but you can use an OpenAI GPT-3 based translation model for the rest (almost 100% sure, but check it!)
      Cheers :)

  • @rubibeats a year ago +1

    Can we get real-time transcription? Say, up to a certain length of time?

    • @automatalearninglab a year ago +2

      I haven't played with real-time applications yet, so I could not say right now

  • @TalkingWithBots 4 months ago +1

    Do you maybe know some alternatives to Streamlit? I am curious which ones you have already used :)

    • @automatalearninglab 4 months ago +1

      Probably Gradio would be the first to come to mind, also there are many other options coming up now but I haven't been looking into it that much, I usually use the terminal for most things. :)

    • @TalkingWithBots 4 months ago

      @@automatalearninglab I can relate :) I know Gradio and Streamlit are good frameworks for machine learning apps. Recently I was using Colab paired with Anvil to create something. It was also nice and easy.

  • @blackhat965 a year ago +1

    This is a great step-by-step tutorial. Where do I find the Whisper documentation to know what language and syntax it uses with Python? I want to be able to add features and functionality. I saw you had an openai_whisper_tutorial.ipynb - is that the official documentation for building Whisper apps in Python?

    • @automatalearninglab a year ago +1

      Check out the openai documentation which has the official docs! :)

    • @blackhat965 a year ago +1

      @@automatalearninglab Hey I'm really sorry to ask - I'm not being lazy here. I'm new to coding so a lot of this isn't obvious to me.
      The Github OpenAI whisper documentation only has a few scrappy examples of the language under the "Python Usage". It's far less robust than proper documentation that showcases how to leverage all of the whisper functionality.
      Is there anywhere else I could see the Python translation for all the functionality? For example, using whisper on terminal produces srt/txt/vtt etc. files, but there's no standard script to show how to create .srt files in a .py script. I had to look at how other people created .srt files and they didn't reference documentation either.
      Sorry for the long question.

    • @automatalearninglab a year ago

      @@blackhat965 Not sure, check out the GitHub repo for Whisper!
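For the .srt question above, one way to do it from a Python script is to format the `segments` list that `model.transcribe` returns; each segment carries `start` and `end` times in seconds plus its `text`. The helper names below are illustrative and this is a sketch, not the package's own writer. The package also ships writer helpers in `whisper.utils` that the command-line tool uses, but their call signatures have changed between releases, so they are not relied on here.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments into the text of an .srt subtitle file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{i}\n{times}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

In a script this could be used as `open("output.srt", "w", encoding="utf-8").write(segments_to_srt(result["segments"]))`, where `result` is the return value of `model.transcribe(...)` and the output file name is arbitrary.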

  • @batosato a year ago +1

    Any recommendation for a hosting website where I can deploy this app?

    • @automatalearninglab a year ago +1

      Not really, I haven't hosted it so I couldn't tell yah.

    • @batosato a year ago +1

      @@automatalearninglab Thanks. I do get the following error when I load an audio file. Any suggestion on how to fix this?
      AttributeError: module 'ffmpeg' has no attribute 'Error'

    • @automatalearninglab a year ago +1

      @@batosato Sup man, yeah, you should install the ffmpeg module! For Windows: phoenixnap.com/kb/ffmpeg-windows
      For Linux: phoenixnap.com/kb/install-ffmpeg-ubuntu
      Cheers man!
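A quick way to check the ffmpeg advice above from Python, as a sketch: Whisper decodes audio by running the `ffmpeg` executable, so if it is not on the PATH the failure often shows up on Windows as `FileNotFoundError: [WinError 2]`. The `AttributeError` quoted above is commonly reported when the unrelated PyPI package named `ffmpeg` is installed instead of `ffmpeg-python`, though that is an assumption about this particular setup. The function names are illustrative.

```python
import shutil


def ffmpeg_path():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_hint():
    """Describe whether ffmpeg can be found, in a form suitable for an error message."""
    path = ffmpeg_path()
    if path is None:
        return "ffmpeg not found on PATH: install it, then restart the terminal or editor"
    return f"ffmpeg found at {path}"


if __name__ == "__main__":
    print(ffmpeg_hint())
```

In the Streamlit app, something like `st.error(ffmpeg_hint())` could be shown before transcribing whenever `ffmpeg_path()` returns `None`.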

  • @hedinaouara5625 a year ago +2

    First, thanks for this tutorial, but I got this error: "FileNotFoundError: [WinError 2] The system cannot find the file specified"

    • @automatalearninglab a year ago

      You're welcome, check out the discussion below where we talked about this! Cheers :)

    • @hedinaouara5625 a year ago +1

      @@automatalearninglab Thanks for your response, I solved it by running VS Code as administrator

    • @automatalearninglab a year ago

      @@hedinaouara5625 nice!

  • @JoEl-jx7dm 3 months ago +1

    How about diarization of multiple speakers to classify them?

    • @automatalearninglab 3 months ago +1

      This one does not have it. I was playing around with that using just the GPT-4 API and it works quite well.

    • @JoEl-jx7dm 3 months ago

      @@automatalearninglab One more thing, have you found any solutions to process real-time voice rather than basic audio file input?

  • @ravindrakarande59 11 months ago +1

    I am still getting this error, please help: FileNotFoundError: [WinError 2] The system cannot find the file specified

    • @automatalearninglab 11 months ago

      Did you point to the path of the file?

    • @chinnibngrm272 2 months ago

      Hi, did you solve the error?
      Please help me solve it.

  • @chinnibngrm272 2 months ago +1

    FileNotFoundError: [WinError 2] The system cannot find the file specified
    I'm facing this error. Can anyone please help me solve it?

    • @automatalearninglab 2 months ago

      Feed the app the right file path; I think that's the issue

  • @udaykumarbilla6436 a year ago +1

    Does the code work on Streamlit Cloud?

  • @jaybhuva5531 4 months ago +1

    I want it to recognise only the English language. Is that possible?

  • @bingolio a year ago +1

    Any particular reason there's no GitHub repo? Don't wanna share the code?

    • @automatalearninglab a year ago +2

      No reason at all, the code is here: github.com/EnkrateiaLucca/openai_whisper
      I was just lazy before LoL

    • @bingolio 11 months ago

      @@automatalearninglab Thx Good work btw!

  • @satisfyingartwork6839 a year ago +1

    Why don't you give an app link? I would use it for transcription.

    • @automatalearninglab a year ago

      There is no app link, just the code for you to run it yourself. Sorry :(, thanks for watching! Cheers! :)

    • @satisfyingartwork6839 a year ago

      @@automatalearninglab OK

  • @spider279 6 months ago +1

    Can you add timestamps and diarization to your app?

    • @automatalearninglab 6 months ago +1

      Timestamp yes, diarization not sure. There is a GitHub repo called openwhisper timestamp that's pretty good.

    • @spider279 6 months ago +1

      Try to combine it with the pyannote lib for diarization. I will be very happy if you do so 😄

    • @automatalearninglab 6 months ago +1

      @@spider279 nice!

    • @spider279 6 months ago

      @@automatalearninglab Do you know whisper jax diarization? If yes, have you ever tested it?
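To illustrate how the timestamp and diarization ideas in this thread might fit together, here is a small sketch. It assumes you already have speaker turns as `(start, end, speaker)` tuples from some diarization tool (pyannote is the one suggested above, but producing the turns is not shown and no pyannote API is used here) and Whisper segments with `start`, `end`, and `text`. Each segment is labelled with the speaker whose turn overlaps it the most. All names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals (0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach a speaker label to each Whisper segment by maximum time overlap.

    segments: dicts with 'start', 'end', 'text' (as returned by Whisper)
    turns: (start, end, speaker) tuples from a diarization tool (assumed input)
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, seg["text"].strip()))
    return labeled
```

Segments that overlap no turn are labelled `"unknown"` rather than guessed.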

  • @rajkumarsingh8862 a year ago +2

    I'm still getting a FileNotFound error

    • @automatalearninglab a year ago +1

      That's weird, did you upload a file present in your machine?

    • @rajkumarsingh8862 a year ago +1

      @@automatalearninglab Of course, and I'm in the same directory where my code and file are.
      I have tried many things, but nothing works.
      Can you provide me your code, please? I need to create this project ASAP 🙂🙂

    • @automatalearninglab a year ago +1

      @@rajkumarsingh8862 Sure! Here it is:

      import os
      import tempfile

      import streamlit as st
      import whisper

      st.title("Whisper App")

      # Upload an audio file with Streamlit
      audio_file = st.file_uploader("Upload Audio", type=["wav", "mp3", "m4a"])
      model = whisper.load_model("base")
      st.text("Whisper Model Loaded")

      if st.sidebar.button("Transcribe Audio"):
          if audio_file is not None:
              st.sidebar.success("Transcribing Audio")
              # The upload only exists in memory, so write it to a temporary
              # file that Whisper (through ffmpeg) can open by path
              suffix = os.path.splitext(audio_file.name)[1]
              with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
                  tmp.write(audio_file.getvalue())
              transcription = model.transcribe(tmp.name)
              os.remove(tmp.name)
              st.sidebar.success("Transcription Complete")
              st.markdown(transcription["text"])
          else:
              st.sidebar.error("Please upload an audio file")

      st.sidebar.header("Play Original Audio File")
      st.sidebar.audio(audio_file)

    • @rajkumarsingh8862 a year ago

      @@automatalearninglab Brother, it's giving me the error "the system cannot find the file specified"
      Winapi.createprocess errors
      Please help. Can we chat somewhere?

    • @rajkumarsingh8862 a year ago +2

      @@automatalearninglab I just want to ask you how I can host or deploy this web app. Please tell me 🙂❤️

  • @adesigne 10 months ago +2

    I got this error. Please tell me how to solve it :)
    2023-09-04 18:13:53.734 Uncaught app exception
    Traceback (most recent call last):
      File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
        exec(code, module.__dict__)
      File "/mount/src/ai/app.py", line 2, in <module>
        import whisper
    ModuleNotFoundError: No module named 'whisper'

    • @automatalearninglab 10 months ago

      pip install openai-whisper

    • @forexhunter2040 7 months ago +1

      @@automatalearninglab Will we need an API key from OpenAI?

    • @automatalearninglab 7 months ago

      @@forexhunter2040 yep

    • @forexhunter2040 7 months ago

      @@automatalearninglab I see, that is the reason I have been facing issues with running the app successfully. Thanks for the quick reply
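Regarding the `ModuleNotFoundError` earlier in this thread: the package is installed as `openai-whisper` but imported as `whisper`, and on a hosted deployment (the `/mount/src/...` path in the traceback suggests one) the package has to be listed in the app's dependency file, typically `requirements.txt`, rather than installed by hand. The sketch below is a small startup check for missing modules; the helper name is illustrative. Separately, the open-source `whisper` package runs the model locally and does not itself read an OpenAI API key; a key is only needed when calling the hosted API, as in the `openai.Audio.transcribe` snippet earlier in the comments.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by the app; note the pip name for `whisper` is openai-whisper
    missing = missing_modules(["streamlit", "whisper"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable")
```

Running this in the same environment as the app shows quickly whether the install step actually reached that environment.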

  • @zocio_ a year ago +1

    sub number 541