Building an Audio Transcription App with OpenAI Whisper and Streamlit
- Published Nov 5, 2022
- In this video, I will show you how to build a simple yet powerful audio transcription app using the recently released Whisper model from OpenAI and Streamlit.
If you liked this video don't forget to like and subscribe! :)
Here are a few affiliate links for the best gadgets for programmers:
Bose Noise Cancelling headphones: amzn.to/3Um2qIR
Logitech MX Master 3 Advanced Wireless Mouse: amzn.to/3DVffUZ
Corsair K55 RGB Keyboard: amzn.to/3zFucIs
- Subscribe!: / @automatalearninglab
- Follow me on Medium: / lucas-soares
- Join Medium: / membership
- Tiktok: www.tiktok.com/@enkrateialucc...
- Twitter: / lucasenkrateia
- LinkedIn: / lucas-soares-969044167
Music "Before Chill" by Yomoti on Epidemic Sound
www.epidemicsound.com/track/v... - Science & Technology
Thank you for NOT editing out any mixups in your coding. It is REALLY helpful to watch others struggle through and figure things out instead of making everything look perfect from the first go. SUBSCRIBED!
Check out the code without THAT MANY mixups here 😂
github.com/EnkrateiaLucca/openai_whisper
Thanks for subscribing though! 😊🎉
Thanks for the video. Also, good on you for having the courage to upload an unedited video.
Yeah I mean I did some edits, but overall I found people appreciate if you publish your process, which is what I was trying to do here.
Nice work!!! It was all I'd been looking for, working as a court reporter!!! :) Thank you so much
Oh Nice man! glad I was able to help! :) Cheers!
Love your videos, thank you man. One suggestion: try working through the code first and bringing up the pain points for us along the way; it's a little disengaging to watch you hit them for the first time on camera.
Ok got it. Some people like this format, some prefer it the way you're saying. I've been trying different approaches, but I guess working through the major pain points first should be a no-brainer! Appreciate the feedback, watch my next video coming out next Sunday and tell me what you think! :) Cheers!
Thanks! But what extra elements should be added to translate audio from one language to another? Basically, somebody provides me with an audio file; I want the app to take that in and, without any editing, produce output in both audio and text format but in another language, say French or German?
Your input is highly appreciated
I'm pretty sure Whisper can handle multiple languages; look up the different models in the Whisper documentation from OpenAI.
Thank you! 👏
Thank you!
Is it possible to also do speaker recognition? Do you have a video for it?
I've been trying to find a solution for larger audio files. Can you integrate Celery with Streamlit and have the job run in the background?
A good solution I found is using pydub (if I am not mistaken) to break the large audio files into chunks, apply the model to each chunk, and then concatenate the results!
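The chunking idea above can be sketched as follows; this is a minimal, hypothetical sketch, not code from the video. The helper below just computes millisecond chunk boundaries in pure Python, and the commented lines show how slicing with pydub might look (pydub's `AudioSegment` supports millisecond slicing); the file name, chunk size, and `model` variable are assumptions.

```python
def chunk_spans(total_ms, chunk_ms):
    """Return (start, end) millisecond spans covering an audio file of total_ms."""
    spans = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        spans.append((start, end))
        start = end
    return spans

# Assumed usage with pydub (pydub and ffmpeg must be installed):
# from pydub import AudioSegment
# audio = AudioSegment.from_file("big_audio.mp3")
# texts = []
# for start, end in chunk_spans(len(audio), 10 * 60 * 1000):  # 10-minute chunks
#     audio[start:end].export("chunk.wav", format="wav")
#     texts.append(model.transcribe("chunk.wav")["text"])
# full_text = " ".join(texts)

print(chunk_spans(25_000, 10_000))  # → [(0, 10000), (10000, 20000), (20000, 25000)]
```

Concatenating the per-chunk texts is lossy at chunk boundaries (a word can be cut in half), so splitting on silence instead of fixed intervals is often a better choice.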
Is there any way we can make the text dynamically highlight each word as it is played through st.audio?
Not sure! I haven't tried highlighting each word like that before. Sorry could not help more! :( Thanks for watching! :)
do you have a github repo for this or somewhere where the code is that you used to build this?
Yep, it's here github.com/EnkrateiaLucca/openai_whisper
This was awesome! Quick question: is there any way that Streamlit could receive voice input, rather than uploading an audio file? The workflow I'm thinking of is 1) user presses "record audio" in Streamlit 2) once finished, the generated audio output is passed to Whisper 3) Whisper transcribes. I've been researching how to incorporate audio input into Streamlit for a while to no avail.
I am not sure if Streamlit takes in direct audio input (GPT-4 says no... LoL) but apparently Gradio can! I wrote some boilerplate code for you to test:
```
import gradio as gr
import openai

# Set OpenAI key
openai.api_key = "your-openai-api-key"

def transcribe_audio(audio):
    """
    This function transcribes audio using OpenAI's Whisper API
    """
    # You might need to convert the audio into a suitable format
    # for Whisper (like .wav) before making the API request.
    # Here whisper_asr is a hypothetical function that performs the transcription.
    transcription = whisper_asr(audio)
    return transcription

iface = gr.Interface(
    fn=transcribe_audio,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
)
iface.launch()
```
Let me know if it works! (replace the whisper_asr stuff with the proper call to whisper)
Cheers!! Thanks for watching!
@@automatalearninglab Thanks! I tried implementing the following code, and I got a few errors I would greatly appreciate your input on!
Code:
```
def transcribe(audio):
    file = open(audio, "rb")
    if file is None:
        return ""
    with file as f:
        t_text = openai.Audio.transcribe(
            model="whisper-1",
            file=f,
            api_key=OPENAI_API_KEY,
        )
    return t_text["text"]

gr.Interface(
    title="Medical Scriber",
    fn=transcribe,
    inputs=[gr.Audio(source="microphone", type="filepath")],
    outputs=["text"],
).launch()
```
I get the following errors:
1) When I record audio, then pass it to Whisper API, I get the following message pop up in terminal, though the transcription works: "UserWarning: Trying to convert audio automatically from int32 to 16-bit int format."
2) The Whisper API has a 25 MB limit on the file size. I recorded a 5-minute audio snippet (2.5 MB), and I got a "size limit exceeded" error. More info: github.com/openai/whisper/discussions/1385
Any suggestions?
@@awa8766 Hey! I think you’ll find everything you need in this more recent video where I did a slight update on this project here: th-cam.com/video/H3s5fx7CsZg/w-d-xo.html
Thanks for watching! :) Cheers!
Thanks for sharing! ^_^
Nice video my friend! Good job and nice relaxing music :)
I want to ask if you have any idea how can we create a real time speech recogniton app with whisper.
Thanks! Great question! :) I'm not sure how to reduce the latency of these models to make them work in real time, but Hugging Face seems to have a working demo here: huggingface.co/spaces/anzorq/openai_whisper_stt
@@automatalearninglab Thanks for the answer! I will check it out
Interesting and good explanation, thanks. I wish you had also made it for real-time transcription using the microphone, if possible.
It's been on my mind actually; I did something with whisper.cpp a while back. Will probably take a crack at this real-time audio transcription stuff soon! :) Thanks for watching
It would be nice to have it as free ready-to-use web online. Not everyone is a programmer.
Right but that involves some work. I do want to have something like that running soon!
Thanks for the great tutorial. The problem is that on my laptop it works fine only with the tiny and base models. Larger models have memory problems, and none of the fixes from the documentation work.
Yes, in this case you can host the app in the cloud on a machine with more memory, or try to make your use case acceptable with the tiny model!
Hey bro If I need to detect the language of the voice how should I do it. I meant what is the modification which I should do for the code?
Whisper detects the language automatically; the result from model.transcribe includes a "language" field. If you want to force a specific language instead, pass it in, e.g. model.transcribe(audio, language="fr").
is it possible to make it possible to perform actions after transcribing like in SIRI?
Yeah, of course, you just add that to the workflow in the script.
Very nice video, thanks, it looks like a really cool project 😊. Do you think that, after getting the text, it would be possible to send it to ChatGPT in order to get some good class notes?
Yeah of course!
@@automatalearninglab Sorry, I was trying to follow your video and just installed VS Code, and I can't even run the start. Can you maybe upload or guide me on how to set up VS Code in order to run the code? I don't get why I can't, sorry, I know these comments might be frustrating, but I'm just starting to code outside of Replit or Google Colab :(
@@eduardogamboa7209 checkout this article about how to set up vscode for machine learning. copyassignment.com/machine-learning-in-visual-studio-code/
Nice work, can you do something like that with whisper-jax?
I haven't looked into it, but I'll take a look!
Thanks! :)
Great! Thank you. Where can we find the source code to try it?
Here github.com/EnkrateiaLucca/openai_whisper
After a lot of people asked for this, I finally created a proper repo with the code! Enjoy!
Is there a size/length limit on the audio? Are you using an API key?
Yeah, right now it just supports file sizes of up to 25 MB and audio up to 30s
@@automatalearninglab How can we overcome these limitations?
Is this possible if x number of individuals speaking can it identify them? Speaker 1 up to n?
I don't think so; I think it will just transcribe everything as the same voice in a stream.
Same question, same doubt. That process is called speaker diarization; yes, it is possible with a custom classification model integrated into this workflow!
Hi! Can Whisper transcribe MP3 greater than 30 seconds? If yes, can you share the code? Thanks!
I'll work on something like that soon, stay tuned! :)
How can we add an option to translate the transcribed text? The AI has that capability
No, not this one. This is just for transcription, but you can use an OpenAI GPT-3 based translation model for the rest (almost 100% sure, but check it!)
Cheers :)
can we get real time transcription? say up to certain lengths in time?
I haven't played with real-time applications yet, so I could not say right now.
Do you know of any alternatives to Streamlit? I am curious which ones you have already used :)
Probably Gradio would be the first to come to mind, also there are many other options coming up now but I haven't been looking into it that much, I usually use the terminal for most things. :)
@@automatalearninglab I can relate :) I know Gradio and Streamlit are good frameworks for machine learning apps. Recently I was using Colab paired with Anvil to create something. It was also nice and easy.
This is a great step-by-step tutorial. Where do I find the Whisper documentation to know what language and syntax it uses with Python? I want to be able to add features and functionality. I saw you had a openai_whisper_tutorial.ipynb - is that the official documentation to building whisper apps in Python?
Check out the openai documentation which has the official docs! :)
@@automatalearninglab Hey I'm really sorry to ask - I'm not being lazy here. I'm new to coding so a lot of this isn't obvious to me.
The Github OpenAI whisper documentation only has a few scrappy examples of the language under the "Python Usage". It's far less robust than proper documentation that showcases how to leverage all of the whisper functionality.
Is there anywhere else I could see the Python translation for all the functionality? For example, using whisper on terminal produces srt/txt/vtt etc. files, but there's no standard script to show how to create .srt files in a .py script. I had to look at how other people created .srt files and they didn't reference documentation either.
Sorry for the long question.
@@blackhat965 not sure, check out the GitHub repo for whisper!
Any recommendation for hosting website where I can deploy this app?
Not really, I haven't hosted it so I couldn't tell yah.
@@automatalearninglab Thanks. I do get the following error when I load an audio file. Any suggestion on how to fix this?
AttributeError: module 'ffmpeg' has no attribute 'Error'
@@batosato Sup man, yeah, you need to install ffmpeg itself (the system binary, not just the Python package)! For Windows: phoenixnap.com/kb/ffmpeg-windows
For Linux: phoenixnap.com/kb/install-ffmpeg-ubuntu
Cheers man!
Thanks for this tutorial first of all, but I got this error: "FileNotFoundError: [WinError 2] The system cannot find the file specified"
You're welcome, check out the discussion below where we talked about this! Cheers :)
@@automatalearninglab thanks for your response, I solved it by running VS Code as administrator
@@hedinaouara5625 nice!
how about diarization of multiple speakers to classify them?
This one does not have it; I was playing around with that using just the GPT-4 API and it works quite well.
@@automatalearninglab one more thingy, found any solutions to process real time voice rather than basic audio file input?
I am still getting this error, please help: FileNotFoundError: [WinError 2] The system cannot find the file specified
Did you point to the path of the file?
Hi, did you solve the error?
Please help me solve it:
FileNotFoundError: [WinError 2] The system cannot find the file specified
I'm facing this error, can anyone please help me solve it?
Feed the right path (file) to the app, I think that's the issue.
Does the code work in Streamlit Cloud?
I have no idea!
I want it to recognise only the English language, is that possible?
Yep
Any particular reason for no GitHub repo? Don't wanna share the code?
No reason at all, the code is here: github.com/EnkrateiaLucca/openai_whisper
I was just lazy before LoL
@@automatalearninglab Thx Good work btw!
Why don't you give an app link? I would use it for transcription.
There is no app link, just the code for you to run it yourself. Sorry :(, thanks for watching! Cheers! :)
@@automatalearninglab OK
can you add timestamp and diarization to your app ?
Timestamps yes, diarization not sure. There is a GitHub repo called whisper-timestamped that's pretty good.
Try combining it with the pyannote lib for diarization; I will be very happy if you do so 😄
nice!@@spider279
@@automatalearninglab Do you know whisper jax diarization ? if yes have you ever tested it ?
I'm still getting a FileNotFound error
That's weird, did you upload a file present in your machine?
@@automatalearninglab Of course, and I'm in the same directory as my code and file
I have tried many things but it doesn't work
Can you provide me your code please, because I need to create this project ASAP 🙂🙂
@@rajkumarsingh8862 Sure! here it is:
```
import streamlit as st
import whisper

st.title("Whisper App")

# upload audio file with streamlit
audio_file = st.file_uploader("Upload Audio", type=["wav", "mp3", "m4a"])

model = whisper.load_model("base")
st.text("Whisper Model Loaded")

if st.sidebar.button("Transcribe Audio"):
    if audio_file is not None:
        st.sidebar.success("Transcribing Audio")
        # note: transcribe() wants a path on disk; audio_file.name only
        # works if a file with that name exists in the working directory
        transcription = model.transcribe(audio_file.name)
        st.sidebar.success("Transcription Complete")
        st.markdown(transcription["text"])
    else:
        st.sidebar.error("Please upload an audio file")

st.sidebar.header("Play Original Audio File")
st.sidebar.audio(audio_file)
```
@@automatalearninglab brother it's giving me the error "the system cannot find the file specified"
WinAPI CreateProcess errors
Please help, can we chat somewhere?
@@automatalearninglab I just want to ask you how I can host or deploy this web app, please tell me 🙂❤️
I got the error. Tell me how to solve pls))
2023-09-04 18:13:53.734 Uncaught app exception
Traceback (most recent call last):
File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/mount/src/ai/app.py", line 2, in
import whisper
ModuleNotFoundError: No module named 'whisper'
pip install openai-whisper
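Since the traceback shows the app running on Streamlit Cloud (the `/mount/src` path), dependencies there come from a `requirements.txt` committed to the repo rather than from running pip in a terminal. A minimal sketch of what that file might contain (assumed set, unpinned):

```
# requirements.txt
streamlit
openai-whisper
```

Whisper also needs the ffmpeg binary; on Streamlit Cloud that is typically provided by a `packages.txt` file in the repo listing `ffmpeg`.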
@@automatalearninglab will we need an api key from open ai?
yep@@forexhunter2040
I see, that is the reason I have been facing issues with running the app successfully. Thanks for the quick reply @@automatalearninglab
sub number 541