Absolute quality content. So informative and I love how every step is explained in great detail.
Glad you liked it!
Thanks! I have so many gaps in Docker and how it works. I learned so much! I am working on something like your YouTube-Video-Summarization-App, but for a downloader, so it can handle any media, not just YouTube.
Liked the idea...
You are incredible. Can we get more of end to end projects involving Docker
Thanks... you can watch this as well. th-cam.com/video/7CeAJ0EbzDA/w-d-xo.html
Solely judging from the title this is exactly what i need. I hope it works as I expect :D gonna keep watching
Thanks 👍
Great explanations, thank you so much for the tutorial!
You're very welcome!
Thanks for the demo and info, very informative and precise. I truly appreciate it. Easy to deploy. Have a great day.
Glad it was helpful!
Outstanding!
thank you
Great video, thanks very much! I'm looking to deploy Whisper for an app I'm working on which will require multiple transcriptions of small audio chunks to take place concurrently. If I were to deploy your solution on EC2, what sort of specs would I need?
Why? Just have the API feed the chunks. Unless you have a massive rig, it's likely better to send the audio to Whisper one chunk at a time, or use something like ffmpeg to join the chunks together or transcode them first. You can also do cool stuff like automatic "umm" and "uhh" removal, etc. (WIP). You don't want to do any of this in the cloud unless it's a one-off. I don't know of any GPU service that charges per usage instead of per uptime, i.e. if the GPU is on, they charge you. I had to get a 3090 ($700) to ensure my services run and I don't have to worry about anything going wrong.
Thank you so much! One question: in the first version of Whisper you couldn't translate from English to Spanish. You could only .transcribe one language or another, but not translate between them. Do you know if Whisper v3 can now translate from English to Spanish? Or any updated WhisperX, or other options? Honestly, where I most want to use it is translating your videos, since the YouTube translator is very bad and it's hard to follow you. If possible, could you make a video? ;)
Hi 👋,
Can you do one for Whisper JAX? 😉
Is it possible to use this for real-time streaming? My goal is to see if it's better than Chrome's captions. I want it to take the audio from the browser and transcribe it. Then, using ChatGPT, translate it to another language (Spanish to English). If I speak through the mic, I want it to do the same thing.
If you have something like an 8 GB Nvidia GPU, just use Whisper locally.
I notice you're pushing the audio file via an HTTP POST. Is there any way to pull the file from a given location instead, e.g. from an AWS S3 bucket, the file system, etc.?
Hey, you made a video on a text-to-image API in the past. Could we create an API that uses checkpoints from Civitai, i.e. one able to load multiple checkpoints/models and be callable like that? Is it possible?
Why would you comment off topic on this video? Respond to his original video instead, not here. Just Kagi-search for "stable diffusion API", or use Open WebUI and write a tool or pipe for it.
How can I use async with this line: result = model.transcribe(temp.name)?
Thank you!
It's working perfectly
Best way to deploy this container? AWS EC2 is kind of expensive... it needs a lot of RAM.
There are 1000 GPU services out there, but like above, you pay not per usage but for the machine being on, so you end up spending money while not using the service.
Do you know if speaker diarization (breaking up the transcription by speaker) can be built into this?
Kagi-search "speaker diarization whisper" — it's built in.
Does anyone know how to handle multiple requests running on different GPU sockets?
I have four GPUs in the server, but the model and FastAPI only use one GPU (number 0).
What happens when I pass an 8 GB file?
When I run it in Postman, in Headers I put Content-Type: multipart/form-data, and in the Body I set the key to "files" and upload the .wav file as the value. For some reason I get files: undefined.
Maybe on Mac I'm supposed to do something different?
I got the same error. It's because I named the parameter "files"; per the FastAPI documentation, the function parameter has to be named "file":
file: UploadFile
Then you can access the underlying file object via file.file.
The requirements file is incomplete. It's not working with the whisper library that I'm using from PyPI.
You don't have to install Whisper from PyPI via requirements.txt. The Dockerfile takes care of it, as it builds directly from Git.
@@AIAnytime I finally figured it out.
There were some issues with the newer version of the openai-whisper package. These pins work:
fastapi==0.78.0
uvicorn[standard]==0.23.2
aiofiles==23.2.1
python-multipart==0.0.6
torch==2.0.1
openai-whisper==20230314
tiktoken==0.3.1
hero
How can I run it with GPU?
Currently when I run the container, the line DEVICE = "cuda" if torch.cuda.is_available() else "cpu" sets DEVICE to "cpu", even though my computer has a GPU.
Thanks.
Yeah, I think all the torch CUDA stuff is missing? As of today (2024-08-25) the Docker image does not work for me with a 3090 GPU.
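For the GPU-in-Docker case, a rough sketch of what usually has to change (image names and tags here are illustrative, not the tutorial's actual Dockerfile): the base image needs CUDA libraries, torch needs a CUDA build rather than the default CPU wheel, and the container needs GPU access via the NVIDIA Container Toolkit.

```shell
# In the Dockerfile: start from a CUDA base image, e.g.
#   FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
# and install a CUDA build of torch instead of the CPU wheel:
#   pip install torch --index-url https://download.pytorch.org/whl/cu121

# Run the container with the GPU exposed (requires the
# NVIDIA Container Toolkit on the host):
docker run --gpus all -p 8000:8000 whisper-api:latest

# Sanity check inside the container -- this should print True:
python -c "import torch; print(torch.cuda.is_available())"
```

If `torch.cuda.is_available()` is still False inside the container, check `nvidia-smi` on the host and confirm the toolkit is installed; without `--gpus all`, the container never sees the GPU at all.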