Absolute quality content. So informative and I love how every step is explained in great detail.
Glad you liked it!
Thanks! I have so many gaps in Docker and how it works. I learned so much! I am working on something like your YouTube-Video-Summarization-App, but for a downloader, so it can handle any media, not just YouTube.
Liked the idea...
You are incredible. Can we get more of end to end projects involving Docker
Thanks... you can watch this as well. th-cam.com/video/7CeAJ0EbzDA/w-d-xo.html
Solely judging from the title this is exactly what i need. I hope it works as I expect :D gonna keep watching
Thanks 👍
Great explanations, thank you so much for the tutorial!
You're very welcome!
Thanks for the demo and info, very informative and precise. I truly appreciate it. Easy to deploy. Have a great day.
Glad it was helpful!
Outstanding!
thank you
Great video, thanks very much! I'm looking to deploy Whisper for an app I'm working on which will require multiple transcriptions of small audio chunks to take place concurrently. If I were to deploy your solution on EC2, what sort of specs would I need?
Why? Just have the API feed the chunks. Unless you have a massive rig, it's likely better to send the audio to Whisper one chunk at a time, or use something like ffmpeg to join the chunks together or transcode them first. You can also do cool stuff like automatic "umm" and "uhh" removal, etc. (WIP). You don't want to do any of this in the cloud unless it's a one-off. I don't know of any GPU service that charges per usage instead of per uptime, i.e. if the GPU is on, they charge you. I had to get a 3090 ($700) to ensure my services run and I don't have to worry about anything going wrong.
Thank you so much! One question: in the first version of Whisper you couldn't translate from English to Spanish. You could only .transcribe one language or another, but not translate between them. Do you know if Whisper v3 can now translate from English to Spanish? Or any updated WhisperX, or other options? Honestly, where I most want to use it is translating your videos, since the YouTube translator is very bad and it's hard to follow you. If possible, could you make a video? ;)
Hi 👋,
Can you do one for Whisper JAX? 😉
Is it possible to use this for real-time streaming? My goal is to see if it's better than Chrome's captions. I want it to take the audio from the browser and transcribe it. Then, using ChatGPT, translate it to another language (Spanish to English). If I speak through the mic, I want it to do the same thing.
If you have something like an 8 GB Nvidia GPU, just use Whisper locally.
I notice you're pushing the audio file via an HTTP POST. Is there any way to pull the file from a given location instead, e.g. from an AWS S3 bucket, the file system, etc.?
Hey, you made a video on a text-to-image API in the past. Could we create an API that uses checkpoints from Civitai, i.e. one able to load multiple checkpoints/models and be callable like that? Is it possible?
Why would you comment off topic on this video? Respond to his original video instead, not here. Just Kagi-search for "stable diffusion API", or use Open WebUI and write a tool or pipe for it.
How can I use async with this line: result = model.transcribe(temp.name)?
Thank you!
It's working perfectly
Best way to deploy this container? AWS EC2 is kind of expensive... it needs a lot of RAM.
There are 1000 GPU services out there, but like above, you pay not per usage but for the machine being on, so you end up spending money while not using the service.
Do you know if speaker diarization (breaking up the transcription by speaker) can be built into this?
Kagi-search "speaker diarization whisper" — it's built in.
Does anyone know how to handle multiple requests running on different GPU sockets?
I have four GPUs in the server, but the model and FastAPI only use one GPU (number 0).
What happens when I pass an 8 GB file?
When I run it in Postman, in Headers I put Content-Type: multipart/form-data, and in the Body I set the key to "files" and upload the .wav file as the value. For some reason I get files: undefined.
Maybe on Mac I'm supposed to do something different?
I got the same error. It's because I named the parameter "files"; per the FastAPI documentation, the function parameter has to be named "file":
file: UploadFile
Then you can access the underlying file object via file.file.
The requirements file is incomplete. It's not working with the whisper library that I'm using from PyPI.
You don't have to install Whisper from PyPI via requirements.txt. The Dockerfile takes care of it, as it builds directly from Git.
@@AIAnytime I finally figured it out.
There were some issues with the newer version of the openai-whisper package. These pins work:
fastapi==0.78.0
uvicorn[standard]==0.23.2
aiofiles==23.2.1
python-multipart==0.0.6
torch==2.0.1
openai-whisper==20230314
tiktoken==0.3.1
hero
How can I run it with GPU?
Currently when I run the container, the line DEVICE = "cuda" if torch.cuda.is_available() else "cpu" sets DEVICE to "cpu", even though my computer has a GPU.
Thanks.
Yeah, I think all the torch CUDA stuff is missing? As of today (2024-08-25) the Docker image does not work for me with a 3090 GPU.
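For the GPU-in-Docker case, a rough sketch of what usually has to change (image names and tags here are illustrative, not the tutorial's actual Dockerfile): the base image needs CUDA libraries, torch needs a CUDA build rather than the default CPU wheel, and the container needs GPU access via the NVIDIA Container Toolkit.

```shell
# In the Dockerfile: start from a CUDA base image, e.g.
#   FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
# and install a CUDA build of torch instead of the CPU wheel:
#   pip install torch --index-url https://download.pytorch.org/whl/cu121

# Run the container with the GPU exposed (requires the
# NVIDIA Container Toolkit on the host):
docker run --gpus all -p 8000:8000 whisper-api:latest

# Sanity check inside the container -- this should print True:
python -c "import torch; print(torch.cuda.is_available())"
```

If `torch.cuda.is_available()` is still False inside the container, check `nvidia-smi` on the host and confirm the toolkit is installed; without `--gpus all`, the container never sees the GPU at all.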