This was a really nice interview and an interesting project. It's incredible what superpowers we developers have gained over the last two years. Things that, if you'd asked me 10 years ago, I would've said might take a year and a few million dollars' worth of headcount are now an API call away. I have LLMs integrated into nearly every part of my workflow and tooling. The way I work now looks almost nothing like the way it used to.
I want to know more about the price difference between Gemini Flash and Whisper for transcription, particularly given the many flavors of local Whisper that are available. I'll have to do some research on this.
OpenAI charge $0.006 / minute for their Whisper API - so an hour of audio would cost 36 cents.
Gemini 1.5 Flash is $0.075 per million input tokens, and every second of audio is charged as 25 tokens, which means an hour is 90,000 tokens and hence costs just 0.675 cents - so it's over 50x cheaper!
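The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that hard-codes the prices quoted in the comment (which may have changed since), and, like the comment, it only counts the audio input tokens for Gemini; the tokens for the returned transcript are billed separately and are not included here.

```python
# Prices as quoted in the comment; treat them as assumptions, not current rates.
WHISPER_API_USD_PER_MINUTE = 0.006
GEMINI_FLASH_USD_PER_MILLION_INPUT_TOKENS = 0.075
GEMINI_AUDIO_TOKENS_PER_SECOND = 25


def whisper_api_cost(seconds: float) -> float:
    """Cost in USD to transcribe `seconds` of audio with the Whisper API."""
    return seconds / 60 * WHISPER_API_USD_PER_MINUTE


def gemini_flash_audio_cost(seconds: float) -> float:
    """Cost in USD of the audio input tokens only (output tokens ignored)."""
    tokens = seconds * GEMINI_AUDIO_TOKENS_PER_SECOND
    return tokens / 1_000_000 * GEMINI_FLASH_USD_PER_MILLION_INPUT_TOKENS


if __name__ == "__main__":
    one_hour = 3600
    whisper = whisper_api_cost(one_hour)
    gemini = gemini_flash_audio_cost(one_hour)
    print(f"Audio tokens per hour: {one_hour * GEMINI_AUDIO_TOKENS_PER_SECOND}")
    print(f"Whisper API per hour: ${whisper:.2f}")
    print(f"Gemini 1.5 Flash per hour: ${gemini:.5f}")
    print(f"Whisper costs {whisper / gemini:.1f}x as much")
```

Swapping in different prices or durations only means changing the constants at the top.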
@swillison If you use GPU spot instances yourself, you can run Whisper large-v3-turbo at about a penny per hour of audio. Since this project only requires timestamping, and appears to have a high tolerance for timestamps not being exactly accurate, I would think your guest would be well served by just Whisper tiny, which you can run at roughly 10x real time on a single CPU - basically free.