One thing I'd add is an LLM-powered search bar to search for nuggets in natural language. Something like "show me where I talk about X", and it jumps to that timestamp.
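Even before wiring in an actual LLM, the "jump to where I talk about X" idea can be sketched with plain similarity scoring over timestamped transcript segments. This is a minimal stand-in, not the real thing: `find_timestamp` and the `(start_seconds, text)` segment shape are assumptions, and the bag-of-words cosine score is a placeholder for whatever embedding model or LLM you'd use in practice.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    num = sum(a[w] * b[w] for w in a)
    da = math.sqrt(sum(v * v for v in a.values()))
    db = math.sqrt(sum(v * v for v in b.values()))
    return num / (da * db) if da and db else 0.0

def find_timestamp(query: str, segments):
    """segments: list of (start_seconds, text). Returns the start time
    of the segment most similar to the query."""
    q = Counter(query.lower().split())
    best = max(segments, key=lambda seg: cosine(q, Counter(seg[1].lower().split())))
    return best[0]
```

Swapping the scoring function for embedding similarity (or an LLM ranking call) keeps the same jump-to-timestamp shape.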
So many fun statistics could be collected with this too. How many words spoken during a stream. A word occurrence histogram. Average word complexity. Then group the words together to identify a set of tags for the video. This is so cool.
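The first few of those stats fall out of the transcript almost for free. A minimal sketch, assuming the transcript is available as a flat list of words (the `word_stats` helper and word length as a stand-in for "complexity" are my assumptions, not anything from the project):

```python
from collections import Counter

def word_stats(transcript_words):
    """transcript_words: flat list of spoken words, e.g. from a Whisper transcript.
    Returns (total word count, occurrence histogram, average word length)."""
    words = [w.lower().strip(".,!?") for w in transcript_words]
    histogram = Counter(words)
    total = len(words)
    avg_len = sum(len(w) for w in words) / total if total else 0.0
    return total, histogram, avg_len
```

`histogram.most_common(20)` (minus stopwords) would be a rough starting point for auto-tagging.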
Should highlight the current position in the text window to help with debugging, so you know exactly which words are at which specific parts of the timeline.
In your place, I'd try to use a text editing widget for the script text and use it in read-only mode. That may make detecting which word a user clicked on easier, because it's likely built in. Also, you need a way to search in the text, again, just like in a text editor.
Along with the text highlighting which others have mentioned, maybe change the audio based on text edits in the transcript, although this might be a very intense thing to do
Hi, I really like your project and seeing you go through the dev process. I think it would be easier to track if the currently spoken word were in bold, or a different color. A bit like karaoke. And from what I understand, Whisper already gives you the time ranges of each word. While playing the video, iterating through the list would be fast. And on seek, since the word and time range list is sorted, doing a binary search should be pretty fast.
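The seek lookup described above can be sketched with the standard library's `bisect`. This assumes the word timings are available as sorted `(start, end, word)` tuples; the exact shape Whisper emits may differ, so treat `word_spans` and `word_at` as illustrative names.

```python
import bisect

def word_at(word_spans, t):
    """word_spans: sorted list of (start, end, word) tuples.
    Returns the word spoken at time t, or None if t falls in a gap."""
    starts = [start for start, _, _ in word_spans]
    i = bisect.bisect_right(starts, t) - 1  # last span starting at or before t
    if i >= 0 and t < word_spans[i][1]:
        return word_spans[i][2]
    return None
```

During playback you'd instead just advance an index into the list, and only fall back to this binary search on an explicit seek.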
A QoL feature might be highlighting the words on the right panel if they are included in the current clips on the timeline.
whats up whats up whats up
Next step: Editing the text and the video gets updated.
This is looking great
I miss the time where we played trackmania on the school computers. It was fun
"is it a bug or is it just a little bit of a surprise" 😂😂