Have you guys ever used pyannote.audio before? I'm trying to figure out the best way to capture the "essence" of someone's voice in a voice embedding. Do I extract one embedding from longer audio clips of the person's voice? Or do I extract embeddings from 30 ms chunks of voice audio and then average them into a single embedding?
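To make the second option concrete, here's roughly what I mean by "average embedding". This is just a sketch in plain Python: the per-chunk vectors are made-up 3-dimensional placeholders (not real pyannote output), the helper names `l2_normalize`, `average_embedding`, and `cosine_similarity` are my own, and normalizing each chunk before averaging is an assumption on my part rather than something pyannote prescribes.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def average_embedding(chunk_embeddings):
    """Mean-pool per-chunk embeddings into one unit-length speaker vector."""
    if not chunk_embeddings:
        raise ValueError("need at least one chunk embedding")
    dim = len(chunk_embeddings[0])
    # Assumption: normalize each chunk first so that chunks with
    # larger-magnitude embeddings do not dominate the mean.
    normalized = [l2_normalize(e) for e in chunk_embeddings]
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Made-up placeholder vectors standing in for real per-chunk embeddings.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
speaker = average_embedding(chunks)
```

The idea would then be to compare `speaker` against an embedding from a new clip with `cosine_similarity` and see which approach separates speakers better.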