MIAI Deeptails Seminar : Generative Models as Data-driven Priors for Speech Enhancement
ฝัง
- เผยแพร่เมื่อ 18 พ.ย. 2024
- ABSTRACT
Generative models have recently demonstrated remarkable capabilities across various domains, including text, image, video, and audio generation, with wide-ranging applications. In the field of speech processing, these models have been actively explored to solve inverse problems such as speech enhancement, separation, and dereverberation. Two primary approaches are used to harness the power of generative models for these tasks. The first approach involves fine-tuning a pre-trained generative model using paired data (clean and corrupted speech) in a supervised manner for the specific task. The second approach, which will be the focus of this talk, leverages generative models as data-driven priors in an unsupervised manner. This approach offers several advantages: with a single generative model trained on clean speech, multiple inverse problems can be addressed without requiring additional task-specific training. Moreover, this approach tends to generalize better across different tasks. In this talk, I will review the application of variational autoencoders and diffusion models as speech priors for solving the speech enhancement problem, presenting some of our recent works in this area.
Mostafa Sadeghi
is a researcher with the Multispeech team at Inria, Nancy - Grand Est, France. He received his PhD from Sharif University of Technology, Tehran, Iran, in April 2018. He has been a visiting PhD scholar at the Information Science and Engineering Department, KTH, Stockholm, Sweden, from 2016 to 2017, and later, a research engineer at the Automatic Control Department. From August 2018 to October 2020, he was a postdoctoral researcher with the Perception team at Inria, Grenoble, collaborating with Radu Horaud and Xavier Alameda-Pineda. His current research focuses on robust audio-visual speech processing, particularly speech enhancement and separation, leveraging the synergy between deep neural networks and probabilistic machine learning approaches.