
Generative models for speech processing

  • Speaker: Meng Yu (Tencent AI Lab)

  • Time: Dec 26, 2023, 10:20-11:20

  • Location: Room M5024, College of Science Building

Abstract

In speech processing, preserving speech quality under adverse acoustic conditions remains difficult. Current deep learning solutions often cannot fully suppress background noise or reverberation, degrading the listening experience. Our research introduces a method that uses pre-trained generative models to regenerate clear speech from degraded inputs. By building on pre-trained vocoder and codec models, the approach delivers high speech quality and robustness in demanding scenarios. The generative models compensate for information lost from the speech signal, yielding clearer audio with fewer distortions. Experiments on both simulated and real-world datasets demonstrate the method’s effectiveness; notably, the codec-based variant achieved the best audio ratings. This work highlights the potential of pre-trained generative models, especially where conventional approaches fall short.
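As a rough illustration of the regeneration idea described in the abstract, the sketch below shows one plausible pipeline: a trainable front-end predicts clean acoustic features from a degraded input, and a frozen pre-trained vocoder resynthesizes the waveform from those features. All module and parameter names here are hypothetical; this is a minimal sketch in PyTorch, not the speaker's actual models.

    # Hypothetical sketch: predict clean features from noisy input,
    # then hand them to a frozen pre-trained vocoder/codec decoder.
    import torch
    import torch.nn as nn

    class FeatureRegressor(nn.Module):
        """Illustrative front-end: maps noisy mel-spectrogram frames
        to estimates of the clean mel-spectrogram."""
        def __init__(self, n_mels: int = 80, hidden: int = 256):
            super().__init__()
            self.rnn = nn.GRU(n_mels, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
            self.proj = nn.Linear(2 * hidden, n_mels)

        def forward(self, noisy_mel: torch.Tensor) -> torch.Tensor:
            # noisy_mel: (batch, frames, n_mels)
            h, _ = self.rnn(noisy_mel)
            return self.proj(h)

    def regenerate(noisy_mel: torch.Tensor,
                   regressor: FeatureRegressor,
                   vocoder) -> torch.Tensor:
        """Regenerate audio from degraded input. `vocoder` is a
        placeholder for any frozen pre-trained mel-to-waveform
        model; only its callable interface is assumed here."""
        with torch.no_grad():
            clean_mel = regressor(noisy_mel)
            return vocoder(clean_mel)  # waveform: (batch, samples)

The design point this sketch reflects is the one the abstract emphasizes: rather than filtering the noisy signal directly, the system resynthesizes speech through a generative model, so information destroyed by noise or reverberation can be regenerated rather than merely attenuated.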