
Generative models for speech processing

  • Speaker: Meng Yu (Tencent AI Lab)

  • Time: 2023-12-26, 10:20–11:20

  • Venue: School of Science Building, Room M5024

Abstract

In modern speech processing, restoring speech quality under challenging acoustic conditions remains difficult. Current deep learning solutions often cannot fully remove background noise or reverberation, which degrades the listening experience. Our research introduces a method that uses pre-trained generative models to resynthesize clean speech from degraded inputs. By building on pre-trained vocoder and codec models, the approach achieves high speech quality and robustness in demanding scenarios: the generative models compensate for information lost from the speech signal, yielding clearer audio with fewer distortions. Experiments on both simulated and real-world datasets demonstrate the method's effectiveness; notably, the codec-based variant achieved the best audio ratings. This work highlights the potential of pre-trained generative models in settings where conventional enhancement methods fall short.