Dilated FCN: Listening Longer to Hear Better

Bibliographic Details
Title: Dilated FCN: Listening Longer to Hear Better
Authors: Gong, Shuyu; Wang, Zhewei; Sun, Tao; Zhang, Yuanhang; Smith, Charles D.; Xu, Li; Liu, Jundong
Publication Year: 2019
Collection: Computer Science
Subject Terms: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: Deep neural network solutions have emerged as a new and powerful paradigm for speech enhancement (SE). The capabilities to capture long context and extract multi-scale patterns are crucial to design effective SE networks. Such capabilities, however, are often in conflict with the goal of maintaining compact networks to ensure good system generalization. In this paper, we explore dilation operations and apply them to fully convolutional networks (FCNs) to address this issue. Dilations equip the networks with greatly expanded receptive fields, without increasing the number of parameters. Different strategies to fuse multi-scale dilations, as well as to install the dilation modules are explored in this work. Using Noisy VCTK and AzBio sentences datasets, we demonstrate that the proposed dilation models significantly improve over the baseline FCN and outperform the state-of-the-art SE solutions.
Comment: 5 pages; will appear in WASPAA conference
Document Type: Working Paper
Access URL: http://arxiv.org/abs/1907.11956
Accession Number: edsarx.1907.11956
Database: arXiv
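The description's central claim, that dilation greatly expands the receptive field without increasing the number of parameters, can be checked with a small calculation. The sketch below is illustrative only and is not the paper's architecture. It assumes stride-1 1-D convolutions, a hypothetical kernel size of 3, 64 channels, four layers, and an exponentially doubling dilation schedule (1, 2, 4, 8), a common choice that the record itself does not specify.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in samples) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv_params(kernel_size: int, in_ch: int, out_ch: int, bias: bool = True) -> int:
    """Learnable weights in one 1-D conv layer; dilation does not appear here."""
    return kernel_size * in_ch * out_ch + (out_ch if bias else 0)


# Hypothetical settings chosen for illustration, not taken from the paper.
K, C, LAYERS = 3, 64, 4
plain = [1] * LAYERS                        # ordinary convolutions
dilated = [2 ** i for i in range(LAYERS)]   # assumed schedule: 1, 2, 4, 8

# Parameter count is identical for both stacks because it ignores dilation.
params = LAYERS * conv_params(K, C, C)

print(receptive_field(K, plain), receptive_field(K, dilated), params)
# prints: 9 31 49408
```

With the same 49,408 weights, the dilated stack sees 31 input samples instead of 9; adding further layers with doubling dilation grows the receptive field exponentially while the parameter count grows only linearly.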