Computer Science ›› 2022, Vol. 49 ›› Issue (6): 187-192.doi: 10.11896/jsjkx.210500114

• Computer Graphics & Multimedia •

Speech Enhancement Based on Time-Frequency Domain GAN

YIN Wen-bing1, GAO Ge1, ZENG Bang1, WANG Xiao1, CHEN Yi2   

  1. National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China
  2. School of Computer Science, Central China Normal University, Wuhan 430077, China
  • Received: 2021-05-17 Revised: 2021-09-04 Online: 2022-06-15 Published: 2022-06-08
  • About author: YIN Wen-bing, born in 1997, postgraduate. His main research interests include speech enhancement.
    GAO Ge, born in 1973, Ph.D., professor, is a member of China Computer Federation. His main research interests include speech processing and computer vision.

Abstract: The conventional speech enhancement generative adversarial network (SEGAN) enhances speech in the time domain and ignores the distribution of speech samples in the frequency domain. At low signal-to-noise ratios, the speech signal is submerged in noise, and the time-domain distribution of the noisy speech is difficult to capture. As a result, the enhancement performance of SEGAN drops sharply, and the quality and intelligibility of its enhanced speech are very low. To address this problem, this paper proposes a speech enhancement algorithm based on a time-frequency domain generative adversarial network (time-frequency domain SEGAN, TFSEGAN). TFSEGAN adopts a dual-discriminator structure, with one discriminator in the time domain and one in the frequency domain, together with a time-frequency L1 loss function. The time-domain discriminator takes time-domain features of the speech sample as input, and the frequency-domain discriminator takes frequency-domain features. During training, the time-domain discriminator uses the time-domain distribution of the speech samples as its criterion, and the frequency-domain discriminator uses their frequency-domain distribution. Guided by the two discriminators, the generator of TFSEGAN simultaneously learns the distribution of speech samples in both the time domain and the frequency domain. Experiments show that, compared with SEGAN, TFSEGAN improves speech quality and intelligibility by about 17.45% and 11.75%, respectively, at low signal-to-noise ratios.
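
The abstract names a time-frequency L1 loss but does not give its exact form. The following is a minimal NumPy sketch of one plausible reading: an L1 term on the waveform plus an L1 term on the magnitude spectrogram. The function names, the Hann window, the frame length and hop size, and the weights `alpha` and `beta` are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude spectrogram from Hann-windowed frames.

    frame_len and hop are illustrative values, not the paper's settings.
    The signal must contain at least frame_len samples.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps the non-negative frequency bins: frame_len // 2 + 1 per frame
    return np.abs(np.fft.rfft(frames, axis=1))


def time_frequency_l1(enhanced, clean, alpha=1.0, beta=1.0):
    """Weighted sum of a time-domain and a frequency-domain L1 distance.

    alpha and beta are hypothetical weights chosen for this sketch.
    """
    time_l1 = np.mean(np.abs(enhanced - clean))
    freq_l1 = np.mean(np.abs(stft_magnitude(enhanced) - stft_magnitude(clean)))
    return alpha * time_l1 + beta * freq_l1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16384)
    noisy = clean + 0.1 * rng.standard_normal(16384)
    print("loss(clean, clean):", time_frequency_l1(clean, clean))
    print("loss(noisy, clean):", time_frequency_l1(noisy, clean))
```

In a GAN training loop, a term of this kind would typically be added to the generator's adversarial losses from the two discriminators, so that the generator is penalized for deviating from the clean speech in either domain.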

Key words: Generative adversarial network, Low signal-to-noise ratio, Speech enhancement, Speech intelligibility, Speech quality, Time-frequency domain

CLC Number: 

  • TN912.35