Speech Enhancement Based on Time-Frequency Domain GAN

doi:10.11896/jsjkx.210500114

Abstract

The conventional speech enhancement generative adversarial network (SEGAN) enhances speech in the time domain and ignores the distribution of speech samples in the frequency domain. At low signal-to-noise ratios, the speech signal is submerged in noise, and the time-domain distribution of the noisy speech is difficult to capture. As a result, the enhancement performance of SEGAN drops sharply, and both the quality and the intelligibility of its enhanced speech are very low. To solve this problem, this paper proposes a speech enhancement algorithm based on a time-frequency domain generative adversarial network (time-frequency domain SEGAN, TFSEGAN). TFSEGAN adopts a dual-discriminator structure, with one discriminator in the time domain and one in the frequency domain, together with a time-frequency L1 loss function. The input of the time-domain discriminator is the time-domain feature of the speech sample, and the input of the frequency-domain discriminator is its frequency-domain feature. During training, the time-domain discriminator judges samples by their time-domain distribution, and the frequency-domain discriminator judges them by their frequency-domain distribution. Under the action of the two discriminators, the generator of TFSEGAN can learn the distribution of speech samples in the time domain and the frequency domain simultaneously. Experiments show that, compared with SEGAN, TFSEGAN improves speech quality and intelligibility by about 17.45% and 11.75%, respectively, at low signal-to-noise ratios.
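The abstract names a time-frequency L1 loss but does not give its formula. The following is a minimal sketch of what such a loss could look like, assuming it is a weighted sum of an L1 distance between waveforms and an L1 distance between short-time magnitude spectra. The function names, the weights `alpha` and `beta`, the Hann window, and the frame settings are all illustrative assumptions, not details taken from the paper.

```python
import cmath
import math


def frame_magnitudes(signal, frame_len=8, hop=4):
    """Magnitude spectrum of each Hann-windowed frame (naive DFT, illustration only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):  # keep non-negative frequency bins only
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def l1_mean(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def time_frequency_l1(enhanced, clean, alpha=1.0, beta=1.0, frame_len=8, hop=4):
    """Hypothetical time-frequency L1 loss: alpha * time term + beta * spectral term."""
    if len(enhanced) != len(clean):
        raise ValueError("signals must have the same length")
    if len(clean) < frame_len:
        raise ValueError("signals must contain at least one full frame")
    time_term = l1_mean(enhanced, clean)
    spec_enhanced = [m for frame in frame_magnitudes(enhanced, frame_len, hop) for m in frame]
    spec_clean = [m for frame in frame_magnitudes(clean, frame_len, hop) for m in frame]
    freq_term = l1_mean(spec_enhanced, spec_clean)
    return alpha * time_term + beta * freq_term


# Toy usage: a short sine wave and a copy with a constant offset.
clean = [math.sin(2 * math.pi * n / 8) for n in range(16)]
noisy = [x + 0.1 for x in clean]
loss_same = time_frequency_l1(clean, clean)
loss_noisy = time_frequency_l1(noisy, clean)
loss_time_only = time_frequency_l1(noisy, clean, beta=0.0)
```

In this sketch the loss is zero when the enhanced and clean signals are identical, and the spectral term can only add to the time-domain term. In a real training setup the same idea would normally be expressed with a deep learning framework's STFT and L1 operations so that gradients can flow back to the generator.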

Key words: Generative adversarial network, Low signal-to-noise ratio, Speech enhancement, Speech intelligibility, Speech quality, Time-frequency domain

CLC Number: TN912.35

YIN Wen-bing, GAO Ge, ZENG Bang, WANG Xiao, CHEN Yi. Speech Enhancement Based on Time-Frequency Domain GAN[J]. Computer Science, 2022, 49(6): 187-192.
