Computer Science, 2024, Vol. 51, Issue (9): 338-345. doi: 10.11896/jsjkx.230700200

• Computer Network •

CLU-Net Speech Enhancement Network for Radio Communication

YAO Yao, YANG Jibin, ZHANG Xiongwei, LI Yihao, SONG Gongkunkun   

  1. School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2023-07-26 Revised: 2023-11-04 Online: 2024-09-15 Published: 2024-09-10
  • About author:YAO Yao, born in 1998, postgraduate. Her main research interest is intelligent speech processing.
    YANG Jibin, born in 1978, Ph.D., associate professor. His main research interests include speech and acoustic signal processing.
  • Supported by:
    National Natural Science Foundation of China (62071484) and Basic Frontier Project of Army Engineering University of PLA (KYZYJKQTZQ23001).

Abstract: To overcome the adverse effects of environmental and channel noise on speech communication in radio systems and to improve the speech quality of radio communication, this paper proposes a depthwise separable network called CLU-Net (Channel attention and LSTM-based U-Net), which adopts a deep U-shaped architecture combined with long short-term memory (LSTM). In the network, depthwise separable convolution provides low-complexity feature encoding, while the combination of channel attention and LSTM simultaneously models the relationships between convolution channels and the temporal context of clean speech, so that clean-speech characteristics are captured with fewer parameters. A variety of noisy speech datasets are tested, including public corpora and self-built sets constructed from noise collected in different environments and radio systems. Simulation results on the VoiceBank-DEMAND dataset indicate that the proposed method outperforms comparable speech enhancement models on objective metrics such as PESQ and STOI. Field experiments show that the enhancement scheme effectively suppresses different types of environmental and radio noise, and its performance at low signal-to-noise ratios is superior to that of comparable enhancement networks.
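The abstract names three building blocks: depthwise separable convolution for low-complexity encoding, a channel-attention mechanism across convolution channels, and an LSTM for temporal context. The following PyTorch sketch illustrates how such blocks are commonly combined; the layer sizes, kernel widths, and module names are illustrative assumptions, not the authors' published CLU-Net configuration.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Depthwise convolution followed by a 1x1 pointwise convolution
    # (Xception/MobileNet style), which keeps the parameter count low.
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=2):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style gate that reweights convolution channels.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, channels, frames)
        w = self.fc(x.mean(dim=-1))            # squeeze over the time axis
        return x * w.unsqueeze(-1)             # excite each channel

class EncoderBottleneck(nn.Module):
    # One encoder stage plus an LSTM bottleneck that models temporal context.
    def __init__(self, in_ch=1, hidden_ch=32):
        super().__init__()
        self.encode = DepthwiseSeparableConv(in_ch, hidden_ch)
        self.attend = ChannelAttention(hidden_ch)
        self.lstm = nn.LSTM(hidden_ch, hidden_ch, batch_first=True)

    def forward(self, x):                      # x: (batch, 1, samples)
        feat = self.attend(self.encode(x))     # (batch, hidden_ch, frames)
        out, _ = self.lstm(feat.transpose(1, 2))
        return out.transpose(1, 2)             # (batch, hidden_ch, frames)

noisy = torch.randn(2, 1, 16000)               # two 1-second 16 kHz waveforms
print(EncoderBottleneck()(noisy).shape)        # torch.Size([2, 32, 8000])

A full U-shaped network would stack several such encoder stages, mirror them with transposed-convolution decoders, and connect matching stages with skip connections.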

Key words: Radio communication, Speech enhancement, Depthwise separable convolution, Attention mechanism
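The PESQ and STOI scores reported in the abstract can be reproduced with standard tooling. The snippet below is a minimal sketch assuming the third-party pesq, pystoi, and soundfile Python packages and 16 kHz mono recordings; the file names are placeholders, not part of the paper's data release.

import soundfile as sf
from pesq import pesq      # ITU-T P.862 perceptual evaluation of speech quality
from pystoi import stoi    # short-time objective intelligibility

clean, fs = sf.read("clean.wav")        # reference speech
enhanced, _ = sf.read("enhanced.wav")   # output of the enhancement network

print("PESQ (wideband):", pesq(fs, clean, enhanced, "wb"))
print("STOI:", stoi(clean, enhanced, fs, extended=False))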

CLC Number: TP391