Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230700083-6. doi: 10.11896/jsjkx.230700083
• Artificial Intelligence •
LIU Xiaohu1, CHEN Defu1, LI Jun2, ZHOU Xuwen1, HU Shan1, ZHOU Hao1