Robust Speaker Verification with Spoofing Attack Detection

Computer Science (计算机科学), 2022, Vol. 49, Issue (6A): 531-536. doi: 10.11896/jsjkx.210500147

GUO Xing-chen (郭星辰), YU Yi-biao (俞一彪)

Department of Electronics and Information, Soochow University, Suzhou, Jiangsu 215000, China

Online: 2022-06-10  Published: 2022-06-08
Corresponding author: YU Yi-biao (yuyb@suda.edu.cn)
Author e-mail: 450357854@qq.com
About author: GUO Xing-chen, born in 1994, postgraduate. Her main research interests include speaker identification and spoofing attack detection.
YU Yi-biao, born in 1962, professor. His main research interests include voice signal processing, multimedia communication and information hiding.

Abstract

Spoofing attacks seriously undermine the secure deployment of speaker verification systems. This paper proposes a speaker verification system with replay attack detection capability, which cascades a front-end attack detection module with a back-end speaker verification module. Based on an analysis of channel frequency responses and speaker-specific characteristics, the paper also proposes the channel frequency response difference enhancement cepstral coefficient (CDECC). CDECC applies a third-order polynomial nonlinear frequency-scale transform that enhances the spectral components in both the low- and high-frequency bands of the speech signal, so it effectively reflects differences in the frequency responses of different input channels and in the speech spectra of different speakers. Speaker-independent, text-independent replay attack detection experiments on the ASVspoof 2017 2.0 dataset show that CDECC-based replay attack detection achieves an equal error rate (EER) of 25.03%, which is 10.00% lower than that of the baseline system. Embedding the replay attack detection module at the front end of speaker verification significantly reduces the system's false acceptance rate (FAR) and lowers its EER from 3.32% to 1.01%, effectively improving robustness.

Key words: CDECC, Replay attack detection, Speaker recognition, Speaker verification
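The cascade described in the abstract, a replay detector placed in front of a speaker verifier, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the score convention (higher means "more likely genuine" or "more likely the claimed speaker"), the thresholds, and the toy scores are all assumptions made for illustration. The sketch only shows why gating trials through a front-end detector can lower the false acceptance rate, and how an EER can be approximated from two score lists.

```python
def cascade_accept(spoof_score, asv_score, spoof_threshold, asv_threshold):
    """Accept a trial only if it passes both stages (hypothetical score convention)."""
    # Front end: reject trials the replay detector scores below its threshold.
    if spoof_score < spoof_threshold:
        return False
    # Back end: ordinary speaker verification decision.
    return asv_score >= asv_threshold


def false_acceptance_rate(decisions):
    """Fraction of non-target trials (replays or impostors) that were accepted."""
    return sum(decisions) / len(decisions)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and
    return the mean of FAR and FRR where their gap is smallest."""
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy (spoof_score, asv_score) pairs, invented for this example only.
replay_trials = [(0.2, 0.85), (0.1, 0.75), (0.3, 0.60), (0.7, 0.65)]
impostor_trials = [(0.9, 0.20), (0.8, 0.30)]
nontarget_trials = replay_trials + impostor_trials

SPOOF_THRESHOLD = 0.5  # assumed operating point of the replay detector
ASV_THRESHOLD = 0.5    # assumed operating point of the speaker verifier

# Speaker verification alone: replayed speech of the target speaker scores high.
far_asv_only = false_acceptance_rate(
    [asv >= ASV_THRESHOLD for _, asv in nontarget_trials]
)
# Cascade: most replays are stopped before reaching the verifier.
far_cascade = false_acceptance_rate(
    [cascade_accept(sp, asv, SPOOF_THRESHOLD, ASV_THRESHOLD)
     for sp, asv in nontarget_trials]
)

print("FAR, verifier only:", far_asv_only)
print("FAR, cascade:", far_cascade)
```

In this toy setup the replayed trials fool the verifier because they carry the target speaker's voice, while the front-end detector rejects most of them on channel evidence. That is the effect the abstract reports as a reduced FAR, although the numbers here are not related to the paper's results.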

CLC Number: TP370

GUO Xing-chen, YU Yi-biao. Robust Speaker Verification with Spoofing Attack Detection[J]. Computer Science, 2022, 49(6A): 531-536. https://doi.org/10.11896/jsjkx.210500147
