Computer Science ›› 2023, Vol. 50 ›› Issue (11): 177-184. doi: 10.11896/jsjkx.221000024

• Computer Graphics & Multimedia •

Three-dimensional AI Clone Speech Source Identification Method Based on Improved MFCC Feature Model

WANG Xueguang1, ZHU Junwen1, ZHANG Aixin2   

  1 College of Criminal Justice, East China University of Political Science and Law, Shanghai 200042, China
  2 School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2022-10-07  Revised: 2022-11-27  Online: 2023-11-15  Published: 2023-11-06
  • Corresponding author: WANG Xueguang (wangxueguang@ecupl.edu.cn)
  • About author: WANG Xueguang, born in 1975, Ph.D., professor, is a member of China Computer Federation. His main research interests include computer networks, big data applications and electronic data.
  • Supported by:
    National Key R&D Program of China (2017YFB0802103).

Abstract: The emergence of AI cloned voice technology will severely impact the legal order of modern society. In recent years, researchers have focused only on AI-synthesized speech whose content is the same as that of the sample speech, while little research has addressed the identification of questioned recordings whose AI-synthesized content differs from the sample content, so such materials cannot be identified. This paper therefore proposes a three-dimensional method for identifying AI cloned speech sources based on an improved MFCC feature model. Firstly, it verifies the characteristics of AI cloned speech manually analyzed by previous researchers, and summarizes two computer-identifiable characteristics: an “abnormally active formant F5” and “abnormal mutations in the energy, formant and pitch curves”. Secondly, based on these characteristics, it corrects the MFCC coefficients with a second-order difference and applies the “inverse logic deduction method” to further quantify and sample the mutation characteristics of the energy, formant and pitch curves, defining them as the feature vector triple for speech source identification. Then, taking the feature vector triple as input, it uses the D-S evidence combination rule to fuse the results of comparing the three groups of questioned recordings with the samples. Finally, a three-dimensional evaluation model for questioned recordings based on the improved MFCC feature parameters is formed. Experiments on randomly sampled speakers show that the proposed method identifies AI cloned speech synthesized from the same human source with an average probability of 67.324% and a standard deviation of 7.32%, indicating good identification performance.
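
As a rough, hedged illustration of the pipeline summarized above (not the authors' implementation), the sketch below extracts MFCC coefficients together with their second-order differences using librosa, and then fuses three per-dimension comparison scores (energy, formant and pitch curves) with Dempster's rule of combination over the two hypotheses "same source" and "different source". The audio path sample.wav, the score values and the mapping from scores to basic probability assignments are illustrative assumptions; the paper's actual feature-triple construction and comparison procedure are not reproduced here.

```python
# Hedged sketch: MFCC + second-order difference features, and D-S fusion of three
# comparison scores. Assumes numpy and librosa are installed; 'sample.wav', the
# score values and the score-to-BPA mapping are illustrative only.
import numpy as np
import librosa


def mfcc_with_delta2(wav_path, n_mfcc=13):
    """Return MFCCs stacked with their second-order differences (delta-delta)."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)   # second-order difference
    return np.vstack([mfcc, delta2])                # shape: (2 * n_mfcc, frames)


def dempster_combine(m1, m2):
    """Combine two basic probability assignments over {'same', 'diff'}."""
    hyps = ("same", "diff")
    # Conflict mass K: probability assigned to contradictory hypothesis pairs.
    k = sum(m1[a] * m2[b] for a in hyps for b in hyps if a != b)
    return {h: m1[h] * m2[h] / (1.0 - k) for h in hyps}


if __name__ == "__main__":
    feats = mfcc_with_delta2("sample.wav")          # hypothetical questioned recording
    print("feature matrix shape:", feats.shape)

    # Hypothetical similarity scores from the energy, formant and pitch comparisons,
    # rescaled into basic probability assignments over {same source, different source}.
    scores = {"energy": 0.72, "formant": 0.65, "pitch": 0.61}
    bpas = [{"same": s, "diff": 1.0 - s} for s in scores.values()]

    fused = bpas[0]
    for m in bpas[1:]:
        fused = dempster_combine(fused, m)
    print("fused belief that questioned speech and sample share a source:",
          round(fused["same"], 3))
```

Because all three basic probability assignments in this toy example favor the "same source" hypothesis, Dempster's rule concentrates the fused mass on that hypothesis, which mirrors how the model aggregates agreement across the three comparison dimensions.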

Key words: AI cloned speech, MFCC feature, Three-dimensional speech modeling, Speech source identification

CLC Number: TP391