Computer Science, 2023, Vol. 50, Issue (11): 177-184. doi: 10.11896/jsjkx.221000024

• Computer Graphics & Multimedia •

Three-dimensional AI Clone Speech Source Identification Method Based on Improved MFCC Feature Model

WANG Xueguang1, ZHU Junwen1, ZHANG Aixin2   

  1 College of Criminal Justice, East China University of Political Science and Law, Shanghai 200042, China
  2 School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2022-10-07 Revised: 2022-11-27 Online: 2023-11-15 Published: 2023-11-06
  • About author: WANG Xueguang, born in 1975, Ph.D., professor, is a member of China Computer Federation. His main research interests include computer networks, big data application and electronic data.
  • Supported by:
    National Key R&D Program of China (2017YFB0802103).

Abstract: The emergence of AI voice-cloning technology could seriously undermine the legal order of modern society. In recent years, research has focused on AI-synthesized speech whose content is the same as that of the sample speech, while little work has addressed the identification of AI-synthesized speech whose content differs from the sample. This paper therefore proposes a three-dimensional model for identifying the source of AI cloned speech, based on an improved MFCC feature model. First, it verifies the characteristics of AI cloned speech that previous scholars found through manual analysis, and summarizes two characteristics suitable for computer identification: “abnormally active formant F5” and “abnormal mutation of the energy, formant and pitch curves”. Second, based on these characteristics, it uses the second-order difference to correct the MFCC coefficients, and uses the “inverse logic deduction method” to further quantify and sample the mutation characteristics of the energy, formant and pitch curves, defining them as the feature vector triple for speech recognition. Next, it takes the feature vector triple as input and uses the D-S evidence synthesis rule to fuse the results of comparing the three groups of inspection materials with the samples. Finally, a three-dimensional material evaluation model based on improved MFCC feature parameters is formed. In a random sampling experiment on a group of speakers, the method identifies AI cloned speech synthesized from the same human clone source with an average probability of 67.324% and a standard deviation of 7.32%, indicating that the method is effective.
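The fusion step mentioned in the abstract can be illustrated with a minimal sketch of Dempster's rule of combination (the D-S evidence synthesis rule). This is not the authors' implementation: the two-hypothesis frame ("same" versus "different" source), the mass values, and the names `combine`, `energy`, `formant` and `pitch` are all assumptions made for illustration, with one mass function standing in for each element of the feature vector triple.

```python
from functools import reduce

# Hypothetical frame of discernment: is the cloned speech from the same source?
SAME = frozenset({"same"})
DIFF = frozenset({"different"})
THETA = SAME | DIFF  # the whole frame, i.e. "cannot tell"


def combine(m1, m2):
    """Combine two mass functions (dicts keyed by frozenset) with Dempster's rule."""
    fused = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                # Agreeing evidence supports the intersection of the two focal sets.
                fused[common] = fused.get(common, 0.0) + mass_a * mass_b
            else:
                # Disjoint focal sets contribute to the conflict mass K.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise by 1 - K so the fused masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


# Illustrative (made-up) masses for the three comparison results.
energy = {SAME: 0.6, DIFF: 0.1, THETA: 0.3}
formant = {SAME: 0.7, DIFF: 0.1, THETA: 0.2}
pitch = {SAME: 0.5, DIFF: 0.2, THETA: 0.3}

fused = reduce(combine, [energy, formant, pitch])
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these example masses, the three sources broadly agree, so the fused belief in "same" ends up higher than any single source's belief, while the mass left on the whole frame shrinks. How the paper actually derives the masses from the energy, formant and pitch features is not stated in the abstract.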

Key words: AI Clone speech, MFCC feature, Three-dimensional modeling, Speech source identification

CLC Number: 

  • TP391