Study on Restoration of Electronic Disguised Voice Based on DC-CNN

Computer Science, 2019, Vol. 46, Issue (8): 183-188. doi: 10.11896/j.issn.1002-137X.2019.08.030

WANG Yong-quan^1,2, SHI Zheng-yu^1,2,3, ZHANG Xiao^4

(School of Criminal Justice, East China University of Political Science and Law, Shanghai 201620, China)^1
(Department of Information Science and Technology, East China University of Political Science and Law, Shanghai 201620, China)^2
(School of Data Science, Fudan University, Shanghai 200433, China)^3
(Key Laboratory of Information Network Security of Ministry of Public Security, The Third Research Institute of the Ministry of Public Security, Shanghai 200120, China)^4

Received: 2018-10-05 Online: 2019-08-15 Published: 2019-08-15
Corresponding author: WANG Yong-quan (born 1964), male, Ph.D., professor, doctoral supervisor. His main research interests are cyberspace security, big data, and artificial intelligence. E-mail: wangyongquan@ecupl.edu.cn
About the authors: SHI Zheng-yu (born 1996), female, master's student. Her main research interests are big data and artificial intelligence. ZHANG Xiao (born 1987), female, master's degree, assistant researcher. Her main research interests are information network security and the forensic examination of electronic data and audio-visual materials.
Funding:
Major Program of the National Social Science Foundation of China, 2014 (second batch) (14ZDB147); Special Project for Basic Work of Strengthening Police with Science and Technology of the Ministry of Public Security (2017GABJC33); Ministry of Education 2017 second-batch "Cloud-Data Integration for Science and Education Innovation" fund project (2017B06106); East China University of Political Science and Law key general-education course construction project "Introduction to Artificial Intelligence" (A-0312-18-174794)

Abstract

Abstract: Research on restoring electronically disguised voice has seen no breakthrough in the construction of restoration models. To address this, this paper proposes a restoration model for electronically disguised voice based on a Dilated Causal Convolution Neural Network (DC-CNN). With DC-CNN as its framework, the model applies convolution and nonlinear mapping to the acoustic information of the historical sampling points of the disguised voice and to the restoring factors. The network uses skip connections to improve propagation through deep layers, and outputs the restored voice after a companding transformation. The model is characterized by nonlinear mapping, extensibility, adaptability and conditionality, and concurrency. In the experiments, piano music and English speech were each disguised with three basic voice-changing functions (pitch, tempo, and rate) and then restored by the model. The restored voice was compared with the original voice through voiceprint feature comparison, LPC data analysis, and human listening tests of voice identity. The results show that the voiceprint features of the restored voice closely match those of the original voice and that formant waveforms are restored with high quality. The overall restoration fitting rates of the formant parameters reach 79.03% for piano music and 79.06% for English speech, far above the 35% similarity between the electronically disguised voice and the original voice. This indicates that the model effectively reduces the electronic disguise characteristics in the voice and restores electronically disguised piano music and English speech well.

Key words: DC-CNN, Electronic disguised voice, Gated activation units, Restoring voice, Restoring factor
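
The abstract names dilated causal convolution and gated activation units as building blocks but gives no formulas. The sketch below is an illustration only, assuming a WaveNet-style formulation in which each output sample depends only on current and past samples and the gate is tanh(filter) multiplied by sigmoid(gate). The function names, the NumPy implementation, and the toy weights are hypothetical and are not taken from the paper; conditioning on restoring factors, skip connections, and companding are left out.

```python
import numpy as np


def dilated_causal_conv(x, w, dilation):
    """1-D causal convolution: y[t] = sum_k w[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zeros, so y[t] never depends on
    inputs later than t.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, wk in enumerate(w):
        shift = k * dilation
        if shift < len(x):
            # Tap k looks `shift` samples into the past.
            y[shift:] += wk * x[:len(x) - shift]
    return y


def gated_activation(x, w_filter, w_gate, dilation):
    """Gated activation unit: tanh(filter branch) * sigmoid(gate branch)."""
    f = dilated_causal_conv(x, w_filter, dilation)
    g = dilated_causal_conv(x, w_gate, dilation)
    return np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))


def receptive_field(kernel_size, dilations):
    """Number of input samples (current plus past) seen by a layer stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Toy usage with made-up weights (assumptions, not values from the paper).
signal = np.array([1.0, 2.0, 3.0, 4.0])
conv_out = dilated_causal_conv(signal, [1.0, 1.0], dilation=2)
gated_out = gated_activation(signal, [0.5, -0.5], [1.0, 0.0], dilation=1)
field = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
```

Doubling the dilation from layer to layer lets the receptive field grow exponentially with depth while the parameter count grows only linearly. This is the usual motivation for dilated stacks when a model must see a long history of audio samples.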

CLC Number: TP391

WANG Yong-quan, SHI Zheng-yu, ZHANG Xiao. Study on Restoration of Electronic Disguised Voice Based on DC-CNN[J]. Computer Science, 2019, 46(8): 183-188. https://doi.org/10.11896/j.issn.1002-137X.2019.08.030
