Computer Science, 2019, Vol. 46, Issue (5): 286-289. doi: 10.11896/j.issn.1002-137X.2019.05.044

• Graphics, Image and Pattern Recognition •

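Speech Recognition Combining CFCC and Teager Energy Operators Cepstral Coefficients

SHI Yan-yan, BAI Jing

  1. (College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China)
  • Published: 2019-05-15
  • About the authors: SHI Yan-yan (born 1994), female, master's student; main research interest: speech signal processing; E-mail: 690742874@qq.com. BAI Jing (born 1965), female, Ph.D., professor, master's supervisor; main research interest: speech signal processing; E-mail: bj613@126.com (corresponding author).
  • Supported by:
    Youth Science and Technology Research Fund of Shanxi Province and the Science and Technology Key Research (Social Development) Project of Shanxi Province.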

Abstract: To address the incomplete representation of speech characteristics by existing features, this paper proposes a method that fuses Cochlear Filter Cepstral Coefficients (CFCC) with Teager Energy Operators Cepstral Coefficients (TEOCC). The fused feature, which combines CFCC (reflecting human auditory characteristics) with TEOCC (capturing nonlinear energy characteristics), is applied to a speech recognition system. Principal Component Analysis (PCA) is then used to select and optimize the fused feature, and a support vector machine performs the recognition. Experimental results show that the fused feature achieves better speech recognition performance than either single feature, and that combining it with PCA raises recognition accuracy by 3.7% on average.

Key words: Cochlear filter cepstral coefficients (CFCC), Principal component analysis (PCA), Speech recognition, Teager energy operators cepstral coefficients (TEOCC)
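
The nonlinear energy characteristic that TEOCC captures comes from the Teager energy operator. As a minimal sketch, and not the authors' implementation (the abstract does not specify their framing, filtering, or cepstral steps), the standard discrete form of the operator, ψ[x(n)] = x(n)² − x(n−1)·x(n+1), could be written as follows. The function name `teager_energy` and the test-tone parameters are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample; the two edge
    samples lack a neighbour on one side and are skipped.
    """
    if len(x) < 3:
        return []
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal (an assumption for illustration): a pure tone
# A * cos(omega * n + phi), with omega in radians per sample.
A, omega, phi = 0.5, 0.3, 0.1
tone = [A * math.cos(omega * n + phi) for n in range(64)]

psi = teager_energy(tone)

# For a pure tone the operator output is constant: A^2 * sin(omega)^2.
expected = (A * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is why the operator is described as an energy measure that a plain squared-amplitude sum does not provide. A cepstral feature built on it would still need framing, a filterbank, and a cepstral transform, none of which are shown here.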

CLC Number:

  • TN912.34