Survey of Monosyllable Recognition in Speech Recognition

Computer Science, 2020, Vol. 47, Issue (11A): 172-174. doi: 10.11896/jsjkx.200200006

Computer Graphics & Multimedia

ZHANG Jing (张经), YANG Jian (杨健), SU Peng (苏鹏)

School of Mathematics and Computer Science, Dali University, Dali, Yunnan 671003, China

Online: 2020-11-15 Published: 2020-11-17
Corresponding author: YANG Jian (sbjc1215@126.com)
About author: zhang_gold@163.com
ZHANG Jing, born in 1997, postgraduate. Her research interests include speech recognition and deep neural networks.
YANG Jian, born in 1976, Ph.D., associate professor, is a member of the China Computer Federation. His research interests include speech recognition and deep neural networks.
Supported by:
This work was supported by the Yunnan Philosophy and Social Sciences Planning Project (YB2017072) and the General Project of the Joint Fund of Local Colleges and Universities in Yunnan Province (2018FH 001-064).

Abstract

Abstract: Acoustic modeling processes speech signals and extracts features from them. It is an essential foundation of speech recognition and an important factor in overall recognition performance. Choosing an appropriate modeling unit allows the downstream system to achieve higher accuracy and stronger robustness. The syllable is the smallest pronunciation unit of Sino-Tibetan languages such as Chinese, so it is of particular significance to study the syllable as the modeling unit for speech recognition in these languages and to extract the corresponding features for recognition. Surveying current progress in monosyllable recognition, this paper first introduces algorithms based on finite-state vector quantization and the results obtained with their improved variants in monosyllable recognition. It then introduces algorithms based on the hidden Markov model, detailing syllable recognition research that combines the hidden Markov model with other algorithms, followed by algorithms based on neural networks. Finally, it summarizes the work and proposes important directions for future research on monosyllable recognition.

Key words: Artificial neural network, Hidden Markov model, Monosyllable recognition, Speech recognition, Vector quantization

CLC number: TN912.34

ZHANG Jing, YANG Jian, SU Peng. Survey of Monosyllable Recognition in Speech Recognition[J]. Computer Science, 2020, 47(11A): 172-174. https://doi.org/10.11896/jsjkx.200200006

