Computer Science (计算机科学), 2019, Vol. 46, Issue 11A: 108-111.
ZHENG Chun-jun (郑纯军)1,2, JIA Ning (贾宁)2
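Abstract: Most people today overlook the importance of reading aloud, yet for children aged 5 to 12, reading aloud is both an essential skill in the learning process and an effective means of cultivating character. The features of read-aloud speech signals are nonlinearly related to the evaluation criteria, and although recurrent neural networks are suited to time-series prediction, their performance is limited when predicting over long time spans. On this basis, and according to the characteristics of children's read-aloud speech and its evaluation system, a model is designed that combines DeepSpeech with a three-layer long short-term memory (LSTM) neural network. First, with an attention mechanism added, accuracy and fluency measures for evaluating read-aloud speech are proposed, with spectrograms as the input to feature extraction: accuracy is evaluated with an improved DeepSpeech to raise the phoneme recognition rate, and fluency is evaluated by feeding the spectrogram into the three-layer LSTM model to capture the influence of the time series. Then, the results are passed to the attention mechanism for weight adjustment. Finally, the computed overall evaluation result is used to score the child's read-aloud speech. Experiments were run on the children's read-aloud corpus provided by the “出口成章” software, using the TensorFlow platform. The results show that, compared with traditional models, this model accurately judges both the correctness and the fluency of reading, and the scores produced by its evaluation model are relatively accurate.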
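The abstract describes a two-branch evaluation (accuracy and fluency) whose outputs are reweighted by an attention mechanism and combined into one score, but it gives no formulas. The following is only a minimal sketch of that final scoring step under stated assumptions, not the authors' implementation: `phoneme_accuracy` uses a normalized edit distance as a stand-in accuracy measure, the fluency value is assumed to come from some upstream model, and `fuse_scores` replaces the learned attention weights with a softmax over two hypothetical logits. All function names and parameters here are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phoneme_accuracy(reference, recognized):
    """Assumed accuracy measure: 1 - (edit distance / reference length).

    `reference` is the expected phoneme sequence for the text being read;
    `recognized` is what a speech recognizer produced. Both are lists of
    strings. The result is clipped to the range [0, 1].
    """
    if not reference:
        return 1.0 if not recognized else 0.0
    # Row-by-row Levenshtein distance between the two sequences.
    prev = list(range(len(recognized) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, rec_ph in enumerate(recognized, start=1):
            cost = 0 if ref_ph == rec_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return max(0.0, 1.0 - prev[-1] / len(reference))


def fuse_scores(accuracy, fluency, attention_logits=(0.0, 0.0)):
    """Combine two sub-scores with softmax weights (hypothetical fusion).

    In a trained model the logits would be learned; here they are plain
    inputs, and equal logits give equal weights.
    """
    w_acc, w_flu = softmax(list(attention_logits))
    return w_acc * accuracy + w_flu * fluency


if __name__ == "__main__":
    acc = phoneme_accuracy(["n", "i", "h", "ao"], ["n", "i", "h", "a"])
    total = fuse_scores(acc, fluency=0.6, attention_logits=(1.0, 0.0))
    print(f"accuracy={acc:.2f} total={total:.3f}")
```

Raising the first logit shifts weight toward the accuracy branch, which is the kind of adjustment the attention step is described as performing.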