Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 108-111.

• Intelligent Computing •

Children’s Reading Speech Evaluation Model Based on Deep Speech and Multi-layer LSTM

ZHENG Chun-jun1,2, JIA Ning2   

  1. Dalian Maritime University, Dalian, Liaoning 116023, China
  2. Dalian Neusoft University of Information, Dalian, Liaoning 116023, China
  • Online: 2019-11-10  Published: 2019-11-20

Abstract: Most modern people overlook the importance of reading. However, for children aged 5 to 12, reading aloud is not only an essential skill in the learning process but also an effective means of cultivating sentiment. Because the relationship between the features of a read-aloud speech signal and the evaluation criteria is nonlinear, recurrent neural networks are well suited to this time-series prediction task, but their predictive power is limited over long time spans. Based on the characteristics of children's read-aloud speech and its evaluation system, a new model combining Deep Speech with a three-layer LSTM (Long Short-Term Memory) network was designed. First, with an attention mechanism added, accuracy and fluency measures for speech evaluation are proposed, and the spectrogram is used as the input for feature extraction. For reading accuracy, the latest version of Deep Speech is used to improve phoneme recognition accuracy. For fluency evaluation, the spectrogram is fed into the three-layer LSTM model to capture temporal effects. The results are then passed to the attention mechanism for weight adjustment, and the combined score is finally used to evaluate children's read-aloud speech. The experiments use a children's reading corpus provided by the "export chapter" software, and the experimental environment is the TensorFlow platform. The experimental results show that, compared with traditional models, the proposed model can accurately judge both the correctness and the fluency of read-aloud speech, and the scores produced by its evaluation model are more accurate.
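For readers who want a concrete picture of the fluency branch described above, the following is a minimal sketch in TensorFlow/Keras (the platform named in the abstract) of a three-layer LSTM with attention pooling over spectrogram frames. The layer sizes, input dimensions, and output head are illustrative assumptions, not the authors' implementation; the Deep Speech accuracy branch and the final score fusion are omitted.

# A minimal sketch (not the authors' code) of the fluency branch: three stacked
# LSTM layers over spectrogram frames, simple additive attention pooling, and a
# scalar fluency score. Layer sizes and the input shape are assumptions.
import tensorflow as tf

TIME_STEPS, N_MELS = 300, 80  # assumed spectrogram dimensions (frames x mel bins)

spec = tf.keras.Input(shape=(TIME_STEPS, N_MELS), name="spectrogram")

# Three-layer LSTM; every layer returns the full sequence so that the
# attention mechanism can weight each time step.
x = tf.keras.layers.LSTM(128, return_sequences=True)(spec)
x = tf.keras.layers.LSTM(128, return_sequences=True)(x)
x = tf.keras.layers.LSTM(128, return_sequences=True)(x)

# Attention pooling: score each frame, normalize the scores over time with a
# softmax, and take the weighted sum of the LSTM outputs.
scores = tf.keras.layers.Dense(1)(x)                   # (batch, T, 1)
weights = tf.keras.layers.Softmax(axis=1)(scores)      # attention weights over T
context = tf.keras.layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])  # (batch, 128)

fluency = tf.keras.layers.Dense(1, activation="sigmoid", name="fluency_score")(context)

model = tf.keras.Model(inputs=spec, outputs=fluency)
model.compile(optimizer="adam", loss="mse")
model.summary()

In the full model described in the abstract, this fluency score would be combined with the phoneme-level accuracy score produced by the Deep Speech recognizer, with the attention mechanism adjusting the weights of the two components to give the overall evaluation.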

Key words: Attention mechanism, Deep Speech, Spoken speech evaluation model, Long Short-Term Memory, Spectrogram

CLC Number: TP183