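Specific Two Words Chinese Lexical Recognition Based on Broadband and Narrowband Spectrogram Feature Fusion with Zoning Projection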

doi:10.11896/j.issn.1002-137X.2016.11A.049

Abstract

This paper proposes a method for recognizing speaker-dependent two-character Chinese words based on fusing banded projections of wideband and narrowband spectrograms. The method applies image processing techniques to speech recognition. During image feature extraction, equal-width band row projection and dyadic-width (binary-width) band row projection are first applied to the narrowband spectrogram, giving the first and second feature sets, respectively. The narrowband spectrogram is also passed through a further image Fourier transform followed by an equal-width row projection, giving the third feature set. Next, equal-width band column projection is applied to the wideband spectrogram, giving the fourth feature set. These feature sets form the feature vector, and a support vector machine (SVM) serves as the classifier for whole-word recognition of speaker-dependent two-character Chinese words. In simulation experiments on 1000 speech samples, the correct recognition rate reaches 92.4% with the first three feature sets, 80% with the fourth feature set alone, and 95.4% when all four feature sets are fused. This feature fusion method offers a new approach to Chinese word recognition.

Key words: Speech recognition, Spectrogram, Feature fusion, Row projection, Column projection, Support vector machine (SVM)
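
The banded projections named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: a spectrogram is treated as a 2-D list of gray values (rows as frequency, columns as time), a row or column projection is taken to be a plain sum, "dyadic width" is read as band widths that double from one band to the next, and all function names are hypothetical. The Fourier-transform-based third feature set and the SVM classifier are omitted.

```python
# Sketch only: assumed definitions, not the authors' implementation.

def row_projection(image):
    """Sum each row over all columns (one value per frequency row)."""
    return [sum(row) for row in image]


def column_projection(image):
    """Sum each column over all rows (one value per time column)."""
    return [sum(col) for col in zip(*image)]


def equal_width_bands(profile, n_bands):
    """Split a projection profile into n_bands contiguous bands of
    near-equal width and return the sum inside each band."""
    n = len(profile)
    edges = [i * n // n_bands for i in range(n_bands + 1)]
    return [sum(profile[edges[i]:edges[i + 1]]) for i in range(n_bands)]


def dyadic_width_bands(profile, n_bands):
    """Split a projection profile into n_bands contiguous bands whose
    widths are proportional to 1, 2, 4, ... and sum inside each band."""
    n = len(profile)
    total_units = 2 ** n_bands - 1
    edges = [(2 ** i - 1) * n // total_units for i in range(n_bands + 1)]
    return [sum(profile[edges[i]:edges[i + 1]]) for i in range(n_bands)]


def fuse(*feature_sets):
    """Fuse feature sets by simple concatenation (an assumed fusion rule)."""
    fused = []
    for features in feature_sets:
        fused.extend(features)
    return fused


# Toy 7x4 "spectrograms" standing in for real narrowband/wideband images.
narrowband = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [0, 0, 0, 1],
    [3, 0, 0, 0],
    [1, 0, 1, 0],
]
wideband = narrowband  # reused here only to keep the example short

rows = row_projection(narrowband)
set1 = equal_width_bands(rows, 7)    # equal-width band row projection
set2 = dyadic_width_bands(rows, 3)   # dyadic-width band row projection
set4 = equal_width_bands(column_projection(wideband), 2)  # band column projection
feature_vector = fuse(set1, set2, set4)
```

In a full pipeline the fused vector would be passed to an SVM classifier; the band counts used above are arbitrary and would need tuning on real spectrograms.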

魏莹,王双维,潘迪,张玲,许廷发,梁士利. 宽窄带语谱图融合分带投影的特定人汉语词汇识别[J]. 计算机科学, 2016, 43(Z11): 215-219. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.049

WEI Ying, WANG Shuang-wei, PAN Di, ZHANG Ling, XU Ting-fa and LIANG Shi-li. Specific Two Words Chinese Lexical Recognition Based on Broadband and Narrowband Spectrogram Feature Fusion with Zoning Projection[J]. Computer Science, 2016, 43(Z11): 215-219. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.049

