Computer Science (计算机科学) ›› 2022, Vol. 49 ›› Issue (1): 59-64. doi: 10.11896/jsjkx.210900007
李昭奇, 黎塔
LI Zhao-qi, LI Ta
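Abstract: Query-by-example keyword spotting is the task of matching a spoken keyword segment against segments in a speech stream. In low-resource or zero-resource settings, it is usually done with methods based on dynamic time warping. In recent years, neural acoustic word embeddings have become a common approach to query-by-example keyword spotting, but neural methods are limited by the amount of labeled data. Pre-training with wav2vec reduces the neural network's dependence on data volume and improves system performance. When pre-trained features extracted by the wav2vec model directly replace Mel-frequency cepstral coefficient (MFCC) features, a neural acoustic word embedding system built on a bidirectional long short-term memory network improves its average precision by 11.1% and its precision-recall break-even value by 10.0% on a dataset extracted from the SwitchBoard corpus. Fusing wav2vec features with MFCC features to extract the embedding vectors improves performance further: compared with using wav2vec alone, the fusion method improves average precision by 5.3% and the precision-recall break-even value by 2.5%.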
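For readers unfamiliar with the dynamic time warping baseline named in the abstract, the following is a minimal sketch of a DTW alignment cost between a query and a candidate segment, each given as a sequence of frame-level feature vectors (for example MFCC frames). The function name `dtw_cost`, the Euclidean frame distance, and the normalization by the summed sequence lengths are illustrative assumptions; they are not taken from the paper.

```python
import numpy as np


def dtw_cost(query, segment):
    """Length-normalized DTW alignment cost between two feature sequences.

    query:   array-like of shape (n, d), one d-dimensional vector per frame
    segment: array-like of shape (m, d)
    Assumptions (not from the paper): Euclidean frame distance and
    normalization by n + m.
    """
    query = np.asarray(query, dtype=float)
    segment = np.asarray(segment, dtype=float)
    n, m = len(query), len(segment)

    # Pairwise Euclidean distances between all frame pairs, shape (n, m).
    dist = np.linalg.norm(query[:, None, :] - segment[None, :, :], axis=-1)

    # acc[i, j] is the minimum cumulative cost of aligning the first i
    # query frames with the first j segment frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev

    return acc[n, m] / (n + m)


if __name__ == "__main__":
    query = np.array([[0.0], [1.0], [2.0]])
    # The same pattern spoken "more slowly": every frame is repeated.
    stretched = np.repeat(query, 2, axis=0)
    other = np.full((3, 1), 5.0)
    print(dtw_cost(query, stretched))  # zero: warping absorbs the stretch
    print(dtw_cost(query, other))      # positive: the contents differ
```

A lower cost means a closer match. The double loop costs O(n·m) frame comparisons per candidate segment. An embedding-based system instead maps each segment to a single fixed-length vector, so comparing two segments reduces to one vector distance.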