Computer Science, 2022, Vol. 49, Issue (1): 59-64. doi: 10.11896/jsjkx.210900007
Multilingual Computing Advanced Technology
LI Zhao-qi, LI Ta
[1] ITAKURA F. Minimum prediction residual principle applied to speech recognition[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1975, 23(1): 67-72.
[2] SETTLE S, LEVIN K, KAMPER H, et al. Query-by-example search with discriminative neural acoustic word embeddings[C]//Proc. Interspeech. Stockholm, Sweden, 2017: 2874-2878.
[3] SHAH N, SREERAJ R, MADHAVI M C, et al. Query-By-Example Spoken Term Detection Using Generative Adversarial Network[C]//Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2020: 644-648.
[4] HAZEN T J, SHEN W, WHITE C. Query-by-example spoken term detection using phonetic posteriorgram templates[C]//IEEE Workshop on Automatic Speech Recognition & Understanding. Merano, Italy, 2009: 421-426.
[5] ZHANG Y D, GLASS J R. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams[C]//IEEE Workshop on Automatic Speech Recognition & Understanding. Merano, Italy, 2009: 398-403.
[6] MA M, WU H, WANG X, et al. Acoustic word embedding system for code-switching query-by-example spoken term detection[C]//12th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2021.
[7] CHEN H J, LEUNG C C, XIE L, et al. Unsupervised bottleneck features for low-resource query-by-example spoken term detection[C]//Proc. Interspeech. San Francisco, USA, 2016: 923-927.
[8] YUAN Y G, LEUNG C C, XIE L, et al. Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, USA, 2017: 5645-5649.
[9] RAM D, MICULICICH L, BOURLARD H. Multilingual bottleneck features for query by example spoken term detection[C]//IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Sentosa, Singapore, 2019: 621-628.
[10] RAM D, MICULICICH L, BOURLARD H. Neural network based end-to-end query by example spoken term detection[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1416-1427.
[11] LEVIN K, JANSEN A, VAN DURME B. Segmental acoustic indexing for zero resource keyword search[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brisbane, Australia, 2015: 5828-5832.
[12] CHUNG Y A, WU C C, SHEN C H, et al. Audio word2vec: unsupervised learning of audio segment representations using sequence-to-sequence autoencoder[C]//Proc. Interspeech. San Francisco, USA, 2016: 765-769.
[13] MÜLLER M. Dynamic time warping[M]//Information Retrieval for Music and Motion. Berlin: Springer, 2007: 69-84.
[14] RAM D, ASAEI A, BOURLARD H. Sparse subspace modeling for query by example spoken term detection[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(6): 1130-1143.
[15] ZHAN J, HE Q, SU J, et al. A Stage Match for Query-by-Example Spoken Term Detection Based On Structure Information of Query[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021). IEEE, 2021: 6833-6837.
[16] HE W J, WANG W R, LIVESCU K. Multi-view recurrent neural acoustic word embeddings[C]//Proc. ICLR. Toulon, France, 2017.
[17] JUNG M, LIM H, GOO J, et al. Additional shared decoder on siamese multi-view encoders for learning acoustic word embeddings[C]//IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Sentosa, Singapore, 2019: 629-636.
[18] AUDHKHASI K, ROSENBERG A, SETHY A, et al. End-to-end ASR-free keyword search from speech[J]. IEEE Journal of Selected Topics in Signal Processing, 2017, 11(8): 1351-1359.
[19] KAMPER H, LIVESCU K, GOLDWATER S. An embedded segmental k-means model for unsupervised segmentation and clustering of speech[C]//IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Okinawa, Japan, 2017: 719-726.
[20] SCHNEIDER S, BAEVSKI A, COLLOBERT R, et al. wav2vec: unsupervised pre-training for speech recognition[C]//Proc. Interspeech. Graz, Austria, 2019: 3465-3469.
[21] BAEVSKI A, AULI M, MOHAMED A. Effectiveness of self-supervised pre-training for ASR[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain, 2020: 7694-7698.
[22] RIVIÈRE M, JOULIN A, MAZARÉ P E, et al. Unsupervised pretraining transfers well across languages[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Virtual Barcelona, Spain, 2020: 7414-7418.
[23] HOFFER E, AILON N. Deep metric learning using triplet network[C]//International Workshop on Similarity-Based Pattern Recognition. Cham: Springer, 2015: 84-92.
[24] GODFREY J J, HOLLIMAN E C, MCDANIEL J. SWITCHBOARD: telephone speech corpus for research and development[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). San Francisco, USA, 1992: 517-520.
[25] POVEY D, GHOSHAL A, et al. The Kaldi Speech Recognition Toolkit[C]//IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Big Island, USA, 2011: 1-14.
[26] ABADI M, AGARWAL A, BARHAM P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems[EB/OL]. (2016-03-16)[2021-08-31]. https://arxiv.org/abs/1603.04467.
[27] PANAYOTOV V, CHEN G, POVEY D, et al. Librispeech: an ASR corpus based on public domain audio books[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brisbane, Australia, 2015: 5206-5210.
[28] SETTLE S, LIVESCU K. Discriminative acoustic word embeddings: recurrent neural network-based approaches[C]//2016 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2016: 503-510.