Computer Science, 2024, Vol. 51, Issue (6A): 230300212-5. doi: 10.11896/jsjkx.230300212
• Image Processing & Multimedia Technology •
WANG Yifan, ZHANG Xuefang
[1]BALTRUŠAITIS T,AHUJA C,MORENCY L P.Multimodal machine learning:A survey and taxonomy[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,41(2):423-443.
[2]KEEGAN M.The Most Surveilled Cities in the World[EB/OL].https://www.usnews.com/news/cities/articles/2020-08-14/the-top-10-most-surveilled-cities-in-the-world.
[3]China Internet Network Information Center.The 50th Statistical Report on China's Internet Development[R/OL].(2022-08-31)[2022-09-10].http://www3.cnnic.cn/NMediaFile/2022/1020/MAIN16662586615125EJOL1VKDF.pdf.
[4]Cisco.Cisco Annual Internet Report(2018-2023) White Paper[R].2020.
[5]RADFORD A,KIM J W,HALLACY C,et al.Learning transferable visual models from natural language supervision[C]//International Conference on Machine Learning.PMLR,2021:8748-8763.
[6]OpenAI.GPT-4 Technical Report[R].2023.
[7]AREVALO J,SOLORIO T,MONTES-Y-GÓMEZ M,et al.Gated multimodal units for information fusion[J].arXiv:1702.01992,2017.
[8]CHO K,VAN MERRIËNBOER B,BAHDANAU D,et al.On the properties of neural machine translation:Encoder-decoder approaches[J].arXiv:1409.1259,2014.
[9]ELMAN J L.Finding Structure in Time[J].Cognitive Science,1990,14(2):179-211.
[10]HOCHREITER S,SCHMIDHUBER J.Long Short-Term Memory[J].Neural Computation,1997,9(8):1735-1780.
[11]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient estimation of word representations in vector space[J].arXiv:1301.3781,2013.
[12]PENNINGTON J,SOCHER R,MANNING C D.Glove:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP).2014:1532-1543.
[13]MCCANN B,BRADBURY J,XIONG C,et al.Learned in translation:Contextualized word vectors[J/OL].https://proceedings.neurips.cc/paper_files/paper/2017/hash/20c86a628232a67e7bd46f76fba7ce12-Abstract.html.
[14]RADFORD A,NARASIMHAN K,SALIMANS T,et al.Improving language understanding by generative pre-training[J/OL].https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035.
[15]DEVLIN J,CHANG M W,LEE K,et al.Bert:Pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018.
[16]ARANDJELOVIC R,ZISSERMAN A.Look,listen and learn[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:609-617.
[17]KONG Q,CAO Y,IQBAL T,et al.Panns:Large-scale pre-trained audio neural networks for audio pattern recognition[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2020,28:2880-2894.
[18]CARREIRA J,ZISSERMAN A.Quo vadis,action recognition? A new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:6299-6308.
[19]FEICHTENHOFER C,FAN H,MALIK J,et al.Slowfast networks for video recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:6202-6211.
[20]BERTASIUS G,WANG H,TORRESANI L.Is space-time attention all you need for video understanding?[C]//ICML.2021,2(3):4.
[21]LAN Z,CHEN M,GOODMAN S,et al.Albert:A lite bert for self-supervised learning of language representations[J].arXiv:1909.11942,2019.
[22]NOJAVANASGHARI B,GOPINATH D,KOUSHIK J,et al.Deep multimodal fusion for persuasiveness prediction[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction.2016:284-288.
[23]WANG H,MEGHAWAT A,MORENCY L P,et al.Select-additive learning:Improving generalization in multimodal sentiment analysis[C]//2017 IEEE International Conference on Multimedia and Expo(ICME).IEEE,2017:949-954.
[24]FAN H,MURRELL T,WANG H,et al.PyTorchVideo:A deep learning library for video understanding[C]//Proceedings of the 29th ACM International Conference on Multimedia.2021:3783-3786.