Computer Science, 2026, Vol. 53, Issue (2): 245-252. doi: 10.11896/jsjkx.241200067
Computer Graphics & Multimedia
GUO Xingxing1,2, XIAO Yannan1,2, WEN Peizhi1,2,3, XU Zhi1,2, HUANG Wenming1,2,3
[1]WANG J,QIAN X,ZHANG M,et al.Seeing what you said:Talking face generation guided by a lip reading expert[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:14653-14662.
[2]CHEN L,MADDOX R K,DUAN Z,et al.Hierarchical cross-modal talking face generation with dynamic pixel-wise loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:7832-7841.
[3]PRAJWAL K R,MUKHOPADHYAY R,NAMBOODIRI V P,et al.A lip sync expert is all you need for speech to lip generation in the wild[C]//Proceedings of the 28th ACM International Conference on Multimedia.2020:484-492.
[4]ZHOU H,SUN Y,WU W,et al.Pose-controllable talking face generation by implicitly modularized audio-visual representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:4176-4186.
[5]GUO Y,CHEN K,LIANG S,et al.AD-NeRF:Audio driven neural radiance fields for talking head synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:5784-5794.
[6]MUKHOPADHYAY S,SURI S,GADDE R T,et al.Diff2Lip:Audio conditioned diffusion models for lip-synchronization[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.2024:5292-5302.
[7]MISTRY D S,KULKARNI A V.Overview:Speech Recognition Technology,Mel-Frequency Cepstral Coefficients(MFCC),Artificial Neural Network(ANN)[J/OL].https://www.ijert.org/research/overview-speech-recognition-technology-mel-frequency-cepstral-coefficients-mfcc-artificial-neural-network-ann-IJERTV2IS100586.pdf.
[8]TRAN T,LUNDGREN J.Drill Fault Diagnosis Based on the Scalogram and Mel Spectrogram of Sound Signals Using Artificial Intelligence[J].IEEE Access,2020,8:203655-203666.
[9]LI H,QIU K,CHEN L,et al.SCAttNet:Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images[J].IEEE Geoscience and Remote Sensing Letters,2020,18(5):905-909.
[10]QIN Z,ZHANG P,WU F,et al.FcaNet:Frequency channel attention networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:783-792.
[11]GUO M H,XU T X,LIU J J,et al.Attention mechanisms in computer vision:A survey[J].Computational Visual Media,2022,8(3):331-368.
[12]CHUNG J S,ZISSERMAN A.Out of time:automated lip sync in the wild[C]//Computer Vision-ACCV 2016 Workshops:ACCV 2016 International Workshops.2017:251-263.
[13]JI Y,YU Y Q.Optimization algorithm for speech facial video generation based on dense convolutional generative adversarial networks and keyframes[J].Journal of Jilin University(Engineering and Technology Edition),2025,55(3):986-992.
[14]AFOURAS T,CHUNG J S,SENIOR A,et al.Deep audio-visual speech recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,44(12):8717-8727.
[15]CHUNG J S,SENIOR A,VINYALS O,et al.Lip reading sentences in the wild[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:6447-6456.
[16]CHUNG J,ZISSERMAN A.Lip reading in profile[C]//British Machine Vision Conference.British Machine Vision Association and Society for Pattern Recognition,2017.
[17]ZHAO Y,XU R,SONG M.A cascade sequence-to-sequence model for Chinese Mandarin lip reading[C]//Proceedings of the 1st ACM International Conference on Multimedia in Asia.2019:1-6.
[18]ZHAO Y,XU R,WANG X,et al.Hearing lips:Improving lip reading by distilling speech recognizers[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:6917-6924.
[19]PARK S J,KIM M,HONG J,et al.SyncTalkFace:Talking face generation with precise lip-syncing via audio-lip memory[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2022:2062-2070.
[20]LIANG B,PAN Y,GUO Z,et al.Expressive talking head generation with granular audio-visual control[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:3387-3396.
[21]DUCHI J,HAZAN E,SINGER Y.Adaptive subgradient methods for online learning and stochastic optimization[J].Journal of Machine Learning Research,2011,12(7):2121-2159.
Related Articles
[1] CHANG Xuanwei, DUAN Liguo, CHEN Jiahao, CUI Juanjuan, LI Aiping. Method for Span-level Sentiment Triplet Extraction by Deeply Integrating Syntactic and Semantic Features [J]. Computer Science, 2026, 53(2): 322-330.
[2] ZHANG Jing, PAN Jinghao, JIANG Wenchao. Background Structure-aware Few-shot Knowledge Graph Completion [J]. Computer Science, 2026, 53(2): 331-341.
[3] ZHUO Tienong, YING Di, ZHAO Hui. Research on Student Classroom Concentration Integrating Cross-modal Attention and Role Interaction [J]. Computer Science, 2026, 53(2): 67-77.
[4] XU Jingtao, YANG Yan, JIANG Yongquan. Time-Frequency Attention Based Model for Time Series Anomaly Detection [J]. Computer Science, 2026, 53(2): 161-169.
[5] HAN Lei, SHANG Haoyu, QIAN Xiaoyan, GU Yan, LIU Qingsong, WANG Chuang. Constrained Multi-loss Video Anomaly Detection with Dual-branch Feature Fusion [J]. Computer Science, 2026, 53(2): 236-244.
[6] JI Sai, QIAO Liwei, SUN Yajie. Semantic-guided Hybrid Cross-feature Fusion Method for Infrared and Visible Light Images [J]. Computer Science, 2026, 53(2): 253-263.
[7] LYU Jinggang, GAO Shuo, LI Yuzhi, ZHOU Jin. Facial Expression Recognition with Channel Attention Guided Global-Local Semantic Cooperation [J]. Computer Science, 2026, 53(1): 195-205.
[8] FAN Jiabin, WANG Baohui, CHEN Jixuan. Method for Symbol Detection in Substation Layout Diagrams Based on Text-Image Multimodal Fusion [J]. Computer Science, 2026, 53(1): 206-215.
[9] WANG Haoyan, LI Chongshou, LI Tianrui. Reinforcement Learning Method for Solving Flexible Job Shop Scheduling Problem Based on Double Layer Attention Network [J]. Computer Science, 2026, 53(1): 231-240.
[10] CHEN Qian, CHENG Kaixuan, GUO Xin, ZHANG Xiaoxia, WANG Suge, LI Yanhong. Bidirectional Prompt-Tuning for Event Argument Extraction with Topic and Entity Embeddings [J]. Computer Science, 2026, 53(1): 278-284.
[11] PENG Jiao, HE Yue, SHANG Xiaoran, HU Saier, ZHANG Bo, CHANG Yongjuan, OU Zhonghong, LU Yanyan, JIANG Dan, LIU Yaduo. Text-Dynamic Image Cross-modal Retrieval Algorithm Based on Progressive Prototype Matching [J]. Computer Science, 2025, 52(9): 276-281.
[12] GAO Long, LI Yang, WANG Suge. Sentiment Classification Method Based on Stepwise Cooperative Fusion Representation [J]. Computer Science, 2025, 52(9): 313-319.
[13] LIU Jian, YAO Renyuan, GAO Nan, LIANG Ronghua, CHEN Peng. VSRI: Visual Semantic Relational Interactor for Image Caption [J]. Computer Science, 2025, 52(8): 222-231.
[14] LIU Yajun, JI Qingge. Pedestrian Trajectory Prediction Based on Motion Patterns and Time-Frequency Domain Fusion [J]. Computer Science, 2025, 52(7): 92-102.
[15] LIU Chengzhuang, ZHAI Sulan, LIU Haiqing, WANG Kunpeng. Weakly-aligned RGBT Salient Object Detection Based on Multi-modal Feature Alignment [J]. Computer Science, 2025, 52(7): 142-150.