Computer Science ›› 2025, Vol. 52 ›› Issue (12): 231-238. DOI: 10.11896/jsjkx.250100094
• Artificial Intelligence •
LIU Weijie, TANG Zecheng, LI Juntao
CLC Number:
| [1] | CHENG Zhangtao, HUANG Haoran, XUE He, LIU Leyuan, ZHONG Ting, ZHOU Fan. Event Causality Identification Model Based on Prompt Learning and Hypergraph [J]. Computer Science, 2025, 52(9): 303-312. |
| [2] | LIU Le, XIAO Rong, YANG Xiao. Application of Decoupled Knowledge Distillation Method in Document-level Relation Extraction [J]. Computer Science, 2025, 52(8): 277-287. |
| [3] | ZHENG Cheng, YANG Nan. Aspect-based Sentiment Analysis Based on Syntax, Semantics and Affective Knowledge [J]. Computer Science, 2025, 52(7): 218-225. |
| [4] | LIU Yanlun, XIAO Zheng, NIE Zhenyu, LE Yuquan, LI Kenli. Case Element Association with Evidence Extraction for Adjudication Assistance [J]. Computer Science, 2025, 52(2): 222-230. |
| [5] | XU Siyao, ZENG Jianjun, ZHANG Weiyan, YE Qi, ZHU Yan. Dependency Parsing for Chinese Electronic Medical Record Enhanced by Dual-scale Collaboration of Large and Small Language Models [J]. Computer Science, 2025, 52(2): 253-260. |
| [6] | ZHANG Peng, ZHANG Daojuan, CHEN Kai, ZHAO Yufei, ZHANG Yingjie, FEI Kexiong. Enhancing NLP Robustness Against Attacks with Retrieval-augmented Classification and Decoupled Representations [J]. Computer Science, 2025, 52(12): 428-434. |
| [7] | XIA Peng, ZHANG Yijun, QI Ji. Multi-agent Collaborative Code Generation Technology Driven by Large Language Models [J]. Computer Science, 2025, 52(11A): 241200033-9. |
| [8] | YUAN Tianhao, WANG Yongjun, WANG Baoshan, WANG Zhongyuan. Review of Artificial Intelligence Generated Content Applications in Natural Language Processing [J]. Computer Science, 2025, 52(11A): 241200156-12. |
| [9] | WEI Hao, ZHANG Zongyu, DIAO Hongyue, DENG Yaochen. Review of Application of Information Extraction Technology in Digital Humanities [J]. Computer Science, 2025, 52(11A): 250600198-10. |
| [10] | ZHAO Hongyi, LI Zhiyuan, BU Fanliang. Multi-language Embedding Graph Convolutional Network for Hate Speech Detection [J]. Computer Science, 2025, 52(11A): 241200023-8. |
| [11] | FU Juan. Research on Application of Deep Learning-based Natural Language Processing Technology in Intelligent Translation Systems [J]. Computer Science, 2025, 52(11A): 241000037-6. |
| [12] | ZHANG Jiawei, WANG Zhongqing, CHEN Jiali. Multi-grained Sentiment Analysis of Comments Based on Text Generation [J]. Computer Science, 2025, 52(10): 239-246. |
| [13] | ZHANG Jian, LI Hui, ZHANG Shengming, WU Jie, PENG Ying. Review of Pre-training Methods for Visually-rich Document Understanding [J]. Computer Science, 2025, 52(1): 259-276. |
| [14] | GUO Zhiqiang, GUAN Donghai, YUAN Weiwei. Word-Character Model with Low Lexical Information Loss for Chinese NER [J]. Computer Science, 2024, 51(8): 272-280. |
| [15] | LI Bin, WANG Haochang. Implementation and Application of Chinese Grammatical Error Diagnosis System Based on CRF [J]. Computer Science, 2024, 51(6A): 230900073-6. |