Computer Science, 2024, Vol. 51, Issue (11): 213-228. doi: 10.11896/jsjkx.231000037
• Artificial Intelligence •
YAO Tianlei, CHEN Xiliang, YU Peiyi