Computer Science ›› 2020, Vol. 47 ›› Issue (3): 182-191. doi: 10.11896/jsjkx.190200352

• Artificial Intelligence •


Survey on Sparse Reward in Deep Reinforcement Learning

YANG Wei-yi1, BAI Chen-jia2, CAI Chao1, ZHAO Ying-nan2, LIU Peng2

  1. China Unicom Network Technology Research Institute, Beijing 100048, China;
  2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2019-02-24  Online: 2020-03-15  Published: 2020-03-30
  • Corresponding author: BAI Chen-jia (bai_chenjia@163.com)
  • About author: YANG Wei-yi, born in 1993, postgraduate. Her main research interests include machine learning, internet of things and reinforcement learning. BAI Chen-jia, born in 1993, Ph.D., is a member of China Computer Federation. His main research interests include reinforcement learning and neural networks.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61671175, 61672190).


Abstract: As an important branch of machine learning, reinforcement learning is a class of methods that learn an optimal policy by interacting with the environment. In recent years, deep learning has been widely combined with reinforcement learning, forming the research field of deep reinforcement learning. As a new machine learning method, deep reinforcement learning can both perceive complex inputs and solve for optimal policies, and it has been applied to robot control and other complex decision-making problems. The sparse reward problem is a core difficulty that deep reinforcement learning faces when solving practical tasks, and it arises widely in real applications. Solving the sparse reward problem helps improve sample efficiency and the quality of the learned policy, and promotes the application of deep reinforcement learning to practical tasks. This paper first reviews the core algorithms of deep reinforcement learning, then introduces five families of solutions to the sparse reward problem, including reward design and learning, experience replay, exploration and exploitation, multi-goal learning, and auxiliary tasks, and finally summarizes the related work and discusses future directions.
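To make the sparse reward problem described above concrete, the minimal Python sketch below shows an interaction loop in which the agent receives a reward of 0 at every step and 1 only upon reaching the goal; the SparseGridWorld environment and all of its parameters are illustrative assumptions, not anything defined in the paper.

import random

class SparseGridWorld:
    """Toy 1-D corridor: the agent starts at cell 0 and must reach cell `goal`.
    The reward is 0 at every step and 1 only upon reaching the goal (sparse reward)."""

    def __init__(self, size=50, goal=49, max_steps=200):
        self.size, self.goal, self.max_steps = size, goal, max_steps

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):  # action is -1 (move left) or +1 (move right)
        self.pos = min(max(self.pos + action, 0), self.size - 1)
        self.t += 1
        done = self.pos == self.goal or self.t >= self.max_steps
        reward = 1.0 if self.pos == self.goal else 0.0  # non-zero only at the goal
        return self.pos, reward, done

env = SparseGridWorld()
state, done, episode_return = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 1])  # a purely random policy almost never reaches the goal
    state, reward, done = env.step(action)
    episode_return += reward
print("Return of one random episode:", episode_return)  # almost always 0.0

Until the goal is first reached by chance, every update target is built from zero rewards, which is the difficulty that the five families of techniques surveyed in the paper try to alleviate, either by densifying the reward signal or by making rare rewards easier to reach.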

Key words: Artificial intelligence, Deep learning, Deep reinforcement learning, Reinforcement learning, Sparse reward
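As one concrete illustration of the multi-goal learning and experience replay ideas named among the five solution families, the schematic Python sketch below relabels a failed trajectory with the goal it actually achieved, in the spirit of hindsight experience replay; the transition format, the reward function, and the replay_buffer name are simplifying assumptions made here for illustration, not the exact formulation used in the surveyed work.

def relabel_with_hindsight(episode):
    """Hindsight-style goal relabeling (schematic): store a second copy of a failed
    episode in which the state actually reached at the end is treated as the goal,
    so the copy contains at least one transition with non-zero reward.
    Each transition is assumed to be a dict with keys
    'state', 'action', 'next_state', 'goal' and 'reward'."""
    achieved_goal = episode[-1]["next_state"]
    relabeled = []
    for transition in episode:
        new_tr = dict(transition, goal=achieved_goal)
        # sparse goal-conditioned reward: 1 only when the substituted goal is reached
        new_tr["reward"] = 1.0 if new_tr["next_state"] == achieved_goal else 0.0
        relabeled.append(new_tr)
    return relabeled

# Typical usage (replay_buffer is assumed to be a list-like buffer):
# replay_buffer.extend(episode)                          # original copy, mostly reward 0
# replay_buffer.extend(relabel_with_hindsight(episode))  # relabeled copy, ends with reward 1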

CLC Number: TP181