Twice Learning Revitalizes Behavior Cloning

doi:10.11896/jsjkx.250600131

Abstract

In behavior cloning (BC), an imitation learning method, an agent tends to take random actions when it encounters states that are not covered by expert data. This deviation from the expert policy leads to what is known as compounding error, a critical factor limiting the performance of BC. To address this issue, this paper first establishes that BC can be regarded as a simplified form of twice learning. Furthermore, in discrete-action environments, BC focuses primarily on matching the expert-selected actions while ignoring the probability information associated with the other actions, so expert knowledge is extracted incompletely. Inspired by twice learning, this paper proposes an enhanced version of BC, termed complete behavior cloning (CBC), which aims to leverage a more comprehensive set of information from expert data. To validate the approach, this paper designs multiple comparative experiments. The results show that CBC not only mitigates compounding error but also exhibits high transferability across different devices, enhanced robustness to noise, and reduced dependency on expert data. These findings suggest that BC can become highly practical and computationally efficient with only minor modifications. Moreover, the experimental results further reinforce the guiding role and effectiveness of twice learning in reinforcement learning problems.
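The contrast drawn above, between matching only the expert-selected action and using the probabilities of all actions, can be illustrated with a minimal sketch. The abstract does not state the CBC objective, so the soft-target cross-entropy below is an assumption chosen for illustration, not the paper's loss; the function names (`bc_loss`, `soft_target_loss`) and the example numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_loss(student_logits, expert_action):
    """Standard BC for discrete actions: negative log-likelihood of the
    single action the expert selected. Other actions are ignored."""
    probs = softmax(student_logits)
    return -math.log(probs[expert_action])


def soft_target_loss(student_logits, expert_probs):
    """ASSUMED illustration of using the full expert distribution:
    cross-entropy between expert action probabilities and the student's."""
    probs = softmax(student_logits)
    return -sum(p * math.log(q) for p, q in zip(expert_probs, probs) if p > 0)


# Hypothetical expert policy output for one state with three actions.
expert_probs = [0.6, 0.3, 0.1]
expert_action = max(range(len(expert_probs)), key=expert_probs.__getitem__)

# Hypothetical student logits for the same state.
student_logits = [2.0, 0.0, 0.0]

print("BC loss (selected action only):", bc_loss(student_logits, expert_action))
print("Soft-target loss (all actions):", soft_target_loss(student_logits, expert_probs))
```

With a one-hot expert distribution the two losses coincide, which is one way to read the claim that BC is a simplified case: it keeps the selected action and discards the expert's relative preference among the remaining actions (here 0.3 versus 0.1).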

Key words: Imitation learning, Behavior cloning, Compounding error, Twice learning, Information extraction

CLC Number: TP391

FAN Wenshu, WAN Shenghua, LI Xinchun, SUN Haihang, HUANG Kaichen, GAN Le, ZHAN Dechuan. Twice Learning Revitalizes Behavior Cloning [J]. Computer Science, 2026, 53(3): 129-135.


