Computer Science, 2023, Vol. 50, Issue (5): 3-11. doi: 10.11896/jsjkx.221100159

• Explainable Artificial Intelligence •

Review of Software Engineering Techniques and Methods Based on Explainable Artificial Intelligence

XING Ying

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2022-11-19 Revised: 2023-02-02 Online: 2023-05-15 Published: 2023-05-06
  • Corresponding author: XING Ying (xingying@bupt.edu.cn)
  • About author: XING Ying, born in 1978, Ph.D., is a senior member of the China Computer Federation. Her main research interests include software testing and deep learning.

Abstract: In information processing and decision-making, artificial intelligence (AI) methods have shown superior performance compared with traditional methods. However, once AI models are put into production, their outputs are not guaranteed to be fully accurate, so the untrustworthiness of AI technology has gradually become a major obstacle to its large-scale deployment. As AI is increasingly applied to software engineering, its drawbacks, such as over-reliance on historical data and opaque decision-making, are becoming more and more apparent, which makes it crucial to provide reasonable explanations for its decisions. This paper elaborates on the basic concepts of explainable AI (XAI) and on the evaluation of explainable models, and explores the feasibility of combining software engineering with XAI. It then surveys the related literature and analyzes four typical applications of AI in software engineering, namely malware detection, high-risk component detection, software load distribution, and binary code similarity analysis, discussing how XAI can reveal the degree to which a system's output is correct and thereby increase the credibility of its decisions. Finally, it outlines future research directions for combining software engineering with explainable artificial intelligence.

Key words: Explainable artificial intelligence, Software engineering, Malware detection, Code similarity analysis

CLC number: TP311