Computer Science, 2022, Vol. 49, Issue (10): 252-257. doi: 10.11896/jsjkx.210900210

• Artificial Intelligence •

Fast DOM Object Search and Location Algorithm for RPA System

MENG Yuan, QIN Yun-chuan, CAI Yu-hui, LI Ken-li

  1. School of Computer Science and Engineering, Hunan University, Changsha 410082, China
  • Received: 2021-09-24 Revised: 2022-03-22 Online: 2022-10-15 Published: 2022-10-13
  • Corresponding author: QIN Yun-chuan (qinyunchuan@hnu.edu.cn)
  • About author: MENG Yuan (mengyuan2020@hnu.edu.cn), born in 1997, postgraduate. Her main research interests include artificial intelligence and pattern recognition.
    QIN Yun-chuan, born in 1983, Ph.D., is a member of China Computer Federation. His main research interests include autonomous unmanned systems and high-performance embedded computing.
  • Supported by:
    National Key Research and Development Program of China (2017YFB0202201).

Abstract: Robotic process automation (RPA) is a business process automation technology based on software robots and artificial intelligence. It can replace or assist humans in completing repetitive work on computers and other devices. When RPA software automates operations on browser page elements, quickly searching for and locating the target DOM element without sacrificing accuracy is the key technical difficulty in completing an automation process end to end. Existing location methods, such as XPath and CSS selectors, produce overly long paths on web pages with complex structure, which leads to slow or inaccurate location. To solve these problems, a fast DOM object search and location algorithm for RPA systems is proposed: the optimal XPATH path algorithm. It analyzes information such as the attributes of an element and generates an optimal path that uniquely locates the element during automated operation. Experimental results show that locating an element with the optimal path takes only 23.14% of the time required with the complete XPATH path. The proposed algorithm therefore reduces the difficulty of path generation, speeds up element location, and improves automation efficiency.

Key words: Robotic process automation, DOM element search, DOM element positioning, Automation, Web page structure
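
The abstract does not spell out the steps of the optimal XPATH path algorithm, so the following is only an illustrative sketch of the general idea it describes: replacing a long, fully positional path with a shorter path anchored on an attribute that identifies the element uniquely. It is not the authors' method. The function names `full_path` and `short_unique_path`, the attribute priority `("id", "name", "class")`, and the sample page are all assumptions made for illustration. The sketch uses Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset, rather than a real browser DOM.

```python
import xml.etree.ElementTree as ET


def _parent_map(root):
    """Map every element to its parent (ElementTree keeps no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}


def _positional_step(node, parent):
    """Step such as 'li[2]': position among siblings that share the same tag."""
    same_tag = [c for c in parent if c.tag == node.tag]
    return f"{node.tag}[{same_tag.index(node) + 1}]"


def full_path(root, target):
    """Complete positional path from the root down to the target."""
    parents = _parent_map(root)
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        steps.insert(0, _positional_step(node, parent))
        node = parent
    return "./" + "/".join(steps) if steps else "."


def short_unique_path(root, target, attrs=("id", "name", "class")):
    """Walk upward from the target and stop at the first element (the target
    itself or an ancestor) that an attribute test identifies uniquely; the
    remaining positional steps lead from that anchor down to the target."""
    parents = _parent_map(root)
    suffix = []  # positional steps from the current node down to the target
    node = target
    while node is not root:
        for attr in attrs:
            value = node.get(attr)
            if value is None or "'" in value:
                continue  # attribute missing, or value not safely quotable
            candidate = "/".join([f".//{node.tag}[@{attr}='{value}']"] + suffix)
            if root.findall(candidate) == [target]:
                return candidate  # matches the target and nothing else
        parent = parents[node]
        suffix.insert(0, _positional_step(node, parent))
        node = parent
    # No unique attribute anchor found: fall back to the complete path.
    return "./" + "/".join(suffix) if suffix else "."


# Hypothetical sample page, written as well-formed XML for ElementTree.
PAGE = """
<html>
  <body>
    <div class="nav"><a href="/home">Home</a></div>
    <div id="main">
      <ul>
        <li>First</li>
        <li><a name="buy" href="/buy">Buy</a></li>
      </ul>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)
link = root.find(".//a[@name='buy']")
first_item = root.find(".//li")

print(full_path(root, link))                # ./body[1]/div[2]/ul[1]/li[2]/a[1]
print(short_unique_path(root, link))        # .//a[@name='buy']
print(short_unique_path(root, first_item))  # .//div[@id='main']/ul[1]/li[1]
```

In this sketch every candidate path is evaluated against the whole document before it is accepted, so a returned path always resolves to exactly the target element. How the paper ranks attributes or handles dynamic pages is not stated in the abstract and is not modeled here.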

CLC Number:

  • TP312