计算机科学 ›› 2020, Vol. 47 ›› Issue (9): 47-51.doi: 10.11896/jsjkx.190600042

• 计算机软件* 上一篇    下一篇

基于关键词Trie树的GCC抽象语法树消除冗余算法

韩磊, 胡建鹏   

  1. 上海工程技术大学电子电气工程学院 上海201620
  • 收稿日期:2019-06-11 发布日期:2020-09-10
  • 通讯作者: 胡建鹏(mr@sues.edu.cn)
  • 作者简介:m025117123@sues.edu.cn
  • 基金资助:
    国家自然科学基金项目(61802252);上海工程技术大学金课培育项目(DQY19004)

Deduplication Algorithm of Abstract Syntax Tree in GCC Based on Trie Tree of Keywords

HAN Lei, HU Jian-peng   

  1. School of Electronic and Electrical Engineering,Shanghai University of Engineering Science,Shanghai 201620,China
  • Received:2019-06-11 Published:2020-09-10
  • About author:HAN Lei,born in 1993,postgraduate,is a member of China Computer Federation.His main research interests include machine learning,data mining.
    HU Jian-peng,born in 1980,associate professor,is a member of China Computer Federation.His main research interests include software engneering,service computing and data mining.
  • Supported by:
    National Natural Science Foundation of China (61802252) and Shanghai University of Engineering Science Golden Course Cultivation Project (DQY19004).

摘要: GCC(GNU Compiler Collection)编译器编译C语言源程序所生成的抽象语法树文本中包含大量与源代码无关的冗余信息,若直接进行解析,会严重影响分析效率,降低分析精确度,同时会占用大量存储空间。针对此问题,提出一种基于关键词Trie树的GCC抽象语法树消除冗余算法,其根据包含抽象语法树文本有用信息节点的关键词建立Trie树,可实现对抽象语法树文本无用节点的过滤,从而达到优化编译的效果。相比传统KMP消除冗余算法,关键词Trie树算法可以有效避免去冗余过程中常量、变量等有用信息节点的丢失,确保数据的完整性;同时,关键词Trie树算法可以最大限度地减少重复前缀或后缀字符串的比较次数,节省了时空开销。挑选不同长度的C语言源码文件进行去冗余实验,测试该算法的性能,并将其与传统KMP算法进行对比。实验结果表明,所提算法的去冗效率和查准率均得到了极大的提高。

关键词: GCC, KMP, 抽象语法树, 关键词Trie树, 消除冗余, 优化编译

Abstract: The abstract syntax tree text generated by GCC compiler compiling C language source program contains a lot of redundant information independent of source code.If directly parsed,it will seriously affect the analysis efficiency,reduce the analysis accuracy,and occupy a large amount of storage space.Aiming at this problem,a GCC abstract syntax tree elimination redundancy algorithm based on the keyword Trie tree is proposed.The Trie tree is built according to the keywords containing the abstract syntax tree text useful information nodes,which can filter the useless node information of the syntax tree text,thus achieving optimized compilation results.Compared with the traditional KMP redundancy elimination algorithm,the keyword Trie tree algorithm can effectively avoid the loss of useful information nodes such as constants and variables in the process of redundancy removal and ensure the integrity of data.At the same time,the keyword Trie tree algorithm can minimize the comparison of repeated prefixes or suffix strings,saving time and space overhead.This paper selects different lengths of C language source files for de-redundancy experiments,tests the performance of the algorithm,and compares it with the traditional KMP algorithm.The experimental results show that the algorithm can greatly improve the redundancy efficiency and precision.

Key words: Abstract syntax tree, Eliminate redundancy, GCC, Keyword Trie tree, KMP, Optimized compilation

中图分类号: 

  • TP311
[1] 王亚刚.深入分析GCC[M].北京:机械工业出版社,2017.
[2] GCC.The GNU compiler collection [EB/OL].[2018-10-26].http://gcc.gnu.org.
[3] ZHANG T,CHEN J,JIANG H,et al.Bug Report Enrichment with Application of Automated Fixer Recommendation[C]//2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC).IEEE Computer Society,2017.
[4] VISHWACHI,GUPTA S.Detection of near-miss clones using metrics and Abstract Syntax Trees[C]//International Conferenceon Inventive Communication and Computational Technologles (ICICCT).New York:IEEE,2017:230-234.
[5] YANG X C,JIANG J,MA X D,et al.Program slicing technology based on critical variable dataflow analysis algorithm in GCC compiler[J].Computer Engineering and Applications,2017,53(24):40-47,54.
[6] YANG X L,LIU J.Statically Detecting Likely Buffer Overflow Vulnerabilities in C/C++ Program[J].Computer Engineering and Applications,2004(20):108-110.
[7] ANTONIOL G,PENTA M D,MASONE G,et al.XOgastan:XML-oriented gcc AST analysis and transformations[C]//Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation.IEEE,2003.
[8] LI X,WANG T T,SU X H,et al.Research on Eliminating Redundancies of GCC AST Test[J].Computer Science,2008(10):170-172.
[9] TIAN B C,SUN K,CHAO H Q.New Algorithm of Simplifying GCC Syntax Tree[J].Computer Science,2015,42(S1):516-518,530.
[10] ZHANG Q,JIN Y C,LI M,et al.Implementation of Trie Tree Router Lookup Algorithm in Network Processor[J].Computer Engineering,2014,40(4):98-102.
[11] XU G T,ZHANG M.Research on Trie Tree Keeyword FastMatching Algorithm in Network Security Situational Awareness[J].Netinfo Security,2019(4):55-62.
[12] LI A,HE D,WANG H.An advanced trie-based HTTP parsing algorithm[C]//Sixth International Conference on Information Science & Technology.IEEE,2016.
[13] YANG T,MI Z,DUAN R,et al.An ultra-fast universal incremental update algorithm for trie-based routing lookup[C]//2012 20th IEEE International Conference on Network Protocols (ICNP).IEEE Computer Society,2012.
[14] XUE P Q,XIAN Y,NURBOL,et al.Sensitive information filtering algorithm based on Uyghur text information network research[J].Computer Engineering and Applications,2018,54(5):236-241,246.
[15] GONG P,OSBORN W.A Compact-Trie-Based Structure for K-Nearest-Neighbour Searching[C]//IEEE International Conference on Advanced Information Networking & Applications.IEEE,2017:578-585.
[16] GHASEMI C,YOUSEFI H,SHIN K G,et al.A Fast and Memory-Efficient Trie Structure for Name-Based Packet Forwarding[C]//2018 IEEE 26th International Conference on Network Protocols (ICNP).IEEE,2018.
[17] HUANG R,ZHAO Y.C Programming[M].BeiJing:Tsinghua University Press,2012.
[18] DEITEL P,DEITEL H C.How To Program International Edition,Seventh Edition[M].Beijing:Publishing House of Electronics Industry,2018.
[1] 冉丹, 陈哲, 孙毅, 杨志斌.
基于程序转化的SCADE模型检测
SCADE Model Checking Based on Program Transformation
计算机科学, 2021, 48(12): 125-130. https://doi.org/10.11896/jsjkx.201100080
[2] 常建明, 薄莉莉, 孙小兵.
面向缺陷定位的代码搜索引擎
Code Search Engine for Bug Localization
计算机科学, 2021, 48(12): 140-148. https://doi.org/10.11896/jsjkx.201100209
[3] 丁嵘, 于千惠.
基于AADL的自主无人系统可成长框架
Growth Framework of Autonomous Unmanned Systems Based on AADL
计算机科学, 2020, 47(12): 87-92. https://doi.org/10.11896/jsjkx.201100173
[4] 丁丽丽,李雁冰,张素平,王鹏翔,张庆花.
分支嵌套循环的自动并行化研究
Auto-parallelization Research Based on Branch Nested Loops
计算机科学, 2017, 44(5): 14-19. https://doi.org/10.11896/j.issn.1002-137X.2017.05.003
[5] 柯昌博,黄志球,肖甫.
基于本体概念相似度的软件构件检索方法
Software Component Retrieval Method Based on Ontology Concept Similarity
计算机科学, 2017, 44(12): 144-149. https://doi.org/10.11896/j.issn.1002-137X.2017.12.028
[6] 董钰山,李春江.
Intel64体系结构的数据预取机制及效果
Mechanism and Capability of Data Prefetching in Intel64 Architecture
计算机科学, 2016, 43(5): 34-41. https://doi.org/10.11896/j.issn.1002-137X.2016.05.006
[7] 徐颖,李春江,董钰山,周思齐.
GCC编译器中编译指导的自动向量化实现
Implementation of Auto-vectorization Based on Directives in GCC
计算机科学, 2014, 41(Z11): 364-367.
[8] 杨昌坤,许庆国.
C程序控制流程模型的提取技术与实现
Technology and Implementation of Extracting Control Flow Model of C Program
计算机科学, 2014, 41(5): 208-214. https://doi.org/10.11896/j.issn.1002-137X.2014.05.043
[9] 王淅娜,喻建鹏.
一种改进的Boyer-Moore算法在IDS中的应用
Improved Boyer-Moore Algorithm Applied in IDS
计算机科学, 2013, 40(Z11): 196-198.
[10] 刘晓娴,赵荣彩,丁锐.
面向DSWP并行的OpenMP任务调度机制的扩展与实现
Extension to OpenMP Task Scheduling Mechanism for DSWP Parallelization and its Implementation
计算机科学, 2013, 40(9): 38-43.
[11] 李春江,黄娟娟,徐颖,杜云飞,陈娟.
典型编译器自动向量化效果评估与分析
Evaluation and Analysis of Effects of Auto-vectorization in Typical Compilers
计算机科学, 2013, 40(4): 41-46.
[12] 李春江, 杜云飞, 倪晓强, 王永文, 杨灿群.
基于GCC实现飞腾处理器向量处理单元的编译器后端
Implementing Compiler Backend for Vector Processing Unit of FT Processor Based on GCC
计算机科学, 2013, 40(12): 19-22.
[13] 李春江,杜云飞,易会战,杨灿群.
GCC中内嵌函数实现剖析
Anatomy of the Implementation of Builtin Functions in GCC
计算机科学, 2012, 39(Z6): 357-359.
[14] 李春江,杜云飞,倪晓强,王永文,杨灿群.
GCC后端中四路双精度短向量寄存器的实现
Implementation of Four-way Double Precision Short Vector Registers in GCC Backend
计算机科学, 2012, 39(9): 292-295.
[15] 李旭 卢凯 李根.
Jikes RVM动态编译技术分析与性能评测

计算机科学, 2009, 36(4): 129-132.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!