Computer Science ›› 2020, Vol. 47 ›› Issue (9): 47-51.doi: 10.11896/jsjkx.190600042

• Computer Software • Previous Articles     Next Articles

Deduplication Algorithm of Abstract Syntax Tree in GCC Based on Trie Tree of Keywords

HAN Lei, HU Jian-peng   

  1. School of Electronic and Electrical Engineering,Shanghai University of Engineering Science,Shanghai 201620,China
  • Received:2019-06-11 Published:2020-09-10
  • About author:HAN Lei,born in 1993,postgraduate,is a member of China Computer Federation.His main research interests include machine learning,data mining.
    HU Jian-peng,born in 1980,associate professor,is a member of China Computer Federation.His main research interests include software engneering,service computing and data mining.
  • Supported by:
    National Natural Science Foundation of China (61802252) and Shanghai University of Engineering Science Golden Course Cultivation Project (DQY19004).

Abstract: The abstract syntax tree text generated by GCC compiler compiling C language source program contains a lot of redundant information independent of source code.If directly parsed,it will seriously affect the analysis efficiency,reduce the analysis accuracy,and occupy a large amount of storage space.Aiming at this problem,a GCC abstract syntax tree elimination redundancy algorithm based on the keyword Trie tree is proposed.The Trie tree is built according to the keywords containing the abstract syntax tree text useful information nodes,which can filter the useless node information of the syntax tree text,thus achieving optimized compilation results.Compared with the traditional KMP redundancy elimination algorithm,the keyword Trie tree algorithm can effectively avoid the loss of useful information nodes such as constants and variables in the process of redundancy removal and ensure the integrity of data.At the same time,the keyword Trie tree algorithm can minimize the comparison of repeated prefixes or suffix strings,saving time and space overhead.This paper selects different lengths of C language source files for de-redundancy experiments,tests the performance of the algorithm,and compares it with the traditional KMP algorithm.The experimental results show that the algorithm can greatly improve the redundancy efficiency and precision.

Key words: Abstract syntax tree, Eliminate redundancy, GCC, Keyword Trie tree, KMP, Optimized compilation

CLC Number: 

  • TP311
[1] 王亚刚.深入分析GCC[M].北京:机械工业出版社,2017.
[2] GCC.The GNU compiler collection [EB/OL].[2018-10-26].http://gcc.gnu.org.
[3] ZHANG T,CHEN J,JIANG H,et al.Bug Report Enrichment with Application of Automated Fixer Recommendation[C]//2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC).IEEE Computer Society,2017.
[4] VISHWACHI,GUPTA S.Detection of near-miss clones using metrics and Abstract Syntax Trees[C]//International Conferenceon Inventive Communication and Computational Technologles (ICICCT).New York:IEEE,2017:230-234.
[5] YANG X C,JIANG J,MA X D,et al.Program slicing technology based on critical variable dataflow analysis algorithm in GCC compiler[J].Computer Engineering and Applications,2017,53(24):40-47,54.
[6] YANG X L,LIU J.Statically Detecting Likely Buffer Overflow Vulnerabilities in C/C++ Program[J].Computer Engineering and Applications,2004(20):108-110.
[7] ANTONIOL G,PENTA M D,MASONE G,et al.XOgastan:XML-oriented gcc AST analysis and transformations[C]//Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation.IEEE,2003.
[8] LI X,WANG T T,SU X H,et al.Research on Eliminating Redundancies of GCC AST Test[J].Computer Science,2008(10):170-172.
[9] TIAN B C,SUN K,CHAO H Q.New Algorithm of Simplifying GCC Syntax Tree[J].Computer Science,2015,42(S1):516-518,530.
[10] ZHANG Q,JIN Y C,LI M,et al.Implementation of Trie Tree Router Lookup Algorithm in Network Processor[J].Computer Engineering,2014,40(4):98-102.
[11] XU G T,ZHANG M.Research on Trie Tree Keeyword FastMatching Algorithm in Network Security Situational Awareness[J].Netinfo Security,2019(4):55-62.
[12] LI A,HE D,WANG H.An advanced trie-based HTTP parsing algorithm[C]//Sixth International Conference on Information Science & Technology.IEEE,2016.
[13] YANG T,MI Z,DUAN R,et al.An ultra-fast universal incremental update algorithm for trie-based routing lookup[C]//2012 20th IEEE International Conference on Network Protocols (ICNP).IEEE Computer Society,2012.
[14] XUE P Q,XIAN Y,NURBOL,et al.Sensitive information filtering algorithm based on Uyghur text information network research[J].Computer Engineering and Applications,2018,54(5):236-241,246.
[15] GONG P,OSBORN W.A Compact-Trie-Based Structure for K-Nearest-Neighbour Searching[C]//IEEE International Conference on Advanced Information Networking & Applications.IEEE,2017:578-585.
[16] GHASEMI C,YOUSEFI H,SHIN K G,et al.A Fast and Memory-Efficient Trie Structure for Name-Based Packet Forwarding[C]//2018 IEEE 26th International Conference on Network Protocols (ICNP).IEEE,2018.
[17] HUANG R,ZHAO Y.C Programming[M].BeiJing:Tsinghua University Press,2012.
[18] DEITEL P,DEITEL H C.How To Program International Edition,Seventh Edition[M].Beijing:Publishing House of Electronics Industry,2018.
[1] RAN Dan, CHEN Zhe, SUN Yi, YANG Zhi-bin. SCADE Model Checking Based on Program Transformation [J]. Computer Science, 2021, 48(12): 125-130.
[2] CHANG Jian-ming, BO Li-li, SUN Xiao-bing. Code Search Engine for Bug Localization [J]. Computer Science, 2021, 48(12): 140-148.
[3] DING Rong, YU Qian-hui. Growth Framework of Autonomous Unmanned Systems Based on AADL [J]. Computer Science, 2020, 47(12): 87-92.
[4] DING Li-li, LI Yan-bing, ZHANG Su-ping, WANG Peng-xiang and ZHANG Qing-hua. Auto-parallelization Research Based on Branch Nested Loops [J]. Computer Science, 2017, 44(5): 14-19.
[5] KE Chang-bo, HUANG Zhi-qiu and XIAO Fu. Software Component Retrieval Method Based on Ontology Concept Similarity [J]. Computer Science, 2017, 44(12): 144-149.
[6] DONG Yu-shan and LI Chun-jiang. Mechanism and Capability of Data Prefetching in Intel64 Architecture [J]. Computer Science, 2016, 43(5): 34-41.
[7] TIAN Bing-chuan, SUN Ke and CHAO Han-qing. New Algorithm of Simplifying GCC Syntax Tree [J]. Computer Science, 2015, 42(Z6): 516-518.
[8] XU Ying,LI Chun-jiang,DONG Yu-shan and ZHOU Si-qi. Implementation of Auto-vectorization Based on Directives in GCC [J]. Computer Science, 2014, 41(Z11): 364-367.
[9] YANG Chang-kun and XU Qing-guo. Technology and Implementation of Extracting Control Flow Model of C Program [J]. Computer Science, 2014, 41(5): 208-214.
[10] WANG Xi-na and YU Jian-peng. Improved Boyer-Moore Algorithm Applied in IDS [J]. Computer Science, 2013, 40(Z11): 196-198.
[11] LIU Xiao-xian,ZHAO Rong-cai and DING Rui. Extension to OpenMP Task Scheduling Mechanism for DSWP Parallelization and its Implementation [J]. Computer Science, 2013, 40(9): 38-43.
[12] LI Chun-jiang,HUANG Juan-juan,XU Ying,DU Yun-fei and CHEN Juan. Evaluation and Analysis of Effects of Auto-vectorization in Typical Compilers [J]. Computer Science, 2013, 40(4): 41-46.
[13] LI Chun-jiang, DU Yun-fei, NI Xiao-qiang, WANG Yong-wen and YANG Can-qun. Implementing Compiler Backend for Vector Processing Unit of FT Processor Based on GCC [J]. Computer Science, 2013, 40(12): 19-22.
[14] . Anatomy of the Implementation of Builtin Functions in GCC [J]. Computer Science, 2012, 39(Z6): 357-359.
[15] . Implementation of Four-way Double Precision Short Vector Registers in GCC Backend [J]. Computer Science, 2012, 39(9): 292-295.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!