计算机科学 ›› 2020, Vol. 47 ›› Issue (9): 47-51.doi: 10.11896/jsjkx.190600042
韩磊, 胡建鹏
HAN Lei, HU Jian-peng
摘要: GCC(GNU Compiler Collection)编译器编译C语言源程序所生成的抽象语法树文本中包含大量与源代码无关的冗余信息,若直接进行解析,会严重影响分析效率,降低分析精确度,同时会占用大量存储空间。针对此问题,提出一种基于关键词Trie树的GCC抽象语法树消除冗余算法,其根据包含抽象语法树文本有用信息节点的关键词建立Trie树,可实现对抽象语法树文本无用节点的过滤,从而达到优化编译的效果。相比传统KMP消除冗余算法,关键词Trie树算法可以有效避免去冗余过程中常量、变量等有用信息节点的丢失,确保数据的完整性;同时,关键词Trie树算法可以最大限度地减少重复前缀或后缀字符串的比较次数,节省了时空开销。挑选不同长度的C语言源码文件进行去冗余实验,测试该算法的性能,并将其与传统KMP算法进行对比。实验结果表明,所提算法的去冗效率和查准率均得到了极大的提高。
中图分类号:
[1] 王亚刚.深入分析GCC[M].北京:机械工业出版社,2017. [2] GCC.The GNU compiler collection [EB/OL].[2018-10-26].http://gcc.gnu.org. [3] ZHANG T,CHEN J,JIANG H,et al.Bug Report Enrichment with Application of Automated Fixer Recommendation[C]//2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC).IEEE Computer Society,2017. [4] VISHWACHI,GUPTA S.Detection of near-miss clones using metrics and Abstract Syntax Trees[C]//International Conferenceon Inventive Communication and Computational Technologles (ICICCT).New York:IEEE,2017:230-234. [5] YANG X C,JIANG J,MA X D,et al.Program slicing technology based on critical variable dataflow analysis algorithm in GCC compiler[J].Computer Engineering and Applications,2017,53(24):40-47,54. [6] YANG X L,LIU J.Statically Detecting Likely Buffer Overflow Vulnerabilities in C/C++ Program[J].Computer Engineering and Applications,2004(20):108-110. [7] ANTONIOL G,PENTA M D,MASONE G,et al.XOgastan:XML-oriented gcc AST analysis and transformations[C]//Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation.IEEE,2003. [8] LI X,WANG T T,SU X H,et al.Research on Eliminating Redundancies of GCC AST Test[J].Computer Science,2008(10):170-172. [9] TIAN B C,SUN K,CHAO H Q.New Algorithm of Simplifying GCC Syntax Tree[J].Computer Science,2015,42(S1):516-518,530. [10] ZHANG Q,JIN Y C,LI M,et al.Implementation of Trie Tree Router Lookup Algorithm in Network Processor[J].Computer Engineering,2014,40(4):98-102. [11] XU G T,ZHANG M.Research on Trie Tree Keeyword FastMatching Algorithm in Network Security Situational Awareness[J].Netinfo Security,2019(4):55-62. [12] LI A,HE D,WANG H.An advanced trie-based HTTP parsing algorithm[C]//Sixth International Conference on Information Science & Technology.IEEE,2016. [13] YANG T,MI Z,DUAN R,et al.An ultra-fast universal incremental update algorithm for trie-based routing lookup[C]//2012 20th IEEE International Conference on Network Protocols (ICNP).IEEE Computer Society,2012. [14] XUE P Q,XIAN Y,NURBOL,et al.Sensitive information filtering algorithm based on Uyghur text information network research[J].Computer Engineering and Applications,2018,54(5):236-241,246. [15] GONG P,OSBORN W.A Compact-Trie-Based Structure for K-Nearest-Neighbour Searching[C]//IEEE International Conference on Advanced Information Networking & Applications.IEEE,2017:578-585. [16] GHASEMI C,YOUSEFI H,SHIN K G,et al.A Fast and Memory-Efficient Trie Structure for Name-Based Packet Forwarding[C]//2018 IEEE 26th International Conference on Network Protocols (ICNP).IEEE,2018. [17] HUANG R,ZHAO Y.C Programming[M].BeiJing:Tsinghua University Press,2012. [18] DEITEL P,DEITEL H C.How To Program International Edition,Seventh Edition[M].Beijing:Publishing House of Electronics Industry,2018. |
[1] | 冉丹, 陈哲, 孙毅, 杨志斌. 基于程序转化的SCADE模型检测 SCADE Model Checking Based on Program Transformation 计算机科学, 2021, 48(12): 125-130. https://doi.org/10.11896/jsjkx.201100080 |
[2] | 常建明, 薄莉莉, 孙小兵. 面向缺陷定位的代码搜索引擎 Code Search Engine for Bug Localization 计算机科学, 2021, 48(12): 140-148. https://doi.org/10.11896/jsjkx.201100209 |
[3] | 丁嵘, 于千惠. 基于AADL的自主无人系统可成长框架 Growth Framework of Autonomous Unmanned Systems Based on AADL 计算机科学, 2020, 47(12): 87-92. https://doi.org/10.11896/jsjkx.201100173 |
[4] | 丁丽丽,李雁冰,张素平,王鹏翔,张庆花. 分支嵌套循环的自动并行化研究 Auto-parallelization Research Based on Branch Nested Loops 计算机科学, 2017, 44(5): 14-19. https://doi.org/10.11896/j.issn.1002-137X.2017.05.003 |
[5] | 柯昌博,黄志球,肖甫. 基于本体概念相似度的软件构件检索方法 Software Component Retrieval Method Based on Ontology Concept Similarity 计算机科学, 2017, 44(12): 144-149. https://doi.org/10.11896/j.issn.1002-137X.2017.12.028 |
[6] | 董钰山,李春江. Intel64体系结构的数据预取机制及效果 Mechanism and Capability of Data Prefetching in Intel64 Architecture 计算机科学, 2016, 43(5): 34-41. https://doi.org/10.11896/j.issn.1002-137X.2016.05.006 |
[7] | 徐颖,李春江,董钰山,周思齐. GCC编译器中编译指导的自动向量化实现 Implementation of Auto-vectorization Based on Directives in GCC 计算机科学, 2014, 41(Z11): 364-367. |
[8] | 杨昌坤,许庆国. C程序控制流程模型的提取技术与实现 Technology and Implementation of Extracting Control Flow Model of C Program 计算机科学, 2014, 41(5): 208-214. https://doi.org/10.11896/j.issn.1002-137X.2014.05.043 |
[9] | 王淅娜,喻建鹏. 一种改进的Boyer-Moore算法在IDS中的应用 Improved Boyer-Moore Algorithm Applied in IDS 计算机科学, 2013, 40(Z11): 196-198. |
[10] | 刘晓娴,赵荣彩,丁锐. 面向DSWP并行的OpenMP任务调度机制的扩展与实现 Extension to OpenMP Task Scheduling Mechanism for DSWP Parallelization and its Implementation 计算机科学, 2013, 40(9): 38-43. |
[11] | 李春江,黄娟娟,徐颖,杜云飞,陈娟. 典型编译器自动向量化效果评估与分析 Evaluation and Analysis of Effects of Auto-vectorization in Typical Compilers 计算机科学, 2013, 40(4): 41-46. |
[12] | 李春江, 杜云飞, 倪晓强, 王永文, 杨灿群. 基于GCC实现飞腾处理器向量处理单元的编译器后端 Implementing Compiler Backend for Vector Processing Unit of FT Processor Based on GCC 计算机科学, 2013, 40(12): 19-22. |
[13] | 李春江,杜云飞,易会战,杨灿群. GCC中内嵌函数实现剖析 Anatomy of the Implementation of Builtin Functions in GCC 计算机科学, 2012, 39(Z6): 357-359. |
[14] | 李春江,杜云飞,倪晓强,王永文,杨灿群. GCC后端中四路双精度短向量寄存器的实现 Implementation of Four-way Double Precision Short Vector Registers in GCC Backend 计算机科学, 2012, 39(9): 292-295. |
[15] | 李旭 卢凯 李根. Jikes RVM动态编译技术分析与性能评测 计算机科学, 2009, 36(4): 129-132. |
|