Computer Science, 2013, Vol. 40, Issue (5): 206-208.

• Artificial Intelligence •

Rules String Extracting Method for B2B System Based on Double-array Trie

LI Hui, YANG Bing-ru, PAN Li-fang and QIAN Wen-bin

  1. Institute of Knowledge Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China (all four authors)
  • Online: 2018-11-16 Published: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61175048, 60875029) and the Special Project on Innovation Methods of the Ministry of Science and Technology of China (2010IM020900).

Abstract: Extracting product specification data is difficult for vertical search engines that serve B2B systems. To address this, a rule-string extraction method based on the double-array trie is proposed. In B2B systems, product specification parameters are written as "parameter name: parameter value" strings; the method builds rule strings from this pattern and generates a double-array trie from the resulting rule base. To improve storage efficiency, the subtree with the most branch nodes is processed first when the trie is built. A single scan of the input text yields all rule strings it contains, and constraint conditions attached to the rules filter the candidate strings, which improves extraction accuracy. Experimental results show that, compared with conventional rule-string lookup, the method improves accuracy and lowers algorithmic complexity; the time complexity of finding rule strings is O(n).

Key words: Double-array trie, Vertical search, Rule string, B2B system
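The abstract describes the method only at a high level, so the Python sketch below is an illustration under stated assumptions, not the authors' implementation. It packs rule keys into a double-array trie (BASE/CHECK arrays: the child of state s on character code c is t = base[s] + c, valid only if check[t] == s) and then scans text left to right. Assumptions, all hypothetical: rule keys are parameter names ending in an ASCII ":" (e.g. `Brand:`); each rule's constraint is modeled as a regular expression the value must fully match; a value ends at `;`, `,` or a newline; and "process the subtree with the most branch nodes first" is read as a priority-queue heuristic that assigns a BASE value to the widest pending node first. The names `DoubleArrayTrie`, `extract` and `RULES` are illustrative only.

```python
import heapq
import re
from itertools import count


class DoubleArrayTrie:
    """Minimal double-array trie: child of state s on code c is t = base[s] + c, valid iff check[t] == s."""

    def __init__(self, words):
        # Alphabet compaction: each distinct character gets a small code 1..K.
        chars = sorted({ch for w in words for ch in w})
        self.code = {ch: i + 1 for i, ch in enumerate(chars)}
        # Step 1: build an ordinary pointer trie, node -> {code: child node}.
        nodes, terminal = [{}], {}
        for w in words:
            n = 0
            for ch in w:
                c = self.code[ch]
                if c not in nodes[n]:
                    nodes[n][c] = len(nodes)
                    nodes.append({})
                n = nodes[n][c]
            terminal[n] = w
        # Step 2: pack it into BASE/CHECK. Root is state 0; free cells have check == -1.
        self.base, self.check, self.term = [0], [-1], {}
        tie = count()
        # Assumed heuristic: of the nodes already placed, give a base to the widest one first.
        heap = [(-len(nodes[0]), next(tie), 0, 0)]
        while heap:
            _, _, n, s = heapq.heappop(heap)
            kids = sorted(nodes[n].items())
            if not kids:
                continue  # leaves keep base 0; no cell ever has a leaf as its parent
            b = 1  # first fit: smallest base whose child cells are all free
            while True:
                self._grow(b + kids[-1][0])
                if all(self.check[b + c] == -1 for c, _ in kids):
                    break
                b += 1
            self.base[s] = b
            for c, child in kids:
                t = b + c
                self.check[t] = s
                if child in terminal:
                    self.term[t] = terminal[child]
                heapq.heappush(heap, (-len(nodes[child]), next(tie), child, t))

    def _grow(self, idx):
        # Extend both arrays so that index idx exists.
        while len(self.check) <= idx:
            self.base.append(0)
            self.check.append(-1)

    def step(self, s, ch):
        """Return the next state, or -1 if there is no transition."""
        c = self.code.get(ch)
        if c is None:
            return -1
        t = self.base[s] + c
        return t if t < len(self.check) and self.check[t] == s else -1


def extract(text, trie, rules, delims=";,\n"):
    """Scan text left to right; keep (name, value) pairs whose value passes its rule."""
    out, i, n = [], 0, len(text)
    while i < n:
        s, j, hit = 0, i, None
        while j < n:
            s = trie.step(s, text[j])
            if s < 0:
                break
            j += 1
            if s in trie.term:
                hit = (trie.term[s], j)  # longest rule key matched so far
        if hit is None:
            i += 1
            continue
        key, k = hit
        # The value runs up to the next delimiter (or the end of the text).
        end = next((p for p in range(k, n) if text[p] in delims), n)
        value = text[k:end].strip()
        if rules[key].fullmatch(value):  # constraint filter on the candidate string
            out.append((key.rstrip(":"), value))
        i = end  # resume after the value, whether it was kept or filtered out
    return out


# Hypothetical rule base: "name:" key -> regex constraint the value must satisfy.
RULES = {
    "Brand:": re.compile(r"[A-Za-z]+"),
    "Voltage:": re.compile(r"\d+(\.\d+)?V"),
    "Weight:": re.compile(r"\d+(\.\d+)?kg"),
}
trie = DoubleArrayTrie(list(RULES))
print(extract("Brand:Acme; Voltage:220V; Weight:heavy; Color:red", trie, RULES))
# [('Brand', 'Acme'), ('Voltage', '220V')]
```

Here `Weight:heavy` is found in the scan but rejected by its constraint, and `Color:` is skipped because it is not a rule key. This sketch walks the trie from each start position and then jumps past every matched value, so its cost is O(n·L) for n text characters and longest key length L. That equals O(n) only when L is treated as a constant; the sketch does not by itself reproduce the paper's single-scan bound.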

