Computer Science, 2014, Vol. 41, Issue (6): 148-154. doi: 10.11896/j.issn.1002-137X.2014.06.029

• Software and Database Technology •

Research of Extraction Method of Web Mathematical Formula Based on LaTex

CHEN Li-hui, SU Wei, CAI Chuan and CHEN Xiao-yun

  1. School of Information Science and Engineering, Lanzhou University, Lanzhou 730000 (all authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61003139,2), the MOE-Intel Information Technology Special Research Fund (MOE-INTEL-11-03), and the Fundamental Research Funds for the Central Universities (lzujbky-2013-39, lzujbky-2013-188, lzujbky-2013-187).

Abstract: Wikis, mathematics forums, and other social websites have a growing influence on mathematics education, and mathematical formulas appear widely on these sites. Being able to search those formulas matters for both study and research. Formula extraction is the prerequisite and foundation of an indexing system. This paper studies methods for extracting mathematical formulas written in LaTeX. Using a BNF description, it proposes a method that automatically analyzes and extracts the features that characterize LaTeX formulas and, based on those features, proposes rules for extracting and filtering LaTeX mathematical formulas. In experiments, the method reaches a recall of 75% and a precision of 99%.

Key words: Mathematical formula, LaTeX, Precision, Recall, Topic crawler, Search engine

CLC number: TP311 Document code: A
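
The extract-then-filter idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's BNF rules or feature set, so everything below is an assumption made for illustration: the delimiter patterns, the `FEATURE` regex, and the function names `extract_formulas` and `precision_recall` are hypothetical and are not the authors' method. The sketch finds delimited candidates, keeps only those that contain a LaTeX-like feature, and then computes precision and recall against a hand-labeled set.

```python
import re

# Assumed delimiters for candidate formulas (not taken from the paper).
DELIMITED = re.compile(
    r"\$\$(.+?)\$\$"              # display math: $$ ... $$
    r"|\\\[(.+?)\\\]"             # display math: \[ ... \]
    r"|\\\((.+?)\\\)"             # inline math:  \( ... \)
    r"|(?<!\\)\$(.+?)(?<!\\)\$",  # inline math:  $ ... $ (unescaped dollars)
    re.DOTALL,
)

# Assumed "formula features": a LaTeX command, or one of ^ _ { } =.
FEATURE = re.compile(r"\\[A-Za-z]+|[\^_{}=]")


def extract_formulas(text):
    """Return delimited candidates that contain at least one assumed feature."""
    formulas = []
    for match in DELIMITED.finditer(text):
        # Exactly one alternative matched, so exactly one group is not None.
        candidate = next(g for g in match.groups() if g is not None).strip()
        if FEATURE.search(candidate):  # filter step: drop non-formula text
            formulas.append(candidate)
    return formulas


def precision_recall(extracted, relevant):
    """Precision = hits / extracted, recall = hits / relevant (set-based)."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    page = r"Costs $5 and $10. Euler: $e^{i\pi} + 1 = 0$ and \[\frac{a}{b}\] and $x$."
    found = extract_formulas(page)
    labeled = [r"e^{i\pi} + 1 = 0", r"\frac{a}{b}", "x"]
    print(found)
    print(precision_recall(found, labeled))
```

In this toy input the filter rejects the price text between the two stray dollar signs, which protects precision, but it also rejects the bare formula `$x$` because it carries none of the assumed features, which lowers recall. A strict feature filter makes this trade-off in general. The sketch only illustrates the pattern and does not explain the paper's reported figures.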

