Computer Science (计算机科学) ›› 2011, Vol. 38 ›› Issue (8): 171-175.

• Databases and Data Mining •

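Design of an Academic Search Engine Based on the Scholar Community

CHEN Guo-hua (陈国华), TANG Yong (汤庸), PENG Ze-wu (彭泽武), LI Jian-guo (李建国)

  1. (School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006); (School of Computer Science, South China Normal University, Guangzhou 510631)
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    This work was supported by the National Natural Science Foundation of China project "Research on a Temporal Role Relationship Model and Collaborative Awareness Technology" (60970044) and the Guangdong Natural Science and Technology Program project "A Collaborative Software Platform for the Academic Information Service Domain" (20108010600031).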

Abstract: Scholar communities and academic search engines are becoming increasingly important in research. We present the design of an academic search engine based on a scholar community, identify the functions it should provide and the key problems it must address, and outline approaches to some of those problems. We then describe the system architecture and discuss a literature integration algorithm that combines academic information scattered across different locations on the Internet, where different providers supply different content, into a single whole; this effectively solves the problem of extracting literature records. General-purpose Chinese word segmentation components have low accuracy on personal names, so we design an efficient algorithm dedicated to segmenting Chinese names. A prototype academic search engine was implemented on top of the open-source frameworks Nutch and HBase, and experiments verify the effectiveness of the design.

Key words: Scholar community, Academic search engine, Scholar information integration, Chinese name segmentation
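
The abstract does not describe how the name segmentation algorithm works, so the following is only a minimal sketch of one common approach, a surname-dictionary lookup with longest match, and not the authors' method. The surname sets and the function names `split_name` and `split_author_list` are hypothetical and chosen for illustration; a real system would need a far larger surname dictionary.

```python
# Illustrative only: tiny hand-picked surname sets, not a complete dictionary.
COMPOUND_SURNAMES = {"欧阳", "司马", "诸葛", "上官"}
SINGLE_SURNAMES = {"陈", "汤", "彭", "李", "王", "张"}


def split_name(name):
    """Split a Chinese personal name into (surname, given_name).

    Tries a two-character compound surname first (longest match), then a
    single-character surname. Returns None when no known surname matches
    or when nothing is left over for the given name.
    """
    for size, surnames in ((2, COMPOUND_SURNAMES), (1, SINGLE_SURNAMES)):
        surname, given = name[:size], name[size:]
        if surname in surnames and given:
            return surname, given
    return None


def split_author_list(text, separators=",,;;、"):
    """Split a delimited author string and segment each name in it."""
    for sep in separators:
        text = text.replace(sep, "\n")
    names = [part.strip() for part in text.split("\n") if part.strip()]
    return [(name, split_name(name)) for name in names]
```

Under these assumptions, `split_name("欧阳修")` returns `("欧阳", "修")` because the compound surname is tried before any single-character one, and `split_author_list("陈国华,汤庸,彭泽武,李建国")` segments each of the four author names on this page. Names whose surname is missing from the dictionary yield `None`, which makes unhandled cases visible instead of silently producing a wrong split.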
