Computer Science ›› 2011, Vol. 38 ›› Issue (1): 198-202.

• Database and Data Mining •

General Extraction Engine Framework: A Study of a New Web Information Extraction Method

宫继兵,唐杰,杨文军   

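  1. (Department of Computer Science and Engineering, Yanshan University, Qinhuangdao 066004); (Department of Computer Science and Technology, Tsinghua University, Beijing 100084); (Information Center, PetroChina Planning Research Institute, Beijing 100083)
  • Published: 2018-11-16 Released: 2018-11-16
  • Supported by:
    This work was supported by the National 863 High Technology Research and Development Program of China (No. 2009AA01Z138) and the New Teacher Fund (No. 20070003093).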

General Extraction Engine Framework: Research of a New Approach for Web Information Extraction

GONG Ji-bing, TANG Jie, YANG Wen-jun

  • Online: 2018-11-16 Published: 2018-11-16

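Abstract (translated from Chinese): Large-scale online video information makes information sharing convenient for users, but it also poses new challenges for national regulatory authorities. For efficiency reasons, online video monitoring relies mainly on video description information. This paper studies the extraction of online video description information and proposes a new Web information extraction method, the general extraction engine framework, which mainly comprises a formal description of the video description information extraction problem and a user-aware logical model of video websites. The method has been applied in a video monitoring project of a national ministry and has achieved good results. Experimental results show that the method substantially outperforms other methods in scalability, generality, and extraction accuracy.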

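Keywords (translated from Chinese): general extraction engine framework, online video monitoring, logical model of video websites, Web information extraction, extraction pattern generation algorithm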

Abstract: The large scale of online video collections not only gives users an easy way to share information, but also poses a major challenge for managing them, in particular for online monitoring. A critical requirement for monitoring video information is to accurately and adaptively identify the key information describing each video, which is also the first step for video analysis and video search. In this paper, we focus on the problem of extracting video information from different websites. Specifically, we propose an engine framework for information extraction. We formally define the description model in the framework and implement a customizable engine for information extraction. The proposed framework has been applied in a real-world application of a national department and has obtained promising results. Experimental results show that the proposed approach can effectively extract video information and that it significantly outperforms the baseline methods.

Key words: General extraction engine framework, Internet video monitoring, Logical model of video website, Web information extraction, Algorithms for generating extraction patterns
