Computer Science (计算机科学) ›› 2011, Vol. 38 ›› Issue (6): 230-236.

• Artificial Intelligence •

中文社区问答中问题答案质量评价和预测

李 晨,巢文涵,陈小明,李舟军   

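  1. (School of Computer Science, Beihang University, Beijing 100191)
  • Published: 2018-11-16 Released: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (90718017) and the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education (20070006055).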

Quality Evaluation and Prediction for Question and Answer in Chinese Community Question Answering

LI Chen, CHAO Wen-han, CHEN Xiao-ming, LI Zhou-jun

  • Online:2018-11-16 Published:2018-11-16

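Abstract (translated from the Chinese): Knowledge-sharing websites have opened new research opportunities for automatic question answering systems. However, the quality of user-provided questions and answers is uneven: alongside useful information, they may contain irrelevant or even malicious content. Identifying and filtering such content, and selecting high-quality question-answer pairs, helps a community-based automatic question answering system reuse the answers to related questions and thereby improve its service quality. We first crawled a large number of questions and answers from a Chinese community question answering website and used social network methods to compute and analyze statistics on the interaction patterns and characteristics of askers and answerers. Then, based on given criteria for judging question-answer quality, we manually annotated more than 3,000 questions and their answers. By extracting two feature sets, textual and non-textual, we designed and implemented a feature-based question-answer quality classifier using machine learning algorithms. Experimental results show that both its precision and its recall are above 70%. Finally, we analyze the main factors that affect question-answer quality in community networks.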

关键词: 社区问答,社会网络,机器学习,问题答案质量评价和预测,人工标注

Abstract: The rise of knowledge-sharing platforms on the Internet in China provides a new approach to automatic question answering. However, the quality of user-generated content in such social networks varies significantly, from useless information to malicious spam. Identifying and filtering such content are particularly important for improving users' experience and the performance of question answering systems. We first extracted a set of question-answer content from a Chinese community question answering site, investigated a series of statistical characteristics of the interaction among participants, and then manually annotated the quality of a subset of these questions and answers. By combining text features with non-text features provided by the community, both extracted from those questions and answers, we established a content quality classification model for evaluation and prediction. We find that this model is able to distinguish high-quality content from the rest with considerable accuracy.

Key words: Community question answering, Social networks, Machine learning, Question and answer quality evaluation and prediction, Human annotation
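
The abstract describes combining text and non-text features and reporting precision and recall for the resulting quality classifier. The sketch below illustrates only those two steps. It is not the authors' implementation: the `QAPair` fields, the specific features, and the function names are hypothetical, chosen for illustration, and the paper's actual feature sets and learning algorithm are not given on this page.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """A question-answer pair. All field names here are hypothetical."""
    question: str
    answer: str
    answerer_accepted_ratio: float  # non-text: share of the answerer's past answers accepted
    votes: int                      # non-text: community votes on this answer


def extract_features(pair: QAPair) -> list[float]:
    """Concatenate simple text features with non-text (community) features."""
    answer_len = len(pair.answer)
    text_features = [
        float(len(pair.question)),                      # question length in characters
        float(answer_len),                              # answer length in characters
        sum(ch.isdigit() for ch in pair.answer) / max(answer_len, 1),  # digit ratio
    ]
    non_text_features = [pair.answerer_accepted_ratio, float(pair.votes)]
    return text_features + non_text_features


def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Precision and recall for the positive (high-quality) class, labeled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The feature vectors returned by `extract_features` could be passed to any standard classifier; `precision_recall` then compares its predictions against the manual annotations.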
