Computer Science, 2014, Vol. 41, Issue 10: 31-35. doi: 10.11896/j.issn.1002-137X.2014.10.007

• 2013 Joint Conference on Harmonious Human-Machine Environment •

Intelligent SMS Classification Method Based on Improved Bayes Classification Algorithm

YANG Liu, YIN Zhao, TENG Jian-bin, WANG Heng and WANG Guo-ping

  1. Beijing Engineering Technology Research Center of Virtual Simulation and Visualization, Peking University, Beijing 100871
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    This work was supported by the 863 Program Key Project (2011AA120301) and the National Natural Science Foundation of China (60925007, 61173080, 61232014).

Abstract: With the development of mobile communication technology, the number of mobile phone users keeps rising. As a traditional mobile communication service, SMS has long held an important place in daily life, and to some extent a person's messages record the track of that person's life. However, existing SMS management systems handle messages in a simple, unintelligent way: they group messages by contact and display them in chronological order. As a result, different kinds of messages are mixed together on the user's phone and are hard to manage. By studying the characteristics of SMS messages and analyzing the strengths and weaknesses of traditional feature selection based on document frequency and feature selection based on mutual information, we propose a feature selection method for SMS messages based on both word frequency and mutual information, and combine it with message length to implement an improved Bayes classification algorithm. Experiments show that the algorithm achieves good recall and accuracy when classifying SMS messages.

Key words: Intelligent SMS management, Text classification, Feature selection, Bayes classification algorithm
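
The abstract names two ingredients: feature selection that combines word frequency with mutual information, and a Bayes classifier that also uses message length. The paper's actual formulas are not given on this page, so the following is only a minimal sketch under stated assumptions. The weighting `log(1 + tf) * PMI`, the three length buckets and their thresholds, and the names `select_features`, `length_bucket`, and `NaiveBayes` are all hypothetical, and messages are assumed to be already split into tokens.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, k):
    """Return the k terms with the highest illustrative score.

    Assumption (not the paper's formula): a term's score is the maximum
    over classes of log(1 + tf(t, c)) * PMI(t, c), where PMI is pointwise
    mutual information estimated from document counts.
    """
    n_docs = len(docs)
    class_docs = Counter(labels)   # number of messages per class
    term_docs = Counter()          # messages containing the term
    joint_docs = Counter()         # messages of class c containing the term
    term_freq = Counter()          # raw occurrences of the term in class c
    for tokens, label in zip(docs, labels):
        for t in tokens:
            term_freq[(t, label)] += 1
        for t in set(tokens):
            term_docs[t] += 1
            joint_docs[(t, label)] += 1
    best = defaultdict(float)      # negative scores are clamped to 0
    for (t, c), n_tc in joint_docs.items():
        pmi = math.log(n_tc * n_docs / (term_docs[t] * class_docs[c]))
        best[t] = max(best[t], math.log1p(term_freq[(t, c)]) * pmi)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def length_bucket(tokens):
    """Hypothetical length feature: bucket a message by token count."""
    n = len(tokens)
    if n <= 5:
        return "short"
    return "medium" if n <= 15 else "long"


class NaiveBayes:
    """Multinomial naive Bayes with one extra categorical length feature."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        self.class_docs = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.len_counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(t for t in tokens if t in self.vocab)
            self.len_counts[c][length_bucket(tokens)] += 1
        return self

    def predict(self, tokens):
        bucket = length_bucket(tokens)

        def log_posterior(c):
            total = sum(self.word_counts[c].values())
            lp = math.log(self.class_docs[c] / self.n_docs)
            # Laplace-smoothed likelihood of the length bucket (3 buckets).
            lp += math.log((self.len_counts[c][bucket] + 1) / (self.class_docs[c] + 3))
            # Laplace-smoothed likelihood of each selected term.
            for t in tokens:
                if t in self.vocab:
                    lp += math.log((self.word_counts[c][t] + 1) / (total + len(self.vocab)))
            return lp

        return max(self.classes, key=log_posterior)
```

Multiplying by a frequency term is one simple way to counter the known tendency of mutual information to favor rare terms. For Chinese messages, the token lists would first have to come from a word segmenter.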

