Computer Science, 2017, Vol. 44, Issue (12): 232-238. doi: 10.11896/j.issn.1002-137X.2017.12.042

• Artificial Intelligence •

Text Mining Algorithm and Application of Telecom Big Data

WANG Dong-sheng, HUANG Chuan-he, HUANG Xiao-peng and NI Qiu-fen

  1. School of Computer Science, Wuhan University, Wuhan 430072 (all four authors)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61373040, 61572370).

Abstract: Telecom big data contains a large amount of unstructured text that conventional methods cannot readily mine; text mining is better suited to analyzing such data. This paper proposes a text-based new word identification algorithm and a named entity recognition algorithm. The algorithms are used to analyze the content of customer complaint texts and determine the category each belongs to, and to identify users' terminal models from the text of their Internet access records, giving the telecom industry better user support and a better user experience. Practical application of the model shows that the proposed method recognizes telecom complaint text efficiently.

Key words: Telecom, Big data, Text mining, Pattern recognition, User terminal types
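
The abstract does not describe how the new word identification algorithm works. As an illustration of the general task only, and not of the authors' method, the sketch below ranks character bigrams that are missing from a known vocabulary by pointwise mutual information (PMI), a common baseline for spotting candidate new words in unsegmented text. The function name, the `min_count` threshold, and the sample complaint strings are all hypothetical.

```python
import math
from collections import Counter


def score_new_word_candidates(texts, known_words, min_count=2):
    """Rank character bigrams absent from known_words by PMI (illustrative baseline)."""
    char_counts = Counter()
    bigram_counts = Counter()
    for text in texts:
        char_counts.update(text)  # count single characters
        bigram_counts.update(text[i:i + 2] for i in range(len(text) - 1))

    total_chars = sum(char_counts.values())
    total_bigrams = sum(bigram_counts.values())

    scored = []
    for bigram, count in bigram_counts.items():
        # Skip rare bigrams and anything already in the vocabulary.
        if count < min_count or bigram in known_words:
            continue
        p_xy = count / total_bigrams
        p_x = char_counts[bigram[0]] / total_chars
        p_y = char_counts[bigram[1]] / total_chars
        # High PMI: the two characters co-occur more often than chance predicts.
        scored.append((bigram, math.log(p_xy / (p_x * p_y))))

    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical complaint snippets; "微信" is not in the assumed vocabulary.
sample_texts = ["微信支付失败", "微信登录不上", "微信打不开"]
sample_vocabulary = {"支付", "登录"}
ranked = score_new_word_candidates(sample_texts, sample_vocabulary)
print(ranked)
```

With these three strings, the only bigram that occurs at least twice and is not in the vocabulary is "微信", so it is the single candidate returned. A practical system would also consider longer n-grams and boundary statistics such as left/right entropy.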

