Computer Science (计算机科学) ›› 2018, Vol. 45 ›› Issue (8): 203-207. doi: 10.11896/j.issn.1002-137X.2018.08.036

• Artificial Intelligence •

Improved Bayesian Algorithm Based Automatic Classification Method for Bibliography

YANG Xiao-hua1, GAO Hai-yun2

  1. Zhicheng College, Fuzhou University, Fuzhou 350002, China
  2. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350016, China
  • Received: 2018-02-28 Online: 2018-08-29 Published: 2018-08-29
  • About the authors: YANG Xiao-hua (born 1979), female, master's degree, senior engineer. Her main research interests are big data analysis and image processing. E-mail: 45665192@qq.com (corresponding author). GAO Hai-yun (born 1979), female, master's student, assistant researcher. Her main research interests are nonlinear integer programming, image processing, and system modeling and algorithm optimization design.
  • Supported by:
    This work was supported by the Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JAT160658).

Abstract: The Bayesian algorithm is widely used in automatic bibliography classification. It commonly relies on differential evolution to estimate the probability terms, but traditional differential evolution easily falls into local optima, which lowers the accuracy of Bayesian classification. To address this problem, this paper proposes an automatic bibliography classification method based on an improved Bayesian algorithm. The method estimates the optimal solution of the probability terms through multi-parent mutation and crossover operations, which improves the accuracy of Bayesian classification. To classify a bibliographic record, the text is first preprocessed with the ICTCLAS system, and its term frequency-inverse document frequency (TF-IDF) features are then extracted. The improved Bayesian estimation method is then used to train on and classify these features, which completes the automatic classification. Simulation results show that the method achieves high classification accuracy.

Key words: Automatic classification for bibliography, Bayesian algorithm, Differential evolution, Feature extraction
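
The pipeline summarized in the abstract (segmented text, TF-IDF features, Bayesian classification) can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes the text has already been segmented into tokens (the step the paper assigns to ICTCLAS), and it estimates the probability terms with ordinary Laplace smoothing rather than the differential-evolution search with multi-parent mutation and crossover that the paper proposes. All function names, the smoothed IDF variant, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Compute a smoothed IDF for every term in a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # The +1 smoothing keeps terms that occur in every document from getting zero weight.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def to_vector(tokens, idf):
    """Map one tokenized document to {term: tf * idf}; unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items()}

def train_nb(vectors, labels, alpha=1.0):
    """Multinomial naive Bayes on TF-IDF weights with Laplace smoothing (alpha)."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        weight[y].update(v)  # accumulate per-class term weights
    model = {}
    for y, n_docs in class_docs.items():
        total = sum(weight[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_like = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_like)
    return model

def predict(model, vector):
    """Return the class with the highest log-posterior score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vector.items() if t in log_like)
    return max(model, key=score)

# Hypothetical, already-segmented bibliography titles and class labels.
docs = [
    ["bayesian", "text", "classification"],
    ["differential", "evolution", "optimization"],
    ["text", "feature", "extraction"],
    ["evolution", "mutation", "crossover"],
]
labels = ["text_mining", "optimization", "text_mining", "optimization"]

idf = fit_idf(docs)
model = train_nb([to_vector(d, idf) for d in docs], labels)
print(predict(model, to_vector(["text", "classification", "feature"], idf)))
```

In this sketch the class-conditional probabilities come from smoothed weight counts. The paper's contribution, as the abstract describes it, lies in replacing that estimation step with an improved differential-evolution search, which is not reproduced here.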

CLC Number: 

  • TP391