Computer Science (计算机科学), 2018, Vol. 45, Issue (8): 203-207. doi: 10.11896/j.issn.1002-137X.2018.08.036
YANG Xiao-hua (杨晓花)1, GAO Hai-yun (高海云)2
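Abstract: Bayesian algorithms are widely used in automatic bibliography classification. They often rely on differential evolution to estimate the probability terms, but conventional differential evolution easily becomes trapped in local optima, which lowers the accuracy of the Bayesian classifier. To address this problem, an automatic bibliography classification method based on an improved Bayesian algorithm is proposed. The method estimates the optimal values of the probability terms through multi-parent mutation and crossover operations, which improves the accuracy of Bayesian classification. To classify bibliographic records, the text is first preprocessed with the ICTCLAS system, term frequency-inverse document frequency (TF-IDF) features are then extracted, and the improved Bayesian estimation method is finally used to train on these features and classify them, yielding automatic classification of the records. Simulation results show that the method achieves high classification accuracy.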
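The feature extraction and classification steps described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens (the ICTCLAS step is not reproduced), and it estimates the probability terms with plain Laplace-smoothed weighted counts rather than the multi-parent differential evolution proposed in the paper, whose details are not given here. The function names `tf_idf`, `train_nb`, and `predict_nb`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (already segmented).
    Weight = (term count / document length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial naive Bayes on TF-IDF weights with Laplace smoothing.

    Assumption: smoothed counts stand in for the paper's DE-based estimator.
    """
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        weight[y].update(v)  # accumulate per-class term weights
    model = {}
    for y in class_docs:
        total = sum(weight[y].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[y] / len(labels))
        log_like = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_like)
    return model


def predict_nb(model, vector):
    """Return the class with the highest log posterior score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(
            w * log_like[t] for t, w in vector.items() if t in log_like
        )
    return max(model, key=score)


# Toy usage with made-up, pre-segmented records.
docs = [
    ["python", "programming", "code"],
    ["java", "programming", "code"],
    ["poetry", "novel", "literature"],
    ["novel", "fiction", "literature"],
]
labels = ["cs", "cs", "lit", "lit"]
vectors = tf_idf(docs)
model = train_nb(vectors, labels)
print(predict_nb(model, vectors[0]))
```

In this sketch a term that occurs in every document receives weight zero, because its inverse document frequency is log(1) = 0. An optimizer such as differential evolution would replace the closed-form `log_like` estimate in `train_nb`; that substitution is left out because the abstract does not specify the objective or the mutation scheme in enough detail.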