Computer Science, 2023, Vol. 50, Issue (1): 76-86. doi: 10.11896/jsjkx.220100078

• Database & Big Data & Data Science •

Text Material Recommendation Method Combining Label Classification and Semantic Query Expansion

MENG Yiyue, PENG Rong, LYU Qibiao   

  1. School of Computer Science, Wuhan University, Wuhan 430072, China
  • Received: 2022-01-09 Revised: 2022-07-02 Online: 2023-01-15 Published: 2023-01-09
  • About author: MENG Yiyue, born in 1998, postgraduate. His main research interests include requirements engineering and software engineering.
    PENG Rong, born in 1975, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include requirements engineering, software engineering and service computing.
  • Supported by:
    Joint Funds of China Mobile of the Ministry of Education of China (MCM2020J01).

Abstract: When preparing planning and research reports, researchers often need to collect and read large amounts of text material according to a proposed catalog or set of titles. The workload is heavy, and the quality of the selected material cannot be guaranteed. To address this, a text material recommendation method combining label classification and semantic query expansion is proposed for the field of digital government planning documentation. From the perspective of information retrieval, the titles at all levels of the catalog are treated as queries and the referenced text materials as target documents, so that text materials can be retrieved and recommended. The method uses the differential evolution algorithm to combine three components: text material recommendation based on word vector averaging, semantic query expansion, and label classification. This compensates for the shortcomings of traditional text material recommendation methods and makes it possible to retrieve text materials at paragraph granularity from catalog titles. Experiments on 10 datasets show that the proposed method significantly improves performance. It can greatly reduce the workload of manual material selection and classification and lower the difficulty of document preparation.
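
The retrieval framing in the abstract (catalog titles as queries, paragraphs as target documents, word vector averaging, query expansion) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy vectors in `TOY_VECTORS`, the function names, the whitespace tokenization, and the fixed fusion weights. The abstract indicates that differential evolution is used to combine the components, but gives no details, so the weights are simply passed in here; label classification is omitted.

```python
import math

# Toy 3-dimensional word vectors; hypothetical values for illustration only.
TOY_VECTORS = {
    "digital": [0.9, 0.1, 0.0],
    "government": [0.8, 0.2, 0.1],
    "platform": [0.7, 0.3, 0.0],
    "service": [0.6, 0.4, 0.1],
    "budget": [0.1, 0.9, 0.2],
    "funding": [0.0, 0.8, 0.3],
}


def average_vector(tokens, vectors=TOY_VECTORS):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is missing or has zero norm."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse(scores, weights):
    """Weighted sum of per-signal scores (weights are assumed, not tuned)."""
    return sum(w * s for w, s in zip(weights, scores))


def rank_paragraphs(title, paragraphs, expansion=(), weights=(0.5, 0.5)):
    """Rank paragraphs for a title by plain and expanded query similarity."""
    tokens = title.lower().split()
    plain = average_vector(tokens)
    expanded = average_vector(tokens + list(expansion))
    scored = []
    for paragraph in paragraphs:
        pv = average_vector(paragraph.lower().split())
        score = fuse((cosine(plain, pv), cosine(expanded, pv)), weights)
        scored.append((score, paragraph))
    # Highest fused score first; ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [paragraph for _, paragraph in scored]
```

For example, `rank_paragraphs("digital government", ["budget funding", "government service platform"], expansion=["platform"])` places the second paragraph first, because its averaged vector lies closer to the query vector. In a fuller version the `weights` tuple would be the quantity tuned by an optimizer such as differential evolution against a retrieval metric.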

Key words: Text material recommendation, Information retrieval, Digital government, Query expansion, Differential evolution algorithm

CLC Number: TP311.5