Computer Science, 2024, Vol. 51, Issue (11A): 231200005-10. doi: 10.11896/jsjkx.231200005
覃娴萍1, 丁昭旭1, 仲国强1, 王栋2
QIN Xianping1, DING Zhaoxu1, ZHONG Guoqiang1, WANG Dong2
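Abstract: The rapid development of the mobile Internet and the spread of modern mobile clients have driven the growth of online news, social media, and self-media, giving users massive amounts of diverse, rich information. As China's maritime power strategy advances steadily and public awareness of the ocean grows markedly, information on many aspects of the marine domain floods the Internet, related media reports and public opinion emerge online in large volumes, and hot events occur frequently. To handle multi-source, multi-attribute online marine information, this paper proposes a deep-learning-based system for automatically mining hot marine news, built on multi-source text clustering and automatic summarization. The system comprises five functional modules: automatic collection of multi-source marine-related data, data preprocessing, feature extraction, text clustering, and automatic summarization. Specifically, web crawlers collect diverse and scattered marine data from multiple sources, automatically structure it, and store it in a database. Clustering analysis is then performed according to the similarity of text features and the relationships between texts, and the clustering results provide data support for subsequent summary generation and topic discovery. Drawing on the strong contextual understanding and rich language expression capabilities of pre-trained language models, the paper further proposes an automatic summarization method for marine news based on a pre-trained language model. Multiple sets of experiments demonstrate the effectiveness of the proposed method on every evaluation metric and highlight its advantages in mining multi-source, heterogeneous online marine news. The method offers a feasible solution for processing scattered marine information and generating more readable content summaries, and it is significant for improving the efficiency of marine information acquisition, monitoring trends in public opinion, and promoting the application and dissemination of marine information.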
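The cluster-then-summarize flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it swaps in bag-of-words vectors with cosine similarity for the learned text features, a greedy single-pass grouping for the clustering module, and a pick-the-most-central-item extractive step for the pre-trained language model summarizer. All function names, the `threshold` value, and the sample headlines are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts (assumption: English tokens;
    # Chinese news would need a word segmenter instead).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    # Greedy single-pass clustering: a text joins the first cluster
    # whose seed (first member) is similar enough, else starts a new one.
    vectors = [vectorize(t) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def summarize(texts, members):
    # Extractive stand-in for the summarizer: return the member
    # with the highest total similarity to the rest of its cluster.
    vectors = {i: vectorize(texts[i]) for i in members}

    def centrality(i):
        return sum(cosine(vectors[i], vectors[j]) for j in members if j != i)

    return texts[max(members, key=centrality)]


# Hypothetical sample headlines, for illustration only.
news = [
    "Typhoon warning issued for coastal fishing ports",
    "Coastal fishing ports close after typhoon warning",
    "New deep sea research vessel begins maiden voyage",
    "Research vessel completes deep sea maiden voyage",
]

if __name__ == "__main__":
    for members in cluster(news):
        print(members, "->", summarize(news, members))
```

In a fuller pipeline, the count vectors would presumably be replaced by embeddings from a pre-trained language model and the extractive step by a generative summarizer, but the grouping-then-summarizing structure stays the same.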