Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231200005-10. doi: 10.11896/jsjkx.231200005

• Intelligent Computing •

Deep Learning-based Method for Mining Ocean Hot Spot News

QIN Xianping1, DING Zhaoxu1, ZHONG Guoqiang1, WANG Dong2

  1. College of Computer Science and Technology, Ocean University of China, Qingdao, Shandong 266404, China
  2. Library of Ocean University of China, Qingdao, Shandong 266404, China
  • Online: 2024-11-16 Published: 2024-11-13
  • Corresponding author: WANG Dong (wangdong@ouc.edu.cn)
  • About author: QIN Xianping (qinxianping_20@163.com), born in 2000, postgraduate. Her main research interests include neural architecture search and natural language processing.
    WANG Dong, born in 1979, Ph.D., senior engineer. His main research interests include machine vision, embedded systems, software programming, and IoT design.
  • Supported by:
    Scientific and Technological Innovation 2030-Major Project for New Generation of AI (2018AAA0100400), Natural Science Foundation of Shandong Province, China (ZR2020MF131, ZR2021ZD19), Science and Technology Program of Qingdao (21-1-4-ny-19-nsh), and Fundamental Research Funds for the Central Universities (202253006).

Abstract: The rapid development of the mobile Internet and the popularity of modern mobile clients have driven the vigorous growth of the online news industry, social media, and self-media, providing users with diverse and abundant information. With the steady advancement of China's maritime power strategy and the marked rise in national maritime awareness, ocean-related information of many kinds has flooded the Internet: media reports and public opinion proliferate online, and hot-spot events occur frequently. To handle this multi-source, multi-attribute online ocean information, an automatic deep learning-based ocean hot news mining system is proposed, built on multi-source text clustering and automatic summarization. It comprises five functional modules: automatic collection of multi-source ocean-related data, data preprocessing, feature extraction, text clustering, and automatic summarization. Specifically, a web crawler collects diverse and scattered ocean data from multiple sources, automatically structures the data, and stores it in a database. Cluster analysis is then performed based on the similarity of text features and the relationships between texts, and the clustering results provide data support for subsequent summary generation and topic discovery. In addition, an automatic summary generation method for ocean news is proposed that leverages the strong contextual understanding and rich language expression of pre-trained language models. Multiple groups of experiments demonstrate the effectiveness of the proposed method on each evaluation metric and highlight its advantages in mining multi-source, heterogeneous online ocean news. The method provides a feasible solution for processing scattered ocean information and generating more readable content summaries, and it is significant for improving the efficiency of ocean information acquisition, monitoring public opinion trends, and promoting the application and dissemination of ocean information.
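The module chain described in the abstract (preprocessing, feature extraction, text clustering, summarization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the crawler has already produced a list of headline strings, and it swaps in simple stand-ins: TF-IDF vectors instead of deep features, single-pass threshold clustering instead of the paper's clustering method, and an extractive "most central headline" pick instead of summary generation with a pre-trained language model. All function names and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Preprocessing: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Feature extraction: sparse TF-IDF vectors stored as dicts."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({
            # Smoothed inverse document frequency, always positive.
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[tok] for tok, weight in a.items() if tok in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(vectors, threshold):
    """Text clustering: join the most similar existing cluster, else open a new one."""
    clusters = []
    for idx, vec in enumerate(vectors):
        best_cluster, best_score = None, threshold
        for members in clusters:
            score = sum(cosine(vec, vectors[m]) for m in members) / len(members)
            if score >= best_score:
                best_cluster, best_score = members, score
        if best_cluster is None:
            clusters.append([idx])
        else:
            best_cluster.append(idx)
    return clusters


def pick_representative(members, vectors):
    """Extractive stand-in for summarization: the most central document."""
    return max(
        members,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members),
    )


def mine_hot_topics(docs, threshold=0.3):
    """Run the pipeline; larger clusters are treated as hotter topics."""
    vectors = tfidf_vectors(docs)
    clusters = sorted(single_pass_cluster(vectors, threshold), key=len, reverse=True)
    summaries = [docs[pick_representative(c, vectors)] for c in clusters]
    return clusters, summaries


if __name__ == "__main__":
    headlines = [
        "Typhoon warning issued for coastal fishing fleets",
        "Coastal fishing fleets return to port after typhoon warning",
        "New deep-sea research vessel begins ocean survey",
        "Research vessel completes deep-sea ocean survey mission",
    ]
    for members, summary in zip(*mine_hot_topics(headlines)):
        print(members, "->", summary)
```

Ranking clusters by size is only one plausible proxy for "hotness"; a real system would likely also weigh recency and source diversity, which the abstract does not specify.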

Key words: Ocean news, Text clustering, Automatic summarization, Deep learning, Natural language processing, Pre-trained model

CLC Number:

  • TP391