计算机科学 ›› 2020, Vol. 47 ›› Issue (11A): 97-105.doi: 10.11896/jsjkx.200300087
钱玲龙, 武娇, 王人锋, 陆慧娟
QIAN Ling-long, WU Jiao, WANG Ren-feng, LU Hui-juan
摘要: 文档自动摘要是自然语言处理领域中的重要任务,受限于难以准确理解文档语义,大多通过词频、关键词等人工特征对文档句子进行重要程度排序,以此提取摘要。受稀疏表示理论启发,提出了一种基于稀疏表示的动态语义空间划分算法。算法对初始划分的语义子空间进行字典学习,利用所得字典对所有句向量进行稀疏重构,从而将各句向量动态调整至重构误差最小的划分,迭代地实现语义空间的重划分。对于划分后语义子空间内摘要句的提取,提出了一种基于稀疏相似度排序的自动摘要提取算法。将各语义子空间的所有句向量作为字典原子,通过稀疏重构,得到能体现句子对其他句子语义表征程度的稀疏相似度,以各句累积稀疏相似度作为衡量句子表征空间语义信息能力的指标,依据其排序来提取摘要句。在猫途鹰网站热门景点旅游评论数据集上进行了实验,结果表明语义空间重构误差快速迭代5次即可稳定收敛且平均有效降低重构误差约17%,且算法对数据维度不敏感,所提摘要避免了重复提取冗余度大、重复性高的文本,是一种有效的自动摘要方法。
中图分类号:
[1] ZHANG C.Text summary algorithm based on semantic reconstruction[D].Nanjing:Nanjing University,2016. [2] ALLAHYARI M,POURIYEH S,ASSEFI M,et al.Text Summarization Techniques:A Brief Survey[J].International Journal of Advanced Computer Science & Applications,2017,8(10):397-405. [3] FERILLI S,PAZIENZA A.An Abstract Argumentation-Based Approach to Automatic Extractive Text Summarization[C]//Italian Research Conference on Digital Libraries.Springer,Cham,2018. [4] LIU H,YU H,DENG Z H.Multi-document summarizationbased on two-level sparse representation model[C]//AAAI.2015:196-202. [5] HE R,TANG J,GONG P,et al.Multi-document summarization via group sparse learning[J].Information Sciences,2016,349:12-24. [6] HE Z,CHEN C,BU J,et al.Document summarization based on data reconstruction[C]//AAAI.2012:620-626. [7] HE Z,CHEN C,BU J,et al.Unsupervised document summarization from data reconstruction perspective[J].Neurocompu-ting,2015,157:356-366. [8] JIAO L C.Sparse learning,classification and recognition[M].SCIENCE PRESS,2017. [9] XIONG X.Research on Extractive Answer Fusion for Q & A Community[D].Harbin:Harbin Institute of Technology,2018. [10] PETERS M E,NEUMANN M,IYYER M,et al.Deep contextualized word representations[J].arXiv:1802.05365,2018. [11] DEVLIN J,CHANG M W,LEEK,et al.Bert:Pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018. [12] CANDES E J,ROMBERG J K.Practical signal recovery from random projection [J].Proc Spie,2005,5674:76-86. [13] PENG S.Sparse representation coding model and its application in text classification [D].Tianjin:Tianjin University,2015. [14] DAVENPORT M A,DUARTE M F,ELDAR Y C,et al.Introduction to Compressed Sensing[M]//Compressed Sensing:Theory and Applications.Cambridge:Cambridge University Press,2012. [15] AHARON M,ELAD M,BRUCKSTEIN A.K-SVD:an algo-rithm for designing overcomplete dictionaries for sparse representation[J].IEEE Transactions on Image Processing,2006,54(11):4311-4322. [16] PATI Y C,REZAIITAR R,KRISHNAPRASAD P S.Orthogonal matching pursuit:recursive function approximation with applications to wavelet decomposition[C]//Proceeding of the 27thAsilomar Conference on Signals,Systems and Computers.1993:40-44. [17] DAVIS G,MALLAT S,AVELLANEDAM.Adaptive greedyapproximation[J].Constructive Approximation,1997,13 (1):57-98. [18] CHENG H,LIU Z,HOU L,et al.Sparsity-Induced Similarity Measure and Its Applications[J].IEEE Transactions on Circuits &Systems for Video Technology,2016,26(4):613-626. |
[1] | 周蔚, 王兆毓, 魏斌. 面向法律裁判文书的生成式自动摘要模型 Abstractive Automatic Summarizing Model for Legal Judgment Documents 计算机科学, 2021, 48(12): 331-336. https://doi.org/10.11896/jsjkx.210500028 |
[2] | 张帆, 贺文琪, 姬红兵, 李丹萍, 王磊. 基于块对角化表示的多视角字典对学习 Multi-view Dictionary-pair Learning Based on Block-diagonal Representation 计算机科学, 2021, 48(1): 233-240. https://doi.org/10.11896/jsjkx.200800211 |
[3] | 田旭, 常侃, 黄升, 覃团发. 基于残差字典及协作表达的单图像超分辨率算法 Single Image Super-resolution Algorithm Using Residual Dictionary and Collaborative Representation 计算机科学, 2020, 47(9): 135-141. https://doi.org/10.11896/jsjkx.190600146 |
[4] | 李金霞, 赵志刚, 李强, 吕慧显, 李明生. 改进的局部和相似性保持特征选择算法 Improved Locality and Similarity Preserving Feature Selection Algorithm 计算机科学, 2020, 47(6A): 480-484. https://doi.org/10.11896/JsJkx.20190800095 |
[5] | 王军浩, 闫德勤, 刘德山, 邢钰佳. 融合极端学习机的判别性分析字典学习算法 Algorithm with Discriminative Analysis Dictionary Learning by Fusing Extreme Learning Machine 计算机科学, 2020, 47(5): 137-143. https://doi.org/10.11896/jsjkx.190600090 |
[6] | 王丽芳, 史超宇, 蔺素珍, 秦品乐, 高媛. 基于联合图像块聚类自适应字典学习的多模态医学图像融合 Multi-modal Medical Image Fusion Based on Joint Patch Clustering of Adaptive Dictionary Learning 计算机科学, 2019, 46(7): 238-245. https://doi.org/10.11896/j.issn.1002-137X.2019.07.036 |
[7] | 杜秀丽, 左思铭, 邱少明. 基于图像灰度熵的自适应字典学习算法 Adaptive Dictionary Learning Algorithm Based on Image Gray Entropy 计算机科学, 2019, 46(5): 266-271. https://doi.org/10.11896/j.issn.1002-137X.2019.05.041 |
[8] | 李秀琴, 王天荆, 白光伟, 沈航. 基于压缩感知的两阶段多目标定位算法 Two-phase Multi-target Localization Algorithm Based on Compressed Sensing 计算机科学, 2019, 46(5): 50-56. https://doi.org/10.11896/j.issn.1002-137X.2019.05.007 |
[9] | 吴晨, 袁昱纬, 王宏伟, 刘宇, 刘思彤, 全吉成. 基于词向量融合的遥感场景零样本分类算法 Word Vectors Fusion Based Remote Sensing Scenes Zero-shot Classification Algorithm 计算机科学, 2019, 46(12): 286-291. https://doi.org/10.11896/jsjkx.181202257 |
[10] | 张真真,王建林. 结合第二代Bandelet变换分块的字典学习图像去噪算法 Dictionary Learning Image Denoising Algorithm Combining Second Generation Bandelet Transform Block 计算机科学, 2018, 45(7): 264-270. https://doi.org/10.11896/j.issn.1002-137X.2018.07.046 |
[11] | 吕巨建, 赵慧民, 陈荣军, 李键红. 基于自适应稀疏邻域重构的无监督主动学习算法 Unsupervised Active Learning Based on Adaptive Sparse Neighbors Reconstruction 计算机科学, 2018, 45(6): 251-258. https://doi.org/10.11896/j.issn.1002-137X.2018.06.045 |
[12] | 李键红,吴亚榕,吕巨建. 基于组稀疏表示的在线单帧图像超分辨率算法 Online Single Image Super-resolution Algorithm Based on Group Sparse Representation 计算机科学, 2018, 45(4): 312-318. https://doi.org/10.11896/j.issn.1002-137X.2018.04.053 |
[13] | 王铁建,吴飞,荆晓远. 基于多核字典学习的软件缺陷预测 Multiple Kernel Dictionary Learning for Software Defect Prediction 计算机科学, 2017, 44(12): 131-134. https://doi.org/10.11896/j.issn.1002-137X.2017.12.026 |
[14] | 徐煜明,宋佳伟,肖贤建. 基于亚像素块匹配和字典学习的超分辨率算法 Super Resolution Algorithm Based on Sub-pixel Block Matching and Dictionary Learning 计算机科学, 2016, 43(8): 304-308. https://doi.org/10.11896/j.issn.1002-137X.2016.08.062 |
[15] | 施静兰,常侃,张智勇,覃团发. 人脸识别中基于系数相似性的字典学习算法 Coefficient-similarity-based Dictionary Learning Algorithm for Face Recognition 计算机科学, 2016, 43(6): 298-302. https://doi.org/10.11896/j.issn.1002-137X.2016.06.059 |
|