Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 97-105.doi: 10.11896/jsjkx.200300087

• Artificial Intelligence •

Multi-document Automatic Summarization Based on Sparse Representation

QIAN Ling-long, WU Jiao, WANG Ren-feng, LU Hui-juan   

  1. China Jiliang University, Hangzhou 310018, China
  • Online: 2020-11-15  Published: 2020-11-17
  • About author: QIAN Ling-long, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include natural language processing, knowledge graphs and explainable AI.
    LU Hui-juan, born in 1962, professor, is a director and an outstanding member of China Computer Federation. Her main research interests include machine learning, deep learning and big data.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61272315, 61602431), the Natural Science Foundation of Zhejiang Province, China (LQ20F030015) and the National College Student Innovation and Entrepreneurship Training Program (201810356020).

Abstract: Automatic document summarization is an important task in the field of natural language processing. Limited by the difficulty of accurately understanding document semantics, most methods rank sentences by hand-crafted features, such as word frequency and keywords, to extract a summary. Inspired by the theory of sparse representation, a dynamic semantic space partition algorithm based on sparse representation is proposed. The algorithm performs dictionary learning on an initial partition of the semantic space, uses the learned dictionaries to sparsely reconstruct each sentence vector, dynamically reassigns the sentence to the partition with the smallest reconstruction error, and iterates to re-partition the semantic space. To extract sentences from the partitioned semantic subspaces, an automatic extraction algorithm based on sparse similarity ranking is proposed. All sentence vectors in a semantic subspace are treated as dictionary atoms; through sparse reconstruction, a sparse similarity is obtained that reflects how well one sentence semantically represents the others. The cumulative sparse similarity of each sentence to the other sentences serves as a metric of the sentence's ability to represent the semantic information of the subspace; sentences are ranked by cumulative sparse similarity and the top N are extracted. Experimental results on travel reviews of popular attractions from the TripAdvisor website show that the semantic-space reconstruction error drops rapidly within 5 iterations and then remains stable, demonstrating convergence. Besides reducing the reconstruction error by nearly 17%, the algorithm is also insensitive to data dimensionality. The proposed method avoids repeatedly extracting redundant, highly similar text, making it an effective multi-document automatic summarization approach.
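The two algorithms described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sentence vectors are already computed, substitutes a truncated SVD basis for the paper's dictionary learning step (K-SVD in the cited literature), and uses a simple greedy orthogonal matching pursuit for sparse reconstruction. All function names and parameters here are hypothetical.

```python
import numpy as np

def omp(D, x, k, tol=1e-10):
    """Greedy orthogonal matching pursuit: approximate x with at most k
    columns (atoms) of dictionary D; return the sparse coefficient vector."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        corr = D.T @ residual
        j = int(np.argmax(np.abs(corr)))
        if np.abs(corr[j]) < tol or j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def repartition(X, labels, n_atoms=5, k=3, n_iter=5):
    """Dynamic semantic-space partition: per subspace, build a dictionary
    (a crude SVD basis here, standing in for K-SVD), sparsely reconstruct
    every sentence with every dictionary, and move each sentence to the
    subspace with the smallest reconstruction error; iterate."""
    labels = labels.copy()
    for _ in range(n_iter):
        dicts = {}
        for c in np.unique(labels):
            Xc = X[labels == c]
            if Xc.shape[0] == 0:
                continue  # a subspace may empty out during reassignment
            U, _, _ = np.linalg.svd(Xc.T, full_matrices=False)
            dicts[c] = U[:, :min(n_atoms, U.shape[1])]
        for i, x in enumerate(X):
            errs = {c: np.linalg.norm(x - D @ omp(D, x, k))
                    for c, D in dicts.items()}
            labels[i] = min(errs, key=errs.get)
    return labels

def rank_by_sparse_similarity(X, k=3, top_n=2):
    """Sparse similarity ranking: treat the other sentence vectors as
    dictionary atoms, sparsely reconstruct each sentence, and accumulate
    |coefficients| as each atom-sentence's similarity contribution; the
    top-N sentences by cumulative similarity are extracted."""
    n = X.shape[0]
    score = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        coef = omp(X[others].T, X[i], k)
        for pos, j in enumerate(others):
            score[j] += abs(coef[pos])  # how much sentence j explains sentence i
    return np.argsort(-score)[:top_n]
```

In this sketch the reassignment step plays the same role as the assignment step of k-means, but membership is decided by sparse reconstruction error against a learned dictionary rather than by distance to a centroid, which is what allows the partition to follow subspace structure rather than spherical clusters.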

Key words: Automatic summarization, Dictionary learning, Sparse reconstruction

CLC Number: TP391.1