Computer Science ›› 2021, Vol. 48 ›› Issue (8): 60-65.doi: 10.11896/jsjkx.200700008

• Database & Big Data & Data Science •

Text Matching Method Based on Fine-grained Difference Features

WANG Sheng, ZHANG Yang-sen, CHEN Ruo-yu, XIANG Ga   

  1. Institute of Intelligent Information Processing,Beijing Information Science and Technology University,Beijing 100101,China
  • Received: 2020-07-01 Revised: 2020-08-20 Published: 2021-08-10
  • About author: WANG Sheng, born in 1996, postgraduate. His main research interests include natural language processing and machine learning. (1028742881@qq.com) ZHANG Yang-sen, born in 1962, postdoctoral, professor, is a distinguished member of the China Computer Federation. His main research interests include natural language processing and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61772081), National Key Research and Development Plan (2018YFB1403104) and Research Fund of Beijing Information Science and Technology University (2035008).

Abstract: Text matching is one of the key technologies in retrieval systems. Aiming at the problem that existing text matching models cannot accurately capture the semantic differences between texts, this paper proposes a text matching method based on fine-grained difference features. First, a pre-trained model is used as the base model to extract the semantics of the texts to be matched and to produce a preliminary matching result. Then, the idea of adversarial learning is introduced at the embedding layer: virtual adversarial samples are constructed for training, improving the learning ability and generalization ability of the model. Finally, fine-grained difference features of the texts are introduced to correct the preliminary matching prediction, which effectively improves the model's ability to capture fine-grained differences and thereby the performance of the text matching model. Experiments are conducted on two datasets; on the LCQMC dataset, the proposed method achieves an accuracy (ACC) of 88.96%, outperforming the best previously reported models.
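The abstract does not spell out how the virtual adversarial samples are constructed; the standard recipe in adversarial training for NLP is to perturb the embedding layer along the normalized gradient of the loss. A minimal sketch of that idea, assuming an FGM-style perturbation (the function name `fgm_perturbation` and the radius `epsilon` are illustrative, not taken from the paper):

```python
import numpy as np

def fgm_perturbation(grad: np.ndarray, epsilon: float = 0.5) -> np.ndarray:
    """Scale the loss gradient w.r.t. the embeddings to L2 norm epsilon.

    Adding the result to the clean embeddings yields a 'virtual
    adversarial sample' used as extra training input.
    """
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)  # no gradient signal, no perturbation
    return epsilon * grad / norm

# Toy 4-dimensional "word embedding" and the gradient of the loss w.r.t. it.
embedding = np.array([0.20, -0.10, 0.40, 0.05])
grad = np.array([0.30, 0.00, -0.40, 0.10])

# The adversarial sample shares the embedding but shifts it toward
# higher loss; training on both copies regularizes the model.
adv_embedding = embedding + fgm_perturbation(grad, epsilon=0.5)
```

In practice the perturbation is applied inside the training loop of the pre-trained model, and the loss on the perturbed embeddings is added to the ordinary loss before back-propagation.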

Key words: Adversarial learning, Difference feature, Pre-trained model, Semantic similarity, Text matching

CLC Number: TP391.1