Computer Science ›› 2018, Vol. 45 ›› Issue (6): 57-66.doi: 10.11896/j.issn.1002-137X.2018.06.010

• WISA2022

Diversity Measures Method in High-dimensional Semantic Vector Based on Asymmetric Multi-valued Feature Jaccard Coefficient

FENG Yan-hong1,2, YU Hong1,2, SUN Geng1,2, PENG Song1   

  1. College of Information Engineering, Dalian Ocean University, Dalian 116023, China;
  2. Key Laboratory of Marine Information Technology of Liaoning Province, Dalian Ocean University, Dalian 116023, China
  • Online:2018-06-15 Published:2018-07-24

Abstract: Measuring the diversity of semantic vectors is an important foundation for solving natural language processing problems with deep learning methods. Diversity measures on high-dimensional semantic vectors suffer from “measurement concentration”: when diversity is computed with traditional measures, the differences between semantic vectors vanish. To address this problem, a diversity measure based on the asymmetric multi-valued feature Jaccard coefficient is proposed. The statistical distribution of the dimension values of high-dimensional semantic vectors shows that the values of some dimensions are densely concentrated in a narrow range, so those dimensions cannot contribute to diversity. The contribution of different dimensions to diversity is therefore unequal, that is, asymmetric. The method defines an importance function over dimension values, selects the dimensions whose importance value satisfies a threshold to participate in the diversity calculation, and removes the dimensions that cannot contribute to diversity, thereby reducing dimensionality and alleviating the “measurement concentration” problem. Experiments were conducted on fishery data sets and on public data sets, comparing different measures on semantic vectors of different dimensionality. Provided that semantic quality is not markedly reduced, the diversity index of the proposed method is much higher than that of the best existing measure.
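The selection-then-Jaccard idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the importance function, the threshold, or the way dimension values are turned into multi-valued features, so everything below is an assumption made for illustration: `importance` is taken to be the distance of a value from the per-dimension mean in standard-deviation units, `level` encodes a feature as the side of the mean the value falls on, and `threshold=1.0` is arbitrary. It is not the authors' actual definition.

```python
from statistics import mean, pstdev


def dimension_stats(vectors):
    """Per-dimension (mean, population std) over a collection of vectors."""
    columns = list(zip(*vectors))
    return [(mean(c), pstdev(c)) for c in columns]


def importance(value, mu, sigma):
    """Hypothetical importance function (assumption, not from the paper):
    how far a value lies from the dense region, in standard deviations."""
    return abs(value - mu) / sigma if sigma > 0 else 0.0


def level(value, mu):
    """Hypothetical multi-valued feature (assumption): side of the mean."""
    return 1 if value > mu else -1


def jaccard_diversity(u, v, stats, threshold=1.0):
    """1 - Jaccard coefficient over the dimensions that pass the threshold.

    Dimensions where neither vector is important are skipped entirely,
    as with asymmetric attributes where joint absence carries no information.
    """
    shared = 0   # dimensions important in both vectors with the same level
    active = 0   # dimensions important in at least one vector
    for x, y, (mu, sigma) in zip(u, v, stats):
        x_on = importance(x, mu, sigma) >= threshold
        y_on = importance(y, mu, sigma) >= threshold
        if not (x_on or y_on):
            continue  # dense-range dimension: dropped from the calculation
        active += 1
        if x_on and y_on and level(x, mu) == level(y, mu):
            shared += 1
    if active == 0:
        return 0.0  # no informative dimension, so no measurable diversity
    return 1.0 - shared / active


# Toy example with hand-set statistics (mean 0, std 1 in every dimension).
stats = [(0.0, 1.0)] * 4
u = [2.0, 0.1, -2.0, 0.0]
v = [2.5, 0.2, 2.0, 0.1]
diversity = jaccard_diversity(u, v, stats)
```

In this toy case only the first and third dimensions pass the threshold; the vectors agree on one of them and disagree on the other, so two of the four dimensions are discarded before the coefficient is computed.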

Key words: Asymmetric multi-valued feature, High-dimensional semantic vector, Jaccard coefficient, Measurement concentration, Measures method

CLC Number: 

  • TP183