Computer Science, 2018, Vol. 45, Issue (6): 57-66. doi: 10.11896/j.issn.1002-137X.2018.06.010

• WISA2022

Diversity Measures Method in High-dimensional Semantic Vector Based on Asymmetric Multi-valued Feature Jaccard Coefficient

FENG Yan-hong1,2, YU Hong1,2, SUN Geng1,2, PENG Song1   

  1. College of Information Engineering, Dalian Ocean University, Dalian 116023, China;
  2. Key Laboratory of Marine Information Technology of Liaoning Province, Dalian Ocean University, Dalian 116023, China
  • Online: 2018-06-15 Published: 2018-07-24

Abstract: Measuring the diversity of semantic vectors is an important foundation for solving natural language processing problems with deep learning methods. Diversity measurement of high-dimensional semantic vectors suffers from the “measurement concentration” problem: when traditional measures are applied, the differences between semantic vectors vanish. To address this problem, a diversity measure based on the asymmetric multi-valued feature Jaccard coefficient is proposed. The statistical distribution of the dimension values of high-dimensional semantic vectors shows that the values of some dimensions are densely concentrated within a narrow range, so these dimensions cannot contribute to diversity. The contributions of different dimensions to diversity therefore differ, that is, they are asymmetric. The method defines an importance function over dimension values, selects the dimensions whose importance value satisfies a threshold to participate in the diversity calculation, and removes the dimensions that cannot contribute to diversity, thereby reducing dimensionality and alleviating the “measurement concentration” problem. Experiments were conducted on fishery data sets and public data sets, comparing different measures on semantic vectors of different dimensionality. Provided that semantic quality is not markedly reduced, the diversity index of the proposed method is much higher than that of the best existing measure.
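The selection-then-Jaccard idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual importance function, threshold, or feature encoding, so everything below is an assumption made for illustration: `importance` is taken to be the deviation of a value from its dimension's mean in units of standard deviation, each retained dimension is encoded as a hypothetical `(index, sign)` feature, and all function names are invented here. It should not be read as the authors' implementation.

```python
import statistics


def dimension_stats(vectors):
    """Per-dimension mean and population standard deviation over a corpus."""
    columns = list(zip(*vectors))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def importance(value, mean, std):
    """ASSUMED importance function: deviation from the mean in std units."""
    return abs(value - mean) / std if std > 0 else 0.0


def salient_features(vector, means, stds, threshold):
    """Keep only dimensions whose importance reaches the threshold.

    Each kept dimension becomes a multi-valued feature (index, sign);
    dimensions below the threshold are dropped (ASSUMED encoding).
    """
    return {
        (i, 1 if v > means[i] else -1)
        for i, v in enumerate(vector)
        if importance(v, means[i], stds[i]) >= threshold
    }


def jaccard_diversity(a, b, means, stds, threshold=1.0):
    """1 - |A ∩ B| / |A ∪ B| over salient features; 0.0 if neither has any."""
    fa = salient_features(a, means, stds, threshold)
    fb = salient_features(b, means, stds, threshold)
    union = fa | fb
    if not union:
        return 0.0
    return 1.0 - len(fa & fb) / len(union)


# Toy example with hand-picked statistics (zero mean, unit deviation).
means, stds = [0.0] * 4, [1.0] * 4
a = [2.0, 0.1, -1.5, 0.2]
b = [1.8, -0.2, 1.6, 0.0]
print(jaccard_diversity(a, b, means, stds))
```

In this toy case, dimensions 1 and 3 lie close to the mean in both vectors and are excluded, so they neither count as agreement nor dilute the result. Treating joint absence as uninformative is the sense in which the sketch is "asymmetric"; whether the paper uses the same convention is an assumption. In practice `dimension_stats` would be run over the whole embedding table to obtain `means` and `stds`.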

Key words: Asymmetric multi-valued feature, High-dimensional semantic vector, Jaccard coefficient, Measurement concentration, Measures method

CLC Number: TP183