Diversity Measures Method in High-dimensional Semantic Vector Based on Asymmetric Multi-valued Feature Jaccard Coefficient

FENG Yan-hong1,2, YU Hong1,2, SUN Geng1,2, PENG Song1   

  1. College of Information Engineering, Dalian Ocean University, Dalian 116023, China;
  2. Key Laboratory of Marine Information Technology of Liaoning Province, Dalian Ocean University, Dalian 116023, China
  • Online: 2018-06-15 Published: 2018-07-24

Abstract: Diversity measures for semantic vectors are an important foundation for solving natural language processing problems with deep learning methods. Diversity measures for high-dimensional semantic vectors suffer from the problem of “measurement concentration”, which causes the diversity among semantic vectors to vanish when it is computed with traditional measures. To address this problem, a diversity measure based on the asymmetric multi-valued feature Jaccard coefficient is proposed. The statistical distribution of the dimension values of high-dimensional semantic vectors shows that the values of some dimensions are densely distributed within a certain range, so these dimensions cannot contribute to diversity. The contributions of different dimensions to diversity therefore differ, that is, they are asymmetric. The method defines an importance function on dimension values, selects the dimensions whose importance function values satisfy a threshold to participate in the diversity calculation, and removes the dimensions that cannot contribute to diversity, thereby reducing dimensionality and alleviating the problem of “measurement concentration”. Experiments were conducted on fishery data sets and public data sets, comparing different measures on semantic vectors of different dimensions. Provided that the semantic properties are not markedly degraded, the diversity index of the proposed method is much higher than that of the current best measure.
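
The selection-then-Jaccard idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the importance function, the threshold, or how multi-valued features are formed, so everything concrete here is an assumption made for illustration: the function names `importance` and `asymmetric_jaccard_diversity` are hypothetical, importance is taken to be the absolute z-score of a value within its dimension, and multi-valued features are obtained by binning values into equal-width intervals. Dimensions that are unimportant in both vectors are ignored, by analogy with the negative matches that the asymmetric binary Jaccard coefficient discards.

```python
import numpy as np


def importance(vectors):
    """Hypothetical importance function (an assumption, not the paper's definition).

    Returns the absolute z-score of every value within its dimension, so values
    lying in the dense central range of a dimension receive low importance.
    """
    mean = vectors.mean(axis=0)
    std = vectors.std(axis=0) + 1e-12  # avoid division by zero
    return np.abs(vectors - mean) / std


def asymmetric_jaccard_diversity(x, y, imp_x, imp_y,
                                 threshold=1.0, n_bins=5, lo=-1.0, hi=1.0):
    """Diversity = 1 - Jaccard similarity over the selected dimensions only.

    A dimension takes part if at least one of the two vectors has an importance
    value at or above `threshold`; all other dimensions are removed, which is
    the dimensionality reduction step. Values are binned into `n_bins`
    equal-width intervals on [lo, hi] (an assumed discretisation), and two
    values match when they fall into the same bin.
    """
    active = (imp_x >= threshold) | (imp_y >= threshold)
    if not active.any():
        return 0.0  # no informative dimension: treat the vectors as identical
    inner_edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
    bins_x = np.digitize(x[active], inner_edges)
    bins_y = np.digitize(y[active], inner_edges)
    matches = np.count_nonzero(bins_x == bins_y)
    return 1.0 - matches / active.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "semantic vectors": 100 vectors with 300 dimensions (synthetic data).
    vectors = np.clip(rng.normal(0.0, 0.3, size=(100, 300)), -1.0, 1.0)
    imp = importance(vectors)
    d = asymmetric_jaccard_diversity(vectors[0], vectors[1], imp[0], imp[1])
    print(f"diversity between vector 0 and vector 1: {d:.3f}")
```

The threshold controls how many dimensions survive: raising it keeps only values far from the dense region of each dimension, which is one plausible way to counteract the concentration effect, at the cost of discarding more of each vector.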

Key words: Asymmetric multi-valued feature, High-dimensional semantic vector, Jaccard coefficient, Measurement concentration, Measures method

  • TP183
