Computer Science ›› 2012, Vol. 39 ›› Issue (4): 177-180.
Abstract: Automatic Deep Web classification is the basis for building a Deep Web data integration system. An approach was proposed to classify Deep Web sources based on domain feature text. Using ontology knowledge, concepts that express the same semantics were first extracted from different texts. A definition of domain correlation was then given as a quantitative criterion for feature text selection, avoiding the subjectivity and uncertainty of manual selection. In constructing the interface vector space model, an improved TF-IDF-based weighting method was proposed to reflect the different roles of feature texts. Finally, a KNN algorithm was used to classify the interface vectors. Comparative experiments indicate that the feature text selected by our method is accurate and effective, and that the new weighting method significantly improves classification precision and shows good stability in KNN classification.
Key words: Feature text, Domain classification, Vector space model, Deep Web
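The last two steps of the pipeline (weighting interface vectors in a vector space model, then classifying them with KNN) can be sketched in Python. This is a minimal illustration assuming plain TF-IDF weighting and cosine-similarity KNN with similarity-weighted voting; it does not reproduce the authors' improved weighting method or their domain-correlation feature selection, whose definitions are not given in the abstract. The function names, the toy interface vocabularies, the domain labels, and `k = 3` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build plain TF-IDF vectors (dict term -> weight) for a list of token lists.

    Baseline weighting only; this is NOT the paper's improved weighting method.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per interface
    idf = {t: math.log(n / df[t]) for t in df}
    return [vectorize(doc, idf) for doc in docs], idf


def vectorize(doc, idf):
    """Weight a token list with a fixed IDF table; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    total = len(doc) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, train_vectors, labels, k=3):
    """Return the label with the largest summed similarity among the k nearest neighbours."""
    nearest = sorted(
        ((cosine(query, v), lab) for v, lab in zip(train_vectors, labels)),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, lab in nearest:
        votes[lab] += sim  # similarity-weighted vote (an assumed KNN variant)
    return votes.most_common(1)[0][0]


# Hypothetical query-interface vocabularies for three hypothetical domains.
train = [
    ["title", "author", "isbn", "publisher"],
    ["title", "author", "keyword", "isbn"],
    ["departure", "destination", "date", "airline"],
    ["from", "to", "departure", "date", "passengers"],
    ["make", "model", "price", "year"],
    ["make", "model", "mileage", "zip"],
]
labels = ["book", "book", "airfare", "airfare", "auto", "auto"]

vecs, idf = tfidf_vectors(train)
print(knn_classify(vectorize(["author", "title", "publisher"], idf), vecs, labels))  # book
print(knn_classify(vectorize(["departure", "date", "to"], idf), vecs, labels))  # airfare
```

Terms that occur in every training interface get an IDF of zero and therefore contribute nothing; a domain-aware weighting such as the one the abstract describes would replace `tfidf_vectors` while leaving the KNN step unchanged.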
URL: https://www.jsjkx.com/EN/Y2012/V39/I4/177