Computer Science (计算机科学), 2012, Vol. 39, Issue (6): 147-150.

• Database and Data Mining •

ESSK: A New Approach to Compute Clickstream Similarity

LIU Jia (刘嘉), QI Qi (祁奇), CHEN Zhenyu (陈振宇), HUI Chengfeng (惠成峰)

  1. (State Key Laboratory for Novel Software Technology, Nanjing 210093) (Software Institute, Nanjing University, Nanjing 210093)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Clickstreams are widely used in Web usage mining, and clickstream similarity is commonly used to classify or cluster Web user sessions. SSK (string subsequence kernel) was originally designed to compute string similarity; it was later applied to clickstreams and has become one of the most popular methods for computing clickstream similarity. SSK builds its feature space from all subsequences of a single fixed length k drawn from the two strings. A single value of k often yields too few features to obtain an accurate clickstream similarity. This paper therefore proposes ESSK (extended string subsequence kernel), a new approach that builds the feature space from all subsequences, which removes this limitation of SSK. To reduce the computational complexity, an efficient algorithm for computing ESSK is also proposed. Experiments indicate that ESSK is more accurate than SSK and discriminates between sessions better than other approaches, which makes it more suitable for clickstream similarity analysis and its applications.

Key words: Clickstream similarity, Algorithm design, Computational complexity
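The feature-shortage problem described in the abstract can be made concrete with a small brute-force sketch. It assumes the standard gap-weighted feature map of the string subsequence kernel (each occurrence of a subsequence contributes lam raised to the span it covers, with lam = 0.5 chosen arbitrarily), treats a session as a list of page identifiers, and models "all subsequences" as an unweighted sum of the fixed-length kernels over every length. That weighting is an assumption: the abstract does not give the exact ESSK definition, so this is an illustration rather than the authors' method, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def subsequence_features(seq, k, lam=0.5):
    """Explicit gap-weighted feature map phi_u(seq) over all length-k subsequences u.

    Each occurrence of u at indices i_1 < ... < i_k adds lam ** (i_k - i_1 + 1),
    so with lam < 1 subsequences spread over many clicks are down-weighted.
    """
    phi = defaultdict(float)
    for idx in combinations(range(len(seq)), k):
        u = tuple(seq[i] for i in idx)
        phi[u] += lam ** (idx[-1] - idx[0] + 1)
    return phi


def kernel(s, t, lengths, lam=0.5):
    """Sum over the given lengths of the dot product of the two feature maps."""
    total = 0.0
    for k in lengths:
        fs = subsequence_features(s, k, lam)
        ft = subsequence_features(t, k, lam)
        total += sum(v * ft[u] for u, v in fs.items() if u in ft)
    return total


def similarity(s, t, lengths, lam=0.5):
    """Normalized kernel K(s,t) / sqrt(K(s,s) * K(t,t)); 0.0 if a self-kernel is zero."""
    denom = sqrt(kernel(s, s, lengths, lam) * kernel(t, t, lengths, lam))
    return kernel(s, t, lengths, lam) / denom if denom else 0.0


if __name__ == "__main__":
    s = ["A", "B", "C", "D"]  # hypothetical session: pages in click order
    t = ["A", "C", "B", "E"]
    print(similarity(s, t, [3]))                # fixed k = 3
    print(similarity(s, t, list(range(1, 5))))  # all lengths 1..4
```

With the sessions A B C D and A C B E, the two share the pages A, B and C and the ordered pairs A B and A C, but no length-3 subsequence, so `similarity(s, t, [3])` returns 0.0 while the all-lengths score is positive. Enumerating combinations is exponential in session length, so this version only suits very short sessions.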

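Summing the kernel over every subsequence length need not cost more than computing one fixed length. The abstract does not describe the authors' efficient ESSK algorithm, so the sketch below swaps in a known technique instead: the K′/K″ dynamic-programming recursion for the gap-weighted subsequence kernel, which produces K_1 through K_p in a single O(p·|s|·|t|) pass. The function names and the unweighted sum over lengths in `all_lengths_similarity` are hypothetical choices for illustration, not the paper's definitions.

```python
from math import sqrt


def ssk_all_lengths(s, t, p, lam=0.5):
    """Return [K_1(s,t), ..., K_p(s,t)] for the gap-weighted subsequence kernel.

    Uses the K'/K'' recursion, so one O(p * |s| * |t|) pass yields the
    kernel for every subsequence length up to p.
    """
    m, n = len(s), len(t)
    # kp[i][a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 everywhere, K'_i starts at 0.
    kp = [[[1.0 if i == 0 else 0.0 for _ in range(n + 1)] for _ in range(m + 1)]
          for i in range(p)]
    for i in range(1, p):
        for a in range(1, m + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) as b advances
            for b in range(1, n + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # K_k(s,t) = lam^2 * sum over matching positions of K'_{k-1} on the preceding prefixes.
    result = []
    for k in range(1, p + 1):
        total = 0.0
        for a in range(1, m + 1):
            for b in range(1, n + 1):
                if s[a - 1] == t[b - 1]:
                    total += lam * lam * kp[k - 1][a - 1][b - 1]
        result.append(total)
    return result


def all_lengths_similarity(s, t, lam=0.5):
    """Hypothetical all-subsequence score: sum K_k over every length, then normalize."""
    p = max(len(s), len(t))  # same length range for all three kernels
    k_st = sum(ssk_all_lengths(s, t, p, lam))
    k_ss = sum(ssk_all_lengths(s, s, p, lam))
    k_tt = sum(ssk_all_lengths(t, t, p, lam))
    return k_st / sqrt(k_ss * k_tt) if k_ss and k_tt else 0.0
```

For example, for the sessions a b and a x b with the default lam = 0.5, the length-2 entry of `ssk_all_lengths(["a", "b"], ["a", "x", "b"], 2)` equals 0.5 ** 5 = 0.03125: the single shared subsequence a b spans 2 clicks in the first session and 3 in the second.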