Computer Science ›› 2014, Vol. 41 ›› Issue (3): 110-115.

• Network and Information Security •

User Behavior Analysis Based on Web Browsing Logs

GUO Jun-xia, GAO Cheng, XU Nan-shan and LU Gang

  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029
  • Online: 2018-11-14 Published: 2018-11-14

Abstract: As information in Q&A communities accumulates over time, more and more outdated information fills these communities and is retrieved by search engines, which inconveniences information seekers. A user's web browsing log implicitly records that user's behavioral habits, and extracting this information is important for judging the timeliness of web page information. This paper proposes a method for dividing web browsing logs into query processes and, based on that division, statistically analyzes the browsing habits of a large number of real users. The results show that a user browses 8.05 pages and spends 6.28 minutes per query on average, that nearly 1/3 of queries are carried out concurrently and in alternation with other queries, and that users rely heavily on websites' internal search. The browsing records of one community website were selected from the log data set for a preliminary analysis of web page timeliness. The results indicate that user dissatisfaction stems mainly from results with low relevance to the query, and that outdated information accounts for only a small part of it.

Key words: Web browsing logs, User behavior analysis, Web page timeliness, Community question answering (CQA)

CLC number: TP391.1  Document code: A

