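Abstract: As information accumulates over time in question-answering communities, a growing amount of outdated content fills these sites and is retrieved by search engines, which inconveniences information seekers. Users' web browsing logs implicitly record their behavioral habits, and extracting this information through analysis is important for judging the timeliness of web page information. This paper proposes a method for dividing web browsing logs into query processes and, on the basis of that division, statistically analyzes the browsing habits of a large number of real users. The results show that a user views an average of 8.05 pages and spends 6.28 minutes per information query, that nearly one third of queries are carried out in an interleaved, concurrent fashion, and that users rely heavily on websites' internal search. Browsing records for one community website were then selected from the log dataset for a preliminary analysis of web page information timeliness. The results indicate that user dissatisfaction stems mainly from low query relevance, and that outdated information accounts for only a small part of it.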
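The abstract does not describe the division method itself, so the following is only a minimal sketch of the general idea, assuming a simple time-gap heuristic in place of the paper's method. The 30-minute cutoff, the function names `split_by_gap` and `summarize`, and the sample log are all hypothetical. The sketch shows how per-query statistics such as mean pages viewed and mean duration could be computed once a log has been segmented.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical cutoff (an assumption, not from the paper):
# a pause longer than this starts a new query process.
GAP = timedelta(minutes=30)

def split_by_gap(visits, gap=GAP):
    """Split one user's (timestamp, url) visits into time-ordered segments."""
    segments = []
    for ts, url in sorted(visits):
        # Compare against the timestamp of the last visit in the open segment.
        if segments and ts - segments[-1][-1][0] <= gap:
            segments[-1].append((ts, url))
        else:
            segments.append([(ts, url)])
    return segments

def summarize(segments):
    """Return (mean pages per segment, mean segment duration in minutes)."""
    pages = mean(len(s) for s in segments)
    minutes = mean((s[-1][0] - s[0][0]).total_seconds() / 60 for s in segments)
    return pages, minutes

# Made-up sample log for a single user.
t0 = datetime(2013, 1, 1, 9, 0)
log = [
    (t0, "http://example.com/q/1"),
    (t0 + timedelta(minutes=2), "http://example.com/q/2"),
    (t0 + timedelta(minutes=5), "http://example.com/search?k=a"),
    (t0 + timedelta(hours=2), "http://example.com/q/9"),
    (t0 + timedelta(hours=2, minutes=1), "http://example.com/q/10"),
]
segments = split_by_gap(log)
mean_pages, mean_minutes = summarize(segments)
```

A pure time-gap split cannot separate queries that a user interleaves within the same time window, which the abstract reports for nearly one third of queries, so a realistic division would need additional evidence beyond timestamps.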