Computer Science, 2023, Vol. 50, Issue (7): 317-324. doi: 10.11896/jsjkx.220600068

• Information Security •

Browser Fingerprint Recognition Based on Improved Self-paced Ensemble Algorithm

ZHANG Desheng1, CHEN Bo2, ZHANG Jianhui2, BU Youjun2, SUN Chongxin2, SUN Jia1

  1. 1 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450000, China
    2 Information Technology Institute, PLA Strategic Support Force Information Engineering University, Zhengzhou 450000, China
  • Received: 2022-06-07 Revised: 2022-10-14 Online: 2023-07-15 Published: 2023-07-05
  • Corresponding author: ZHANG Jianhui (ndsczjh@163.com)
  • About author: (835225140@qq.com) ZHANG Desheng, born in 1997, postgraduate. His main research interests include cyberspace security. ZHANG Jianhui, born in 1977, Ph.D, associate researcher, master supervisor. His main research interests include new network architecture, network routing technology, network data analysis and security control.
  • Supported by:
    National Natural Science Foundation of China (62176264).

Abstract: Browser fingerprinting is stateless and consistent across domains, and many websites already use it for user tracking, advertisement delivery and security verification. Browser fingerprint recognition is a typical classification task on imbalanced data. In long-term fingerprint tracking, class imbalance among the data samples leads to low recognition accuracy and makes long-term tracking prone to failure. To address these problems, an improved Self-paced Ensemble (Improved SPE, ISPE) method is proposed and applied to browser fingerprint recognition. The method improves both the undersampling of browser fingerprint samples and the training of each individual classifier in the ensemble. Focusing on fingerprints that are difficult to identify, it adds an attention-like mechanism and optimizes the self-paced factor so that the classifiers pay more attention to boundary samples during training and recognition, which improves overall recognition accuracy. Experiments are carried out on 3 483 collected fingerprints and 15 000 fingerprints from an open-source dataset. The results show that ISPE reaches an F1-score of 95.6% for browser fingerprint matching and recognition, 16.8% higher than the Bi-RNN algorithm, indicating that the method performs well for long-term browser fingerprint tracking.

Key words: Browser fingerprinting, User tracking, Self-paced Ensemble, Undersampling, Ensemble learning
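
The self-paced undersampling that the abstract builds on can be sketched as follows. This is a minimal illustration of the general Self-paced Ensemble idea as it is commonly described (majority-class samples are grouped into hardness bins, and a self-paced factor that grows during training shifts the sampling quota from easy bins toward harder ones). It is not the authors' ISPE: the abstract does not define the attention-like mechanism or the optimized self-paced factor, so neither is modeled. The function names, the tangent schedule, the bin count, the assumption that hardness lies in [0, 1], and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def self_paced_factor(iteration, n_estimators):
    """Assumed schedule: alpha = tan(pi * i / (2 * n)), growing from 0 as training proceeds."""
    return math.tan(math.pi * iteration / (2 * n_estimators))


def bin_quotas(hardness, n_bins, alpha, n_target):
    """Group majority samples into hardness bins and compute a sample quota per bin."""
    bins = [[] for _ in range(n_bins)]
    for idx, h in enumerate(hardness):
        # Hardness is assumed to lie in [0, 1]; h == 1.0 falls into the last bin.
        b = min(int(h * n_bins), n_bins - 1)
        bins[b].append(idx)

    weights = []
    for members in bins:
        if not members:
            weights.append(0.0)  # empty bins receive no quota
            continue
        mean_h = sum(hardness[i] for i in members) / len(members)
        # Weight is inversely related to mean hardness plus alpha;
        # the max() guard only avoids division by zero.
        weights.append(1.0 / max(mean_h + alpha, 1e-12))

    total = sum(weights)
    quotas = [int(round(n_target * w / total)) for w in weights]
    return bins, quotas


def self_paced_undersample(hardness, n_bins, alpha, n_target, rng):
    """Return indices of majority samples drawn according to the per-bin quotas."""
    bins, quotas = bin_quotas(hardness, n_bins, alpha, n_target)
    selected = []
    for members, quota in zip(bins, quotas):
        k = min(quota, len(members))  # a bin cannot supply more than it holds
        selected.extend(rng.sample(members, k))
    return selected


if __name__ == "__main__":
    # Toy majority class: 90 easy samples (hardness 0.05), 10 hard ones (0.95).
    hardness = [0.05] * 90 + [0.95] * 10
    for i in (0, 5, 10):
        alpha = self_paced_factor(i, 10)
        chosen = self_paced_undersample(hardness, 10, alpha, 20, random.Random(0))
        n_hard = sum(1 for idx in chosen if idx >= 90)
        print(f"iteration {i}: {n_hard} hard samples out of {len(chosen)}")
```

In this toy setup the share of hard (boundary) samples in the undersampled set increases as the self-paced factor grows, which is the behavior the abstract describes as paying more attention to samples that are difficult to classify.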

CLC Number:

  • TP393