Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250600030-11. doi: 10.11896/jsjkx.250600030

• Information Security •

Web Application Fingerprinting Method Based on Multi-level SimHash and Digital Feature Snapshots

GU Xianjun1, HUANG Mengqi1, LIU Ming2, HAN Fuji4, TIAN Cong1, ZHU Dongjun2,3

    1 Wuhan Power Supply Company of State Grid Hubei Electric Power Co., Ltd., Wuhan 430010, China
    2 JinYinHu Laboratory,Wuhan 430040,China
    3 School of Cyberspace Security,Huazhong University of Science and Technology,Wuhan 430040,China
    4 College of Control Science and Engineering,Zhejiang University,Hangzhou 310058,China
  • Online:2026-06-16 Published:2026-06-12
  • About author: GU Xianjun, born in 1981, master, senior engineer. His main research interests include cybersecurity, digital system construction and artificial intelligence.
    ZHU Dongjun, born in 1987, doctoral engineer. His main research interests include penetration testing, big data security and cyberspace mapping.
  • Supported by:
    Attack Detection and Dynamic Security Failure Analysis Instrument for Petrochemical (62127808), Research on Collaborative Malicious Behavior Recognition Based on General Configuration Data in Cyberspace (62172176) and National Key R&D Program of China (2022YFB3103400).

Abstract: Web application fingerprinting is a fundamental technique in cyberspace mapping, Web vulnerability exploitation, and cybersecurity situational awareness. Mainstream approaches identify Web applications primarily through manually crafted text-based rules and regular expression matching. These methods have several limitations: rules are difficult to extract, similar sub-versions are hard to distinguish, and matching tends to fail when page content changes. To address these issues, this paper proposes a Web application fingerprinting model based on a multi-level SimHash algorithm and digital feature snapshots. The method extracts representative page content that reflects the core characteristics of a Web application and maps it into high-dimensional digital fingerprints through multiple SimHash calculations, forming feature snapshots that can be computed and compared. On this basis, a general fingerprinting model is constructed, and its structure, key algorithms, and parameters are systematically defined. A practical implementation of the model is then developed using HTML page content, and a series of experiments is conducted on various mainstream Web applications. The experimental results show that the proposed method outperforms traditional rule-based approaches in recognition accuracy, supports automatic fingerprint generation and sub-version identification, and is robust to page modifications to a certain extent.
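To make the idea of a SimHash-based feature snapshot concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash width, the number of levels, or which page features feed each level. The 64-bit width, the MD5-based token hash, the two illustrative levels (tag structure and visible text), and the helper names `simhash`, `snapshot`, and `distance` are all assumptions made for illustration.

```python
import hashlib
import re

HASH_BITS = 64  # assumed fingerprint width; the abstract does not state one


def token_hash(token: str) -> int:
    """Stable 64-bit hash of a token (truncated MD5; an illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(tokens: list[str]) -> int:
    """Classic SimHash: each token votes +1/-1 per bit; the sign sets the bit."""
    votes = [0] * HASH_BITS
    for token in tokens:
        h = token_hash(token)
        for bit in range(HASH_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def snapshot(html: str) -> tuple[int, int]:
    """Hypothetical two-level snapshot: (tag-structure SimHash, text SimHash)."""
    tags = re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html)  # opening tag names
    text = re.sub(r"<[^>]*>", " ", html)                    # strip markup
    words = re.findall(r"\w+", text.lower())
    return simhash([t.lower() for t in tags]), simhash(words)


def distance(s1: tuple[int, int], s2: tuple[int, int]) -> int:
    """Sum of per-level Hamming distances; smaller means more similar."""
    return sum(hamming(a, b) for a, b in zip(s1, s2))


page_a = "<html><body><p>hello world</p></body></html>"
page_b = "<html><body><p>goodbye moon</p></body></html>"
snap_a, snap_b = snapshot(page_a), snapshot(page_b)
print(distance(snap_a, snap_a), snap_a[0] == snap_b[0])
```

In this sketch the two pages share the same tag structure, so their first-level hashes coincide even though the visible text differs. Keeping a separate hash per level is one plausible way a snapshot could stay comparable when only part of a page changes; how the paper actually defines and weights its levels is not stated in the abstract.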

Key words: Digital feature snapshot, Web application, Fingerprint recognition, Sub-version recognition, Multi-level SimHash

CLC Number: 

  • TP393