Computer Science ›› 2025, Vol. 52 ›› Issue (8): 1-16. doi: 10.11896/jsjkx.250300156

• Discipline Frontier •

Privacy Policy Compliance Detection Method for Mobile Applications Based on Large Language Model

WANG Limei1,2, HAN Linrui1,2, DU Zuwei1,2, ZHENG Ri1,2, SHI Jianzhong1,2, LIU Yiqun3   

  1 Ministry of Education Laboratory of Philosophy and Social Sciences - The CUPL Data Law Lab, China University of Political Science and Law, Beijing 100088, China
  2 The Institute for Data Law, China University of Political Science and Law, Beijing 100088, China
  3 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2025-03-28 Revised: 2025-05-17 Online: 2025-08-15 Published: 2025-08-08
  • About author: WANG Limei, born in 1974, Ph.D., professor, Ph.D. supervisor. Her main research interests include data law and cyber and information law.
  • Supported by:
    2022 National Key R&D Program “Social Governance and Smart Society Technology Support” Key Special Project (2022YFC3303000).

Abstract: Privacy policies serve as self-regulatory commitments by online service providers to legitimize the collection and use of personal information, aiming to enhance user trust and give users greater control over data processing. In practice, however, they suffer from excessive length, heavy technical jargon, and ambiguous legal compliance. Traditional approaches rely on classification models trained on annotated policy texts to detect compliance, but these methods suffer from oversimplified evaluation metrics, high annotation costs, and limited detection accuracy. This paper proposes a large language model (LLM)-based framework for mobile App privacy policy compliance detection, structured around three pillars: (1) establishing a multi-tier compliance evaluation system, (2) designing a hierarchical reasoning framework enhanced by Dynamic Optimal Trajectory Search (DOTS), and (3) implementing automated compliance verification. First, the paper constructs a compliance evaluation system comprising 6 first-level, 14 second-level, and 41 third-level indicators, grounded in nine legal frameworks including China's Civil Code and Personal Information Protection Law. Second, it develops the Dynamic Tri-Stage Hierarchical Compliance Evaluator (DOTS-THCE), a three-phase reasoning framework that uses few-shot prompting to guide LLMs in conducting multi-level dynamic assessments of privacy policies. Finally, it implements automated detection on the PPC-Bench dataset, which contains 4,821 privacy policies across 10 application categories collected from Tencent's “MyApp” store. Experimental results show that the Qwen2.5-7B-Instruct model augmented with DOTS-THCE outperforms the baseline models (Deepseek-LLM-7B-Chat, Llama3.1-8B-Chinese-Chat, and GLM-4-9B-Chat) by a significant margin. The Qwen2.5-7B-Instruct@DOTS-THCE configuration achieves a macro-F1 score of 89.30%, surpassing SVM, CNN, RNN, and BERT models as well as Qwen2.5-7B-Instruct@RAG in detection efficacy. This study not only pioneers LLM applications in privacy policy compliance detection, but also provides methodological insights for addressing data annotation scarcity in judicial AI systems.
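The headline result above is reported as a macro-F1 score. As a reminder of what that metric measures, the following is a minimal Python sketch of the standard macro-F1 definition: the unweighted mean of per-class F1 scores. The label names and the sample data are hypothetical and chosen only for illustration; the abstract does not specify the label set or how scores are aggregated across the 41 third-level indicators, so this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one labeled example")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted labels for one indicator over six policies.
y_true = ["compliant", "compliant", "non_compliant",
          "non_compliant", "compliant", "non_compliant"]
y_pred = ["compliant", "non_compliant", "non_compliant",
          "non_compliant", "compliant", "compliant"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally to the mean, macro-F1 penalizes a model that performs well on frequent labels but poorly on rare ones, which matters when non-compliant clauses are a minority.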

Key words: Privacy policy, Compliance detection, Dynamic optimal trajectory search, Dynamic tri-stage hierarchical compliance evaluator, Large language model

CLC Number: TP183