Computer Science, 2025, Vol. 52, Issue (8): 1-16. doi: 10.11896/jsjkx.250300156
王立梅1,2, 韩林睿1,2, 杜祖炜1,2, 郑日1,2, 时建中1,2, 刘奕群3
WANG Limei1,2, HAN Linrui1,2, DU Zuwei1,2, ZHENG Ri1,2, SHI Jianzhong1,2, LIU Yiqun3
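Abstract: A privacy policy is a self-regulatory commitment by a network service provider to collect and use personal information lawfully. It is intended to strengthen users' trust in, and control over, the processing of their personal information. In practice, however, privacy policies are often lengthy, laden with complex terminology, and vague about compliance boundaries. Traditional methods rely on classification models trained on annotated privacy policy text to automate compliance detection, but they are limited by one-dimensional evaluation criteria, the high cost of obtaining annotated data, and weak model generalization. To address these limitations, this paper proposes a large language model (LLM)-based method for detecting the compliance of mobile application privacy policies. Its core workflow has three steps: constructing a compliance evaluation system, designing a hierarchical reasoning framework, and performing automated compliance detection. First, a compliance evaluation system comprising 6 first-level indicators, 14 second-level indicators, and 41 third-level indicators is built on 9 laws, regulations, and national standards, including the Civil Code and the Personal Information Protection Law. Second, a three-stage hierarchical reasoning framework, DOTS-THCE, is designed on the basis of the dynamic optimal trajectory search method; few-shot prompt engineering guides the LLM to perform multi-level, dynamic evaluation of privacy policies. Finally, experiments are conducted on the PPC-Bench dataset, which was collected from the "腾讯应用宝" (Tencent Yingyongbao) mobile app store and covers 4,821 privacy policy texts in 10 categories. The results show that, with reasoning enhanced by DOTS-THCE, Qwen2.5-7B-Instruct outperforms Deepseek-LLM-7B-Chat, Llama3.1-8B-Chinese-Chat, and GLM-4-9B-Chat. Qwen2.5-7B-Instruct@DOTS-THCE achieves a macro-F1 of 89.30% on privacy policy compliance detection, significantly outperforming baselines such as SVM, CNN, RNN, BERT, and Qwen2.5-7B-Instruct@RAG. The study not only validates the effectiveness of LLMs for privacy policy compliance detection but also offers a useful reference for addressing the scarcity of high-quality annotated data in the judicial domain.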
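The headline metric reported above is macro-F1, the unweighted mean of the per-class F1 scores. The following is a minimal sketch of how that metric is computed in general. It is not the authors' evaluation code, and the label names `compliant` and `non_compliant` in the usage note are hypothetical, since the abstract does not state the label set.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    y_true and y_pred are equal-length sequences of class labels.
    Every class counts equally, regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []

    for label in labels:
        # Count true positives, false positives, false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)

    return sum(f1_scores) / len(f1_scores)
```

For example, with hypothetical gold labels `["compliant", "compliant", "non_compliant", "non_compliant"]` and predictions `["compliant", "non_compliant", "non_compliant", "non_compliant"]`, the per-class F1 scores are 2/3 and 0.8, so the macro-F1 is their plain average. Because each class is weighted equally, a model cannot reach a high macro-F1 by favoring the majority class alone.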