Computer Science, 2019, Vol. 46, Issue (12): 148-154. doi: 10.11896/jsjkx.181001972
李兆璨1,2, 王利明2, 葛思江2,3, 马多贺2, 秦勃1
LI Zhao-can1,2, WANG Li-ming2, GE Si-jiang2,3, MA Duo-he2, QIN Bo1
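Abstract: Data leakage is one of the major challenges facing big data applications. Digital watermarking is an effective means of data tracing and copyright protection. Current digital watermarking methods mainly target the circulation of multimedia files among end users, such as images, audio and video; research on digital watermarking for protecting text data against leakage in big data environments is lacking. This paper proposes a plain-text watermarking method for big data based on orthogonal coding. The method encodes the plaintext watermark into a binary byte stream and designs two orthogonal coding watermarking schemes, one based on row hash values and one based on row-order permutation. First, the binary watermark string is divided into segments; the hash of each row's content determines the number of the watermark segment to embed, and that segment is converted into a string of invisible characters according to a custom rule and appended to the end of the row. Then the row order is adjusted so that the hash values of the row contents correspond to the binary watermark string with flag bits added, thereby embedding the watermark in the big data plain text. Watermark extraction is the inverse of embedding. The proposed method resists damage to the watermark from operations common in big data environments, such as complex row-order transformations, and it also detects text tampering by embedding a fragile watermark. Based on the proposed method, a big data plain-text watermarking system was designed and implemented. It uses the Spark distributed processing architecture to address the performance of watermark embedding and extraction on massive text, enabling fast tracing of data leaks to their source and improving big data security. Experiments and theoretical analysis show that the method offers good watermark capacity and good imperceptibility, and that it withstands a variety of content attacks; because plain text has no formatting, format attacks are ineffective against it, so the method is robust.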
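The row-hash step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the custom rule for invisible characters, the hash function, or the segment length, so the zero-width characters, SHA-256, and 8-bit segments used here are all assumptions, and the function names are hypothetical. The row-order permutation scheme, the flag bits, and the Spark deployment are not shown.

```python
import hashlib

# Assumed mapping of bits to invisible characters (not specified in the abstract).
ZERO = "\u200b"  # zero-width space      -> bit 0
ONE = "\u200c"   # zero-width non-joiner -> bit 1


def text_to_bits(text):
    """Encode a plaintext watermark as a binary string (UTF-8, 8 bits per byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits):
    """Inverse of text_to_bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def segment_index(line, n_segments):
    """Pick a watermark segment number from the hash of the row content."""
    digest = hashlib.sha256(line.encode("utf-8")).digest()  # SHA-256 is an assumption
    return int.from_bytes(digest[:4], "big") % n_segments


def embed(lines, watermark, seg_len=8):
    """Append the selected watermark segment, as invisible characters, to each row."""
    bits = text_to_bits(watermark)
    segments = [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]
    marked = []
    for line in lines:
        segment = segments[segment_index(line, len(segments))]
        tail = "".join(ONE if bit == "1" else ZERO for bit in segment)
        marked.append(line + tail)
    return marked


def extract(marked_lines, n_segments):
    """Recover the watermark from marked rows, in any row order.

    Returns None if some segment number is never observed.
    """
    found = {}
    for marked in marked_lines:
        content = marked.rstrip(ZERO + ONE)   # visible row content
        tail = marked[len(content):]          # invisible suffix
        index = segment_index(content, n_segments)
        found.setdefault(index, "".join("1" if ch == ONE else "0" for ch in tail))
    if len(found) < n_segments:
        return None
    return bits_to_text("".join(found[i] for i in range(n_segments)))
```

Because each row's segment number is derived from the row's own content, extraction in this sketch does not depend on row order, which is the property the abstract attributes to the row-hash scheme. Recovery does require that every segment number is hit by at least one surviving row, so very small inputs may yield `None`.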