Computer Science, 2019, Vol. 46, Issue (12): 148-154. doi: 10.11896/jsjkx.181001972

• Information Security •

Big Data Plain Text Watermarking Based on Orthogonal Coding

LI Zhao-can¹,², WANG Li-ming², GE Si-jiang²,³, MA Duo-he², QIN Bo¹

  1. College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  3. College of Cyberspace Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2018-10-23 Online: 2019-12-15 Published: 2019-12-17
  • Corresponding author: WANG Li-ming (王利明, born 1978), male, Ph.D., associate researcher, CCF member. His main research interests include big data security and network security. E-mail: wangliming@iie.ac.cn.
  • About the authors: LI Zhao-can (李兆璨, born 1994), female, master's student; her main research interest is big data security. GE Si-jiang (葛思江, born 1994), female, master's student; her main research interest is cloud security. MA Duo-he (马多贺, born 1982), male, Ph.D., associate researcher, CCF member; his main research interests include cloud security and network and system security. QIN Bo (秦勃, born 1964), male, Ph.D., professor; his main research interests are graphics and image processing, high-performance computing, and machine learning.
  • Supported by: the National Key Research and Development Program of China (2017YFB1010000).

Abstract: Data leakage is one of the major challenges facing big data applications, and digital watermarking is an effective means of data tracing and copyright protection. Current digital watermarking methods mainly target the circulation of multimedia files among end users, such as images, audio and video; little work addresses watermarking for protecting text data against leakage in big data environments. This paper proposes a plain text watermarking method for big data based on orthogonal coding. The plaintext watermark is encoded into a binary byte stream, and two orthogonal watermarking schemes are designed, one based on line hash values and one based on line-order permutation. First, the binary watermark string is divided into numbered segments; for each line, the number of the segment to embed is computed from the hash value of the line content, and the corresponding segment is converted into a string of invisible characters according to a custom rule and appended to the end of the line. Then the line order is adjusted so that the hash values of the lines correspond to the binary watermark string with flag bits added, which embeds the watermark in the big data plain text. Watermark extraction is the inverse of embedding. The method resists damage to the watermark from operations in big data environments such as complex transformations of line order, and the embedded fragile watermark enables detection of text tampering. Based on the proposed method, a big data plain text watermarking system was designed and implemented. It uses the Spark distributed processing framework to address the performance of watermark embedding and extraction on massive texts, so that data leaks can be traced to their source quickly and the security of big data is improved. Experiments and theoretical analysis show that the method has good watermark capacity and concealment and can withstand a variety of content attacks; because plain text carries no formatting, format attacks have no effect on it, so the method is robust.

Key words: Big data, Digital watermark, Orthogonal, Plain text, Traceability

CLC Number: 

  • TP309.2