Computer Science ›› 2019, Vol. 46 ›› Issue (12): 148-154. doi: 10.11896/jsjkx.181001972

• Information Security •

Big Data Plain Text Watermarking Based on Orthogonal Coding

LI Zhao-can1,2, WANG Li-ming2, GE Si-jiang2,3, MA Duo-he2, QIN Bo1   

  1. College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  3. College of Cyberspace Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2018-10-23 Online:2019-12-15 Published:2019-12-17

Abstract: Data leakage is one of the biggest challenges for big data applications. Digital watermarking is an effective means of data tracing and copyright protection. However, current digital watermarking methods focus mainly on multimedia files such as images, audio, and video, and few address data protection in big data environments. This paper therefore proposes a plain-text watermarking method for big data based on orthogonal coding. First, the plain-text watermark is encoded as a binary byte stream. An orthogonal watermarking method that combines row hash values with row-order permutation is then designed. The binary watermark string is divided into numbered segments. For each line, the number of the segment to embed is computed from the hash of the line's content, and that segment is converted into an invisible string appended to the end of the line. The line order is then adjusted so that the hash values of successive lines correspond to the binary watermark string with a flag added, which completes the embedding. Watermark extraction is the inverse of embedding. The method resists operations that would otherwise destroy the watermark in a big data environment, such as reordering of rows, and it also detects text tampering by embedding fragile watermarks. Based on the proposed method, a big data watermarking system was designed and implemented. Spark is used to address the performance of watermark embedding and extraction over massive texts, so that the source of a data leak can be traced quickly and the security of big data improved. Theoretical analysis and experiments indicate that the proposed method offers better watermark capacity and good concealment. It is also strongly robust, resisting multiple content attacks and format attacks.
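The segment-embedding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the zero-width characters used as the invisible encoding, the 8-bit segment length, the choice of SHA-256, and the function names (`to_bits`, `segment_index`, `embed`, `extract`) are all assumptions made for illustration, and the row-order permutation and flag steps are omitted.

```python
import hashlib

# Assumed invisible encoding: zero-width space for bit 0, zero-width non-joiner for bit 1.
ZERO, ONE = "\u200b", "\u200c"
SEG_BITS = 8  # assumed segment length in bits (one byte per segment)


def to_bits(text: str) -> str:
    """Encode the watermark text as a binary string via its UTF-8 bytes."""
    return "".join(f"{b:08b}" for b in text.encode("utf-8"))


def segment_index(line: str, n_segments: int) -> int:
    """Pick the segment number from a hash of the visible line content."""
    digest = hashlib.sha256(line.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_segments


def embed(lines, watermark):
    """Append one invisible watermark segment to the end of every line."""
    bits = to_bits(watermark)
    segments = [bits[i:i + SEG_BITS] for i in range(0, len(bits), SEG_BITS)]
    marked = []
    for line in lines:
        seg = segments[segment_index(line, len(segments))]
        marked.append(line + "".join(ONE if b == "1" else ZERO for b in seg))
    return marked, len(segments)


def extract(marked_lines, n_segments):
    """Recover the watermark, or return None if some segment is never seen."""
    recovered = [None] * n_segments
    for marked in marked_lines:
        visible = marked.rstrip(ZERO + ONE)
        tail = marked[len(visible):]
        if len(tail) != SEG_BITS:
            continue  # no intact segment on this line
        idx = segment_index(visible, n_segments)
        recovered[idx] = "".join("1" if c == ONE else "0" for c in tail)
    if any(seg is None for seg in recovered):
        return None
    bits = "".join(recovered)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8", errors="replace")
```

Because each segment number is derived from the line's own content rather than its position, shuffling the marked lines does not affect extraction in this sketch, provided every segment number is hit by at least one surviving line. The extractor here is assumed to know the number of segments in advance.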

Key words: Big data, Digital watermark, Orthogonal, Plain text, Traceability

CLC Number: 

  • TP309.2