Computer Science, 2021, Vol. 48, Issue (11A): 557-564. doi: 10.11896/jsjkx.210100015

• Information Security •

Natural Language Steganography Based on Neural Machine Translation

ZHOU Xiao-shi, ZHANG Zi-wei, WEN Juan

  1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
  • Online: 2021-11-10 Published: 2021-11-12
  • Corresponding author: WEN Juan (wenjuan@cau.edu.cn)
  • Author e-mail: zhouxiaoshi0713@163.com
  • About author: ZHOU Xiao-shi, born in 1994, postgraduate. Her main research interests include deep transfer learning for image classification, deep learning for text information steganography, natural language processing, and machine learning.
    WEN Juan, born in 1982, Ph.D., associate professor. Her main research interests include artificial intelligence, information hiding, and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61802410) and Chinese Universities Scientific Fund (2019TC047).

Abstract: Generation-based natural language steganography embeds secret information during text generation, under the guidance of the secret bitstream. Current mainstream generation-based methods rely on a simple recurrent neural network (RNN) or long short-term memory (LSTM) network to generate the stego text. These methods can produce only short stego text, because semantic quality degrades as sentence length increases, and there is hardly any semantic connection between sentences. To address this issue, this paper proposes Seq2Seq-Stega, a neural machine translation steganography algorithm that can generate long text while preserving semantic relationships between words and between sentences. A sequence-to-sequence (Seq2Seq) model serves as the encoder and decoder for text steganography, and the information in the source sentence ensures the semantic relevance of the target stego sentences. In addition, based on the word probability distribution computed by the model at each time step, a word selection strategy is designed to form a candidate pool, and an attention hyperparameter is introduced to balance the contributions of the source sentence and the target sentence. Experiments compare the hiding capacity and the quality of the generated text under different word selection thresholds and attention parameters. Comparative experiments with three other generation-based models show that Seq2Seq-Stega maintains long-distance semantic connections and resists steganalysis better.

Key words: Machine translation, Natural language steganography, Self-attention mechanism, Semantic relevance, Text generation
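
The abstract describes the candidate pool only at a high level, so the following is a minimal sketch of one common way such a pool can carry secret bits, not the authors' implementation. It assumes that the pool holds the words whose probability reaches a threshold, that the pool is truncated to a power-of-two size, and that the next secret bits select a word by index. The function names (`build_candidate_pool`, `embed_step`, `extract_step`) and the bit-string representation are hypothetical.

```python
def build_candidate_pool(probs, threshold):
    """Return word indices with probability >= threshold, most probable first.

    The pool is truncated to a power-of-two size so that each choice encodes
    a whole number of bits (an assumption, not taken from the paper).
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    pool = [i for i in ranked if probs[i] >= threshold]
    if not pool:
        pool = ranked[:1]  # fall back to the most probable word; embeds 0 bits
    size = 1 << (len(pool).bit_length() - 1)  # largest power of two <= len(pool)
    return pool[:size]


def embed_step(probs, threshold, bits, pos):
    """Pick the next word from the pool using the secret bits starting at pos.

    `bits` is a string of '0'/'1' characters. Returns (word_index, new_pos).
    """
    pool = build_candidate_pool(probs, threshold)
    n_bits = len(pool).bit_length() - 1
    # Pad with zeros if the secret bitstream runs out mid-step.
    chunk = bits[pos:pos + n_bits].ljust(n_bits, "0")
    index = int(chunk, 2) if n_bits else 0
    return pool[index], pos + n_bits


def extract_step(probs, threshold, word):
    """Recover the bits carried by `word`, given the same distribution."""
    pool = build_candidate_pool(probs, threshold)
    n_bits = len(pool).bit_length() - 1
    return format(pool.index(word), f"0{n_bits}b") if n_bits else ""


# Toy distribution over a five-word vocabulary (illustrative values only).
probs = [0.05, 0.4, 0.3, 0.15, 0.1]
word, pos = embed_step(probs, threshold=0.1, bits="10", pos=0)
recovered = extract_step(probs, threshold=0.1, word=word)
```

In this sketch a lower threshold enlarges the pool and so raises the number of bits per word, at the cost of admitting less probable words. That is the kind of trade-off between hiding capacity and text quality that the abstract says the experiments examine. The receiver needs the same model, source sentence, and threshold to rebuild each pool.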

CLC Number:

  • G316