Computer Science, 2021, Vol. 48, Issue (11A): 557-564. doi: 10.11896/jsjkx.210100015

• Information Security

Natural Language Steganography Based on Neural Machine Translation

ZHOU Xiao-shi, ZHANG Zi-wei, WEN Juan   

  1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
  • Online: 2021-11-10 Published: 2021-11-12
  • About author: ZHOU Xiao-shi, born in 1994, postgraduate. Her main research interests include deep transfer learning for image classification, deep learning for text information steganography, natural language processing, machine learning.
    WEN Juan, born in 1982, Ph.D., associate professor. Her main research interests include artificial intelligence, information hiding, and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61802410) and Chinese Universities Scientific Fund (2019TC047).

Abstract: Generation-based natural language steganography embeds secret information during text generation, guided by a secret bitstream. Current generation-based steganographic methods rely on recurrent neural networks (RNN) or long short-term memory (LSTM) networks, which can only generate short stego text because semantic quality degrades as sentence length increases. Moreover, there is hardly any semantic connection between sentences. To address these issues, this paper proposes a neural machine translation steganography algorithm, Seq2Seq-Stega, that can generate long text in which semantic relationships are well maintained between words and sentences. An encoder-decoder model based on the sequence-to-sequence (Seq2Seq) structure serves as the translation model. The source sentence provides extra information and ensures semantic relevance between the target stego sentences. In addition, a word selection strategy is designed to form the candidate pool according to the word probability distribution obtained by the model. An attention hyperparameter is introduced to balance the contributions of the source sentence and the target sentence. Experiments report the hiding capacity and text quality under different word selection thresholds and attention parameters. Comparative experiments with three other generation-based models show that Seq2Seq-Stega maintains long-distance semantic connections and better resists steganalysis.
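
The abstract does not spell out the word selection strategy, so the following is only a minimal sketch of how a threshold-based candidate pool could carry secret bits at one decoding step. It assumes a fixed-length coding over a pool truncated to a power-of-two size; the function names, the toy probability table, and the threshold value are all hypothetical and are not taken from the paper.

```python
def build_candidate_pool(probs, threshold):
    """Keep words with probability >= threshold, most probable first,
    truncated to a power-of-two size so each choice encodes whole bits.
    (Assumed strategy for illustration, not the paper's exact rule.)"""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool = [word for word, p in ranked if p >= threshold]
    if not pool:
        # Fall back to the single most probable word (carries 0 bits).
        pool = [ranked[0][0]]
    n_bits = len(pool).bit_length() - 1  # floor(log2(len(pool)))
    return pool[: 1 << n_bits], n_bits


def embed_step(probs, bits, threshold):
    """Pick the next word from the pool using the leading secret bits.
    Returns the chosen word and the number of bits consumed."""
    pool, n_bits = build_candidate_pool(probs, threshold)
    chunk = bits[:n_bits].ljust(n_bits, "0")  # zero-pad a short tail
    index = int(chunk, 2) if n_bits else 0
    return pool[index], n_bits


def extract_step(probs, word, threshold):
    """Receiver side: rebuild the same pool and recover the bits
    from the position of the observed word."""
    pool, n_bits = build_candidate_pool(probs, threshold)
    if n_bits == 0:
        return ""
    return format(pool.index(word), "0{}b".format(n_bits))


# Toy next-word distribution standing in for the decoder output.
toy_probs = {"the": 0.4, "a": 0.3, "this": 0.15, "that": 0.1, "zebra": 0.05}
word, used = embed_step(toy_probs, "10", threshold=0.1)
recovered = extract_step(toy_probs, word, threshold=0.1)
```

In this sketch a lower threshold admits more candidates and so hides more bits per word, at the cost of choosing less probable words, which is the capacity versus text quality trade-off the experiments examine.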

Key words: Machine translation, Natural language steganography, Self-attention mechanism, Semantic relevance, Text generation

CLC Number: 

  • G316