Study on Automatic Synthetic News Detection Method Complying with Regulatory Compliance

doi:10.11896/jsjkx.210300083

Abstract

Abstract: Automatically synthesized news has been widely used for formatted news such as financial market analysis and event reports, and it has a significant social impact. However, when centralized storage is adopted for regulation, the information can easily be stolen or tampered with by regulators or third parties. It is therefore particularly important to protect information from theft and tampering while maintaining detection efficiency and accuracy. This paper proposes an automatic synthetic news detection method that complies with regulatory requirements. By issuing data access tokens, the method ensures that only regulatory agencies can process the information, while data activities are recorded in a distributed ledger. The method designs two types of distributed ledgers, which are invoked through smart contracts to implement the authentication and authorization mechanism and log recording; only honest participation is recognized by the blockchain and can prove regulatory compliance. Furthermore, the method adopts the lightweight detection algorithm IDF-FastText to give edge nodes computing capability, curbing the spread of automatically synthesized news at the source and making regulation timely. A GPT-2 detection model based on generative adversarial networks (GAN) is deployed on the server so that regulatory agencies can verify the detection results. Finally, experiments demonstrate the feasibility of the proposed design.

Key words: Blockchain, Deep learning, Data protection, News detection
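
The token-and-ledger idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `ActivityLedger` and `AccessController`, the record fields, and the use of a single in-memory SHA-256 hash chain in place of a real distributed ledger and smart contracts are all assumptions made for illustration.

```python
import hashlib
import json
import secrets


class ActivityLedger:
    """Append-only, hash-chained log (hypothetical stand-in for a distributed ledger)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, record):
        # Hash the previous entry's hash together with the canonical JSON of the record.
        payload = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"prev": prev_hash, "record": record, "hash": self._digest(prev_hash, record)}
        )

    def verify(self):
        # Recompute the chain; any edited record breaks the hash links.
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


class AccessController:
    """Issues data access tokens and logs every access attempt (assumed design)."""

    def __init__(self, ledger):
        self.ledger = ledger
        self._tokens = {}  # token -> holder name

    def issue_token(self, holder):
        token = secrets.token_hex(16)
        self._tokens[token] = holder
        self.ledger.append({"event": "issue", "holder": holder})
        return token

    def access(self, token, news_id):
        holder = self._tokens.get(token)
        granted = holder is not None
        # Both granted and denied attempts are recorded.
        self.ledger.append(
            {"event": "access", "holder": holder, "news_id": news_id, "granted": granted}
        )
        return granted


ledger = ActivityLedger()
controller = AccessController(ledger)
token = controller.issue_token("regulator-A")
print(controller.access(token, "news-001"))           # valid token
print(controller.access("forged-token", "news-001"))  # unknown token
print(ledger.verify())
```

In this sketch every request, whether granted or denied, leaves a record, and editing any earlier record invalidates the chain when `verify()` is called. A real deployment would replace the in-memory list with ledger transactions submitted through smart contracts.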

CLC number: TP391.9

MAO Dian-hui, HUANG Hui-yu, ZHAO Shuang. Study on Automatic Synthetic News Detection Method Complying with Regulatory Compliance[J]. Computer Science, 2022, 49(6A): 523-530. https://doi.org/10.11896/jsjkx.210300083
