Computer Science ›› 2025, Vol. 52 ›› Issue (7): 388-398. doi: 10.11896/jsjkx.240500100

• Information Security •

Improved SVM Model for Industrial Control Anomaly Detection Based on Interference Sample Distribution Optimization

GU Zhaojun1, YANG Xueying1,2, SUI He3

  1. 1 Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
    2 College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
    3 College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2024-05-23 Revised: 2024-09-28 Published: 2025-07-17
  • Corresponding author: SUI He (hsui@cauc.edu.cn)
  • About author: GU Zhaojun (zjgu@cauc.edu.cn), born in 1966, Ph.D., professor. His main research interests include network and information security and civil aviation information systems.
    SUI He, born in 1987, Ph.D., lecturer, is a member of CCF (No. E7954M). His main research interests include industrial control system network and information security.
  • Supported by:
    National Natural Science Foundation of China (U2333201).

Abstract: Most existing anomaly detection and classification methods for industrial control systems cannot effectively handle class imbalance and overlapping coupling. This paper proposes an improved SVM model for industrial control anomaly detection based on interference sample distribution optimization (Improved SVM Model Based on Adaptive Differential Evolution with Sphere, SJADE_SVM), which combines an adaptive differential evolution oversampling technique based on hypersphere coverage with a support vector machine (SVM). First, the hypersphere coverage algorithm is improved and a probability formula is constructed to identify and eliminate interference samples. Then, the synthetic minority oversampling technique is improved so that only safe samples are sampled, which alleviates class imbalance and overlapping coupling. Finally, an adaptive differential evolution algorithm optimizes the positions and attributes of the samples, and an SVM performs the classification. Three groups of experiments are designed on six real industrial control datasets and four public UCI datasets: a performance comparison with anomaly detection classification algorithms such as logistic regression and Gaussian naive Bayes, a comparison with other methods for improving the sample distribution, and a comparison of algorithm running time. The experimental results show that the model improves F-score and G-mean by up to 38.29% and 10.54%, respectively, consistently ranks among the top three in classification performance, and shows significant performance advantages in statistical tests such as the non-parametric two-sided Wilcoxon signed-rank test and the Friedman test at α=0.05.

Key words: Anomaly detection, Sampling, Support vector machine, Overlap, Adaptive differential evolution
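
The abstract reports results in terms of F-score and G-mean, two metrics commonly used for imbalanced classification. The sketch below shows one conventional way to compute them from binary labels. It assumes that anomalies are the positive (minority) class, that the F-score is the balanced F1 variant (`beta=1.0`), and that G-mean is the geometric mean of sensitivity and specificity; the function names are illustrative and are not taken from the paper, which may define the metrics differently.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem with the given positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_score(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score of the positive class (assumed here to be the anomaly class)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


# Toy imbalanced example: 2 anomalies among 10 samples (made-up labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(f_score(y_true, y_pred), g_mean(y_true, y_pred))
```

In this toy case accuracy is 0.8 even though only half of the anomalies are detected, which illustrates why metrics such as F-score and G-mean are preferred over plain accuracy when classes are imbalanced.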

CLC Number: 

  • TP391