Computer Science, 2019, Vol. 46, Issue (2): 1-10. doi: 10.11896/j.issn.1002-137X.2019.02.001

• Big Data & Data Science •

Big Data Analytics and Insights in Distribution Characteristics of Supply Chain Finance

LIU Ying   

  1. School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
    Jilin Province Key Laboratory of Logistics Industry Economy and Intelligent Logistics, Changchun 130117, China
    Laboratory of Internet Finance, Jilin University of Finance and Economics, Changchun 130117, China
  • Received: 2018-08-30 Online: 2019-02-25 Published: 2019-02-25

Abstract: Supply chain finance data are massive and largely semi-structured or unstructured, which makes analysis comparatively complicated in a big data environment. How to exploit the distinctive characteristics of large samples to improve classification performance is therefore worth exploring in research on big data samples. Based on the distribution characteristics of supply chain financial data, this paper analyzes the main factors that affect credit risk classification models. Drawing on a review of related work from recent years, it summarizes the distribution characteristics of credit data, including class imbalance, noise and outliers, and nonlinear multidimensionality, and then discusses approaches for mining knowledge from massive financial data. The analysis provides a theoretical basis for constructing credit risk models.

Key words: Supply chain finance, Credit risk, Big data, Distribution characteristics, Imbalance data, Outliers, Multi-dimension

CLC Number: 

  • TP399