Computer Science, 2020, Vol. 47, Issue (9): 110-116. doi: 10.11896/jsjkx.191000156
ZHAO Hui-qun (赵会群)¹, WU Kai-feng (吴凯锋)²
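Abstract: "Big data" has become one of the most frequently used technical terms in computer science, and it has gradually come to denote a tradable commodity. Whether from the perspective of academic research or of data-trading demand, evaluating the availability of big datasets is a new problem. This paper proposes a big data availability evaluation model as a reference for both academia and data circulation. Drawing on the 4V characteristics of big data (Volume, Variety, Velocity, Value), it computes the distribution of each 4V characteristic over the sample data segment by segment. From these distributions it derives a segmented-distribution-based probability model of big data characteristics and a weighted availability evaluation model. The paper also proposes an algorithm for block sampling of big data and an algorithm for estimating the weighting coefficient of each characteristic in the evaluation model. The availability evaluation requirements of video big data are used to demonstrate how the proposed model and algorithms are applied in practice. The model can be used to evaluate data in data science experiments and to price datasets in big data trading markets. For practical evaluation work, the paper also gives solutions to concrete operational issues such as standardizing (commoditizing) datasets and determining evaluation baselines. The application case supports the proposed model and further verifies its feasibility.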
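The abstract does not state the model's formulas, so the following Python sketch only illustrates the general pipeline it describes, under explicit assumptions. The sketch assumes that records are drawn by simple block (cluster) sampling, and that each record carries a hypothetical normalized score in [0, 1] for each of the 4V characteristics. It also assumes that each characteristic's distribution is estimated over equal-width segments of [0, 1], and that the availability score is the weighted sum A = Σ w_i s_i of the per-characteristic expected scores s_i. The names `block_sample`, `segment_distribution`, `characteristic_score` and `availability` are hypothetical. The weights are supplied by hand and are not estimated with the paper's weight-estimation algorithm, which is not reproduced here.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical: the 4V characteristics named in the abstract, each assumed
# to be pre-scored per record as a value in [0, 1].
CHARACTERISTICS = ("volume", "variety", "velocity", "value")


def block_sample(records: Sequence[dict], block_size: int, n_blocks: int,
                 seed: int = 0) -> List[dict]:
    """Split records into contiguous blocks and draw n_blocks of them.

    Assumption: plain block (cluster) sampling without replacement,
    not necessarily the paper's block-sampling algorithm.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    chosen = rng.sample(blocks, min(n_blocks, len(blocks)))
    return [r for block in chosen for r in block]


def segment_distribution(scores: Sequence[float], n_segments: int) -> List[float]:
    """Empirical probability of each equal-width segment of [0, 1]."""
    if not scores:
        raise ValueError("scores must be non-empty")
    counts = [0] * n_segments
    for s in scores:
        clamped = max(0.0, min(1.0, s))
        # A score of exactly 1.0 is placed in the last segment.
        idx = min(int(clamped * n_segments), n_segments - 1)
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def characteristic_score(probs: Sequence[float]) -> float:
    """Expected score, using each segment's midpoint as its representative value."""
    n = len(probs)
    return sum(p * (j + 0.5) / n for j, p in enumerate(probs))


def availability(sample: Sequence[dict], weights: Dict[str, float],
                 n_segments: int = 10) -> float:
    """Weighted availability A = sum_i w_i * s_i (weights normalized to sum to 1)."""
    total_w = sum(weights[c] for c in CHARACTERISTICS)
    score = 0.0
    for c in CHARACTERISTICS:
        probs = segment_distribution([r[c] for r in sample], n_segments)
        score += weights[c] / total_w * characteristic_score(probs)
    return score


if __name__ == "__main__":
    # Hypothetical toy data: 100 identical records.
    records = [{"volume": 0.8, "variety": 0.5, "velocity": 0.3, "value": 0.9}
               for _ in range(100)]
    sample = block_sample(records, block_size=10, n_blocks=3)
    weights = {c: 0.25 for c in CHARACTERISTICS}
    print(len(sample), round(availability(sample, weights), 4))
```

With ten segments and equal weights, the toy scores 0.8, 0.5, 0.3 and 0.9 fall into segments whose midpoints are 0.85, 0.55, 0.35 and 0.95. The weighted score is therefore their mean, 0.675. Segment midpoints are used so that the score depends only on the segmented distribution, as in the abstract's description, rather than on the raw per-record values.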