Computer Science ›› 2016, Vol. 43 ›› Issue (7): 28-34. doi: 10.11896/j.issn.1002-137X.2016.07.004

Cloud Computing Architecture of Chinese Character Culture Digitization System

YANG Yi, ZHANG Gui-gang, WANG Jian, HUANG Wei-xing and SU Hai-xia

  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190 (all authors)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the 2014 Central Special Fund for Cultural Industry Development (Y4T1011CA1) and by Key Projects of the National Science and Technology Support Program (2015BAK25B04, 2015BAK25B03).

Abstract: Chinese characters are a core element of Chinese civilization and play an important role in recording Chinese history and passing on Chinese culture. Computer science provides essential technical means for digitizing Chinese character culture. As the volume of digitized Chinese character culture information keeps growing, introducing cloud computing and big data techniques for data storage, management, and analysis has become an important research direction in Chinese character digitization. The Chinese character culture digitization system (CCCDS) was developed in response to this need, digitizing not only Chinese characters but also the culture surrounding them. It is an interactive, comprehensive software platform for experiencing Chinese character culture. Its architecture offers scalability, high availability, and high security, and it uses techniques such as data prefetching and caching to speed up data access. The system also provides real-time, semi-real-time, and non-real-time data analysis, which supports big data analysis of Chinese character culture and helps meet users' future needs for applications and services built on such data. The effectiveness of the architecture design was validated by experiments.

Key words: Chinese character, Digitization, Cloud computing, Big data, Hadoop, Architecture
