Computer Science, 2019, Vol. 46, Issue (10): 63-70. doi: 10.11896/jsjkx.190200346
• Big Data & Data Science •
WANG Wei-hong, LIANG Chao-kai, MIN Yong