Computer Science, 2021, Vol. 48, Issue (11A): 218-224. doi: 10.11896/jsjkx.210100230
• Big Data & Data Science •
CHEN Ying-ren, GUO Ying-nan, GUO Xiang, NI Yi-tao, CHEN Xing
[1]CNNIC's 45th Statistical Report on the Development of China's Internet[EB/OL].http://www.cac.gov.cn/2020-04/27/c_1589535470378587.html.
[2]CUI C,GONG J.Overview of Web Information Extraction Research[J].Computer Knowledge and Technology:Academic Exchange,2011,7(4):2279-2280.
[3]CAFARELLA M J,HALEVY A Y,WANG D Z,et al.WebTables:Exploring the power of tables on the web[J].Proceedings of the VLDB Endowment,2008,1(1):538-549.
[4]ZHANG J.Research and Implementation of Web Information Automatic Extraction Technology[D].Wuhan:Wuhan University of Technology,2009.
[5]EMILIO F,ROBERT B.Automatic Wrapper Adaptation by Tree Edit Distance Matching[C]//Proceedings of the 2nd International CIMA Workshop.Springer,2011:41-54.
[6]CHIDLOVSKII B.Automatic Repairing of Web Wrappers[C]//Proceeding of the Third International Workshop.ACM,2001:24-30.
[7]KNOBLOCK C A,LERMAN K,MINTON S N.Wrapper Maintenance:A Machine Learning Approach[J].Computer Science,2011,18(1):2003.
[8]MENG X,HU D,LI C.Schema-guided wrapper maintenance for web-data extraction[C]//Fifth ACM CIKM International Workshop on Web Information and Data Management.ACM,2003:1-8.
[9]KOWALKIEWICZ M,KACZMAREK T,ABRAMOWICZ W.myPortal:Robust Extraction and Aggregation of Web Content[C]//Proceedings of the 32nd International Conference on Very Large Data Bases.DBLP,2006:1219-1222.
[10]DALVI N N,BOHANNON P,SHA F.Robust web extraction:an approach based on a probabilistic tree-edit model[C]//ACM Sigmod International Conference on Management of Data.ACM,2009:335-348.
[11]LEOTTA M,STOCCO A,RICCA F,et al.Reducing Web Test Cases Aging by Means of Robust XPath Locators[C]//IEEE International Symposium on Software Reliability Engineering Workshops.IEEE,2014:449-454.
[12]LIU D,WANG X,LI H,et al.Robust Web Extraction Based on Minimum Cost Script Edit Model[J].Procedia Engineering,2012,29(1):1119-1125.
[13]CHU Y C,HSU C C,LEE C J,et al.Automatic data extraction of websites using data path matching and alignment[C]//Fifth International Conference on Digital Information Processing & Communications.IEEE,2015.
[14]LIU D L,LIU X,MA L,et al.Domain adaptation of web data extraction based on bootstrapping method[C]//International Conference on Electronics.2017.
[15]GULHANE P,MADAAN A,MEHTA R,et al.Web-scale information extraction with vertex[C]//2011 IEEE 27th International Conference on Data Engineering.IEEE,2011:1209-1220.
[16]WONG T L,LAM W.Adapting Web information extraction knowledge via mining site-invariant and site-dependent features[J].ACM Transactions on Internet Technology,2007,7(1):6.
[17]YANG P,ZHENG Q L,PENG H,et al.A stepwise learning approach to automatic discovery of interest data blocks[C]//Proceedings of 2004 International Conference on Machine Learning and Cybernetics.IEEE,2004:1441-1446.
[18]DENG J S,ZHENG Q L,PENG H.Web page information extraction based on keyword clustering and node distance[J].Computer Science,2007(4):217-220.
[19]CHANG Y S.Adaptable wrapper generation for web page format change[C]//Proc.5th Int.Conf.on Applied Computer Science.World Scientific and Engineering Academy and Society,Stevens Point,Wisconsin,USA,2006:147-152.
[20]LIU D,MA L,LIU X.Research on Adaptive Wrapper in Deep Web Data Extraction[C]//International Conference on Internet of Vehicles.Cham:Springer,2015:409-423.
[21]TEKALE A A,NANDGAONKAR S S.Automatic wrapper adaptation system[J].International Journal of Scientific & Engineering Research,2013,4(3):7.
[22]REIS D C,GOLGHER P B,SILVA A S,et al.Automatic web news extraction using tree edit distance[C]//Proceedings of the 13th International Conference on World Wide Web (WWW 2004).ACM,2004:502-511.
[23]KIM Y,PARK J,KIM T,et al.Web Information Extraction by HTML Tree Edit Distance Matching[C]//Proceedings of the 5th International Conference on Convergence Information Technology.ACM,2007:2455-2460.
[24]FERRARA E,BAUMGARTNER R.Automatic wrapper adaptation by tree edit distance matching[M]//Combinations of Intelligent Methods and Applications.Berlin:Springer,2011:41-54.
[25]JOSHI S,AGRAWAL N,KRISHNAPURAM R,et al.A bag of paths model for measuring structural similarity in Web documents[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2003:577-582.
[26]FERRARA E,DE MEO P,FIUMARA G,et al.Web data extraction,applications and techniques:A survey[J].Knowledge-Based Systems,2014,70:301-323.