Computer Science, 2022, Vol. 49, Issue (5): 113-119. doi: 10.11896/jsjkx.210700131

Special topic: Big Data & Data Science (virtual special topic)

• Database & Big Data & Data Science •

Line-Segment Clustering Algorithm for Chemical Structure

ZHU Zhe-qing1,3, GENG Hai-jun1,3, QIAN Yu-hua1,2,3

  1. 1 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
    2 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
    3 Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
  • Received: 2021-07-13 Revised: 2021-12-10 Online: 2022-05-15 Published: 2022-05-06
  • Corresponding author: QIAN Yu-hua (jinchengqyh@126.com)
  • About author: (jxszzq@163.com)
    ZHU Zhe-qing, born in 1982, postgraduate. His main research interests include machine learning and data mining.
    QIAN Yu-hua, born in 1976, Ph.D, professor, is a member of China Computer Federation. His main research interests include pattern recognition, feature selection, rough set theory, granular computing and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61672332), Key R&D Program of Shanxi Province (201903D421003), Science and Technology Achievements Transformation and Cultivation Project of Shanxi Provincial Education Department (2020CG001), Applied Basic Research Plan of Shanxi Province (20210302123444) and China University Industry-University-Research Innovation Fund (2021FNA02009).

Abstract: Chemical bond recognition is an important sub-task of chemical structure recognition. Single, double and triple bonds in a chemical structure are all composed of line segments, and line segment detection with the Hough transform tends to produce redundant and interfering segments. To address this, a line-segment clustering algorithm for chemical bonds is proposed: the line segments detected by the Hough transform are clustered, and redundant segments are merged dynamically during clustering. Specifically, based on an analysis of the spatial relationships between line segments, a relative similarity measure and an interval similarity measure are defined, and a clustering method based on line-segment merging is built on these two measures. Experimental results show that the proposed similarity measures comprehensively describe the similarity between line segments, and that the algorithm obtains good clustering results while accurately restoring the true positions of the line segments that make up chemical bonds. It is therefore an effective preprocessing method for chemical structure images.

Key words: Chemical bond, Chemical structure recognition, Clustering of line segments, Hough transform

CLC Number:

  • TP391
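
The merge-based clustering idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas for the relative and interval similarity measures, so the code below substitutes simple stand-ins that are assumptions, not the authors' definitions: an angle difference plus a perpendicular offset (standing in for the relative measure) and a gap along the segment direction (standing in for the interval measure). All function names and thresholds are hypothetical.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2), non-zero length


def _direction(s: Segment) -> Tuple[float, float]:
    """Unit direction vector of a segment."""
    dx, dy = s[2] - s[0], s[3] - s[1]
    n = math.hypot(dx, dy)
    return dx / n, dy / n


def angle_diff(a: Segment, b: Segment) -> float:
    """Angle between two undirected segments, in [0, pi/2]."""
    ta = math.atan2(a[3] - a[1], a[2] - a[0])
    tb = math.atan2(b[3] - b[1], b[2] - b[0])
    d = abs(ta - tb) % math.pi
    return min(d, math.pi - d)


def perp_offset(a: Segment, b: Segment) -> float:
    """Distance from the midpoint of b to the infinite line through a."""
    ux, uy = _direction(a)
    mx, my = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return abs(ux * (my - a[1]) - uy * (mx - a[0]))


def gap_along(a: Segment, b: Segment) -> float:
    """Gap between a and b measured along a's direction (0 if they overlap)."""
    ux, uy = _direction(a)
    proj = lambda x, y: (x - a[0]) * ux + (y - a[1]) * uy
    pa = sorted((proj(a[0], a[1]), proj(a[2], a[3])))
    pb = sorted((proj(b[0], b[1]), proj(b[2], b[3])))
    return max(0.0, max(pa[0], pb[0]) - min(pa[1], pb[1]))


def merge(a: Segment, b: Segment) -> Segment:
    """Replace two segments by the pair of endpoints that are farthest apart
    along the direction of the longer segment (a simplifying assumption)."""
    len_a = math.hypot(a[2] - a[0], a[3] - a[1])
    len_b = math.hypot(b[2] - b[0], b[3] - b[1])
    ux, uy = _direction(a if len_a >= len_b else b)
    pts = [(a[0], a[1]), (a[2], a[3]), (b[0], b[1]), (b[2], b[3])]
    pts.sort(key=lambda p: p[0] * ux + p[1] * uy)
    return (*pts[0], *pts[-1])


def cluster_segments(segments: List[Segment],
                     max_angle: float = math.radians(5),  # hypothetical threshold
                     max_offset: float = 2.0,             # hypothetical, pixels
                     max_gap: float = 3.0) -> List[Segment]:
    """Greedily merge similar segments until no pair qualifies."""
    segs = list(segments)
    changed = True
    while changed:
        changed = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                a, b = segs[i], segs[j]
                if (angle_diff(a, b) <= max_angle
                        and perp_offset(a, b) <= max_offset
                        and gap_along(a, b) <= max_gap):
                    segs[i] = merge(a, b)
                    del segs[j]
                    changed = True
                    break
            if changed:
                break
    return segs


detected = [
    (0, 0, 10, 0),         # one stroke of a bond
    (0.5, 0.5, 10.5, 0.4), # redundant detection of the same stroke
    (0, 6, 10, 6),         # parallel stroke of a double bond
    (20, 0, 20, 10),       # a different, perpendicular bond
]
result = cluster_segments(detected)
print(result)
```

In this sketch the perpendicular-offset threshold has to stay below the spacing between the parallel strokes of a double or triple bond; otherwise those strokes would be merged into one and the bond order would be lost.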