Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 409-415. doi: 10.11896/jsjkx.200100108

• Big Data & Data Science •

Duplicate Formula Detection Based on Deep Convolutional Neural Network

CHEN Ang1, TONG Wei1, ZHOU Yu-qiang2, YIN Yu2, LIU Qi2   

  1 National Education Examinations Authority, Beijing 100084, China
  2 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
  • Online:2020-11-15 Published:2020-11-17
  • About author: CHEN Ang, born in 1983, Ph.D., associate. His main research interests include data mining and educational big data.
    ZHOU Yu-qiang, born in 1998, postgraduate. His main research interests include machine learning and data mining.
  • Supported by:
    This work was supported by the Foundation of National Education Examination Authority for Scientific Research Plan of National Education Examination (GJK2017008) and the National Natural Science Foundation of China (61672483).

Abstract: In recent years, with the development of intelligent education, online education has become an important vehicle for teaching and learning. Online education systems give learners convenient access to vast collections of test resources. However, because test questions come from many sources and are collected by inconsistent methods, the accumulated exercise resources suffer from a high repetition rate and low quality. Monitoring test questions accurately and efficiently is therefore an important way to refine online resources and improve the quality of online test questions. In this context, this paper focuses on detecting duplicate formulas that appear as images in science test resources. Accurate formula detection removes the interference of question semantics and thereby improves test resource monitoring. Traditional duplicate formula detection methods are usually based on manually defined rules; their cumbersome recognition steps, low accuracy and low efficiency make them difficult to apply to large-scale formula data. To address this, this paper proposes a duplicate formula detection method based on a deep convolutional neural network. Firstly, a multi-channel convolution mechanism automates the extraction and processing of formula image features, making the method suitable for large-scale formula data. Then, an end-to-end output mode avoids the error accumulation that the many intermediate steps of traditional methods can cause. Finally, to verify the accuracy and practicability of the model, experiments are carried out on a standard test data set and on a data set with simulated scan noise. The experimental results show that the method can effectively process formula images of different quality and achieves good results in both accuracy and efficiency.
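As a rough illustration of the pipeline the abstract describes, the following Python sketch stacks two formula images as input channels, applies one multi-channel convolution layer, and maps the pooled features directly to a duplicate score. It is a minimal sketch under assumptions, not the paper's network: the function names, the two-channel input layout, the hand-set filters, and the `weight` and `bias` values are all hypothetical, and a real model would learn its parameters from labeled formula pairs.

```python
import math


def stack_pair(image_a, image_b):
    """Stack two equally sized grayscale images as a 2-channel input (assumed layout)."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same shape")
    return [image_a, image_b]


def conv_relu(channels, filters):
    """Apply each filter (one square kernel per channel) in 'valid' mode, then ReLU."""
    height, width = len(channels[0]), len(channels[0][0])
    feature_maps = []
    for kernels in filters:
        k = len(kernels[0])
        feature_map = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                total = 0.0
                # Sum the responses of all input channels at this position.
                for channel, kernel in zip(channels, kernels):
                    for di in range(k):
                        for dj in range(k):
                            total += channel[i + di][j + dj] * kernel[di][dj]
                row.append(max(0.0, total))
            feature_map.append(row)
        feature_maps.append(feature_map)
    return feature_maps


def duplicate_score(image_a, image_b, filters, weight, bias):
    """Map an image pair straight to a duplicate probability (end to end)."""
    feature_maps = conv_relu(stack_pair(image_a, image_b), filters)
    cells = [v for fmap in feature_maps for row in fmap for v in row]
    pooled = sum(cells) / len(cells)  # global average pooling
    return 1.0 / (1.0 + math.exp(-(weight * pooled + bias)))


# Hand-set filters standing in for learned ones (assumption): together they
# respond to pixels where the two channels disagree.
CENTER = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
NEG_CENTER = [[0, 0, 0], [0, -1, 0], [0, 0, 0]]
FILTERS = [[CENTER, NEG_CENTER], [NEG_CENTER, CENTER]]

# Toy 5x5 "formula images": a plus sign and a minus sign.
PLUS = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
MINUS = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

same = duplicate_score(PLUS, PLUS, FILTERS, weight=-40.0, bias=2.0)
different = duplicate_score(PLUS, MINUS, FILTERS, weight=-40.0, bias=2.0)
print(f"same pair: {same:.3f}, different pair: {different:.3f}")
```

With these illustrative parameters, an identical pair scores above 0.5 and the plus/minus pair scores below it. Because the pair goes in as one input and a single score comes out, there is no separate symbol recognition or rule matching stage in which errors could accumulate.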

Key words: Convolutional neural network, Duplicate formula detection, Exercise quality, Image recognition

CLC Number: 

  • TP301