Computer Science ›› 2021, Vol. 48 ›› Issue (2): 47-54. doi: 10.11896/jsjkx.200800187

• Novel Distributed Computing Technologies and Systems


Convolutional Optimization Algorithm Based on Distributed Coding

YUAN Chen-yu, XIE Zai-peng, ZHU Xiao-rui, QU Zhi-hao, XU Yuan-yuan   

  1. School of Computer and Information, Hohai University, Nanjing 211100, China
  • Received: 2020-08-28 Revised: 2020-11-07 Online: 2021-02-15 Published: 2021-02-04
  • Corresponding author: XIE Zai-peng (zaipengxie@hhu.edu.cn)
  • About author: YUAN Chen-yu, born in 1998, postgraduate (chenyu_yuan@hhu.edu.cn). His main research interests include distributed computing.
    XIE Zai-peng, born in 1982, Ph.D, associate professor, is a member of the China Computer Federation. His main research interests include distributed systems, embedded systems and the Internet of Things.
  • Supported by:
    National Key R&D Program of China (2016YFC0402710) and Key Project of the National Natural Science Foundation of China (61832005).


Abstract: Convolution plays a vital role in statistics, signal processing, image processing and deep learning. It is also a fundamental operation in deep neural networks, where it forms the basis of information filtering and feature extraction. Exploring methods to speed up convolution has become an open research topic in recent years. Many studies have pointed out that a distributed computing framework may improve the computation time of convolution and hence optimize the training efficiency of deep learning, but stragglers in distributed systems may slow down the overall system. This paper proposes a distributed-coding-based Winograd algorithm for 2D convolution. In this algorithm, the Winograd algorithm effectively accelerates each 2D convolution computation, while distributed coding mitigates the impact of stragglers through a redundancy-based encoding strategy. The proposed distributed 2D convolution algorithm can therefore mitigate the straggler problem while speeding up the 2D convolution itself, and may thus effectively improve the computational efficiency of distributed 2D convolution.
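To make the speed-up concrete, the following minimal NumPy sketch implements the 1D Winograd minimal filtering case F(2, 3), using the transform matrices in Lavin and Gray's formulation; the paper's 2D algorithm nests two such transforms. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

# Winograd minimal filtering F(2, 3): two outputs of a 3-tap filter
# using 4 multiplications instead of the 6 needed by a direct
# sliding-window computation. Transform matrices per Lavin & Gray.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of correlating a 4-tap input d with a 3-tap filter g."""
    return AT @ ((G @ g) * (BT @ d))   # element-wise product: 4 multiplies

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # direct sliding window
assert np.allclose(winograd_f23(d, g), direct)
```

Direct computation of the two outputs costs 6 multiplications; the element-wise product above costs 4. In the nested 2D case F(2×2, 3×3) the saving grows from 36 to 16 multiplications per output tile, which is the source of the acceleration discussed in the abstract.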

Key words: Convolution, Distributed coding, Distributed computing, Winograd
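The redundancy-based encoding strategy for straggler mitigation can be illustrated with a toy (3, 2) MDS-style code for a matrix-vector product, in the spirit of coded computation (cf. Lee et al., "Speeding up distributed machine learning using codes"). The task split, worker indices and `decode` helper below are hypothetical illustrations, not the paper's scheme.

```python
import numpy as np

# (3, 2) coded computation: A @ x is split across 3 workers so that the
# outputs of ANY 2 workers suffice -- the slowest (straggler) is ignored.
def encode_tasks(A):
    A1, A2 = np.vsplit(A, 2)     # split the rows into two halves
    return [A1, A2, A1 + A2]     # third block is the parity block

def decode(results):
    """Recover A @ x from any two of the three worker outputs."""
    done = dict(results)
    if 0 in done and 1 in done:
        return np.concatenate([done[0], done[1]])
    if 0 in done:                # missing A2 @ x: parity minus A1 @ x
        return np.concatenate([done[0], done[2] - done[0]])
    return np.concatenate([done[2] - done[1], done[1]])

A = np.arange(16.0).reshape(4, 4)
x = np.ones(4)
tasks = encode_tasks(A)
# Pretend worker 1 (computing A2 @ x) is a straggler and never returns:
results = {0: tasks[0] @ x, 2: tasks[2] @ x}
assert np.allclose(decode(results), A @ x)
```

Because any two of the three results determine the answer, the master can return as soon as the two fastest workers finish, so a single straggler no longer dictates the overall latency; this is the effect the proposed algorithm exploits for distributed 2D convolution.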

CLC Number:

  • TP338.8