计算机科学 ›› 2021, Vol. 48 ›› Issue (2): 47-54.doi: 10.11896/jsjkx.200800187
苑晨宇, 谢在鹏, 朱晓瑞, 屈志昊, 徐媛媛
YUAN Chen-yu, XIE Zai-peng, ZHU Xiao-rui, QU Zhi-hao, XU Yuan-yuan
摘要: 卷积在统计学、信号处理、图像处理、深度学习等领域有着广泛的应用,且起到了至关重要的作用。在深度神经网络中,使用卷积运算对输入信息进行特征提取的方法是实现神经网络的基础计算单元之一。如何优化卷积的运算速度,提高卷积计算效率一直是亟需探讨的问题。近年来,很多研究指出分布式计算架构可以提高卷积神经网络的计算速度,进而优化深度学习的训练效率,然而由于分布式系统中普遍存在落跑者问题(straggler),该问题可能会拖慢整个系统执行任务的时间,因此该问题也成为了分布式深度学习中一个待解决的问题。文中针对二维卷积计算,结合Winograd算法和分布式编码,提出了一种优化的分布式二维卷积算法。Winograd算法能够有效地加速单次二维卷积计算的速度,分布式编码通过使用一种基于分布式冗余的编码方式能够缓解straggler节点对整个分布式系统计算延迟的影响。因此,提出的分布式二维卷积算法可以在加速二维卷积计算的同时有效缓解分布式系统中的straggler问题,有效提高了分布式卷积的计算效率。
中图分类号:
[1] PARHI K K.VLSI digital signal processing systems:design and implementation[M].John Wiley & Sons,2007. [2] LIU S,DU Z,TAO J,et al.Cambricon:An instruction set architecture for neural networks[C]//2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).IEEE,2016:393-405. [3] LECUN Y,BENGIO Y,HINTON G.Deep learning[J].Nature,2015,521(7553):436-444. [4] LI Y,ZHANG Z.Parallel computing:review and perspective[C]//2018 5th International Conference on Information Science and Control Engineering (ICISCE).IEEE,2018:365-369. [5] LI S,MADDAH-ALI M A,YU Q,et al.A fundamental tradeoffbetween computation and communication in distributed computing[J].IEEE Transactions on Information Theory,2017,64(1):109-128. [6] LI S,MADDAH-ALI M A,AVESTIMEHRA S.Compressedcoded distributed computing[C]//2018 IEEE International Symposium on Information Theory (ISIT).IEEE,2018:2032-2036. [7] ANDERSON A,GREGG D.Optimal DNN primitive selectionwith partitioned boolean quadratic programming[C]//Procee-dings of the 2018 International Symposium on Code Generation and Optimization.2018:340-351. [8] MATHIEU M,HENAFF M,LECUN Y.Fast training of convolutional networks through ffts[J].arXiv:1312.5851,2013. [9] VASILACHE N,JOHNSON J,MATHIEU M,et al.Fast convolutional nets with fbfft:A GPU performance evaluation[J].arXiv:1412.7580,2014. [10] CONG J,XIAO B.Minimizing computation in convolutionalneural networks[C]//International Conference on Artificial Neural Networks.Springer,Cham,2014:281-290. [11] STRASSEN V.Gaussian elimination is not optimal[J].Numerische mathematik,1969,13(4):354-356. [12] LAVIN A,GRAY S.Fast algorithms for convolutional neural networks[C]//Proceedings of the IEEE Conference on Compu-ter Vision and Pattern Recognition.2016:4013-4021. [13] WINOGRAD S.Arithmetic complexity of computations[M].Siam,1980. [14] WANG Z,LAN Q,HE H,et al.Winograd algorithm for 3D convolution neural networks[C]//International Conference on Artificial Neural Networks.Springer,Cham,2017:609-616. [15] ZHAO Y,WANG D,WANG L.Convolution accelerator designs using fast algorithms[J].Algorithms,2019,12(5):112. [16] FERNANDEZ-MARQUES J,WHATMOUGH P N,MUNDY A,et al.Searching for Winograd-aware Quantized Networks[J].arXiv:2002.10711,2020. [17] ZAHARIA M,CHOWDHURY M,FRANKLIN M J,et al.Spark:Cluster computing with working sets[C]//HotCloud'10 Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing.2010:4-10. [18] DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters[J].Communications of the ACM,2008,51(1):107-113. [19] DEAN J,BARROSOL A.The tail at scale[J].Communications of the ACM,2013,56(2):74-80. [20] AGARWAL A,DUCHIJ C.Distributed delayed stochastic optimization[C]//Advances in Neural Information Processing Systems.2011:873-881. [21] RECHT B,RE C,WRIGHT S,et al.Hogwild!:A lock-free approach to parallelizing stochastic gradient descent[J].Advances in neural information processing systems,2011,24:693-701. [22] ANANTHANARAYANAN G,GHODSI A,SHENKER S,et al.Effective straggler mitigation:Attack of the clones[C]//Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation.2013:185-198. [23] SHAH N B,LEE K,RAMCHANDRAN K.When do redundant requests reduce latency?[J].IEEE Transactions on Communications,2015,64(2):715-722. [24] WANG D,JOSHI G,WORNELL G.Efficient task replication for fast response times in parallel computation[C]//The 2014 ACM International Conference on Measurement and Modeling of Computer Systems.2014:599-600. [25] GARDNER K,ZBARSKY S,DOROUDI S,et al.Reducing latency via redundant requests:Exact analysis[J].ACM SIGMETRICS Performance Evaluation Review,2015,43(1):347-360. [26] CHAUBEY M,SAULE E.Replicated data placement for uncertain scheduling[C]//2015 IEEE International Parallel and Distributed Processing Symposium Workshop.IEEE,2015:464-472. [27] LEE K,PEDARSANI R,RAMCHANDRAN K.On schedulingredundant requests with cancellation overheads[J].IEEE/ACM Transactions on Networking,2016,25(2):1279-1290. [28] JOSHI G,SOLJANIN E,WORNELL G.Efficient redundancytechniques for latency reduction in cloud systems[J].ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS),2017,2(2):1-30. [29] JOSHI G,LIU Y,SOLJANIN E.On the delay-storage trade-off in content download from coded distributed storage systems[J].IEEE Journal on Selected Areas in Communications,2014,32(5):989-997. [30] LEE K,LAM M,PEDARSANI R,et al.Speeding up distributed machine learning using codes[J].IEEE Transactions on Information Theory,2017,64(3):1514-1529. [31] REISIZADEH A,PRAKASH S,PEDARSANI R,et al.Codedcomputation over heterogeneous clusters[J].IEEE Transactions on Information Theory,2019,65(7):4227-4242. [32] DIMAKIS A G,GODFREY P B,WU Y,et al.Network coding for distributed storage systems[J].IEEE transactions on information theory,2010,56(9):4539-4551. [33] YU Q,MADDAH-ALI M,AVESTIMEHR S.Polynomial codes:an optimal design for high-dimensional coded matrix multiplication[C]//Advances in Neural Information Processing Systems.2017:4403-4413. [34] PEI D,SALOMAA A,DING C.Chinese remainder theorem:applications in computing,coding,cryptography[M].World Scientific,1996. |
[1] | 周乐员, 张剑华, 袁甜甜, 陈胜勇. 多层注意力机制融合的序列到序列中国连续手语识别和翻译 Sequence-to-Sequence Chinese Continuous Sign Language Recognition and Translation with Multi- layer Attention Mechanism Fusion 计算机科学, 2022, 49(9): 155-161. https://doi.org/10.11896/jsjkx.210800026 |
[2] | 李宗民, 张玉鹏, 刘玉杰, 李华. 基于可变形图卷积的点云表征学习 Deformable Graph Convolutional Networks Based Point Cloud Representation Learning 计算机科学, 2022, 49(8): 273-278. https://doi.org/10.11896/jsjkx.210900023 |
[3] | 王馨彤, 王璇, 孙知信. 基于多尺度记忆残差网络的网络流量异常检测模型 Network Traffic Anomaly Detection Method Based on Multi-scale Memory Residual Network 计算机科学, 2022, 49(8): 314-322. https://doi.org/10.11896/jsjkx.220200011 |
[4] | 汪鸣, 彭舰, 黄飞虎. 基于多时间尺度时空图网络的交通流量预测模型 Multi-time Scale Spatial-Temporal Graph Neural Network for Traffic Flow Prediction 计算机科学, 2022, 49(8): 40-48. https://doi.org/10.11896/jsjkx.220100188 |
[5] | 陈泳全, 姜瑛. 基于卷积神经网络的APP用户行为分析方法 Analysis Method of APP User Behavior Based on Convolutional Neural Network 计算机科学, 2022, 49(8): 78-85. https://doi.org/10.11896/jsjkx.210700121 |
[6] | 朱承璋, 黄嘉儿, 肖亚龙, 王晗, 邹北骥. 基于注意力机制的医学影像深度哈希检索算法 Deep Hash Retrieval Algorithm for Medical Images Based on Attention Mechanism 计算机科学, 2022, 49(8): 113-119. https://doi.org/10.11896/jsjkx.210700153 |
[7] | 魏恺轩, 付莹. 基于重参数化多尺度融合网络的高效极暗光原始图像降噪 Re-parameterized Multi-scale Fusion Network for Efficient Extreme Low-light Raw Denoising 计算机科学, 2022, 49(8): 120-126. https://doi.org/10.11896/jsjkx.220200179 |
[8] | 檀莹莹, 王俊丽, 张超波. 基于图卷积神经网络的文本分类方法研究综述 Review of Text Classification Methods Based on Graph Convolutional Network 计算机科学, 2022, 49(8): 205-216. https://doi.org/10.11896/jsjkx.210800064 |
[9] | 金方焱, 王秀利. 融合RACNN和BiLSTM的金融领域事件隐式因果关系抽取 Implicit Causality Extraction of Financial Events Integrating RACNN and BiLSTM 计算机科学, 2022, 49(7): 179-186. https://doi.org/10.11896/jsjkx.210500190 |
[10] | 张颖涛, 张杰, 张睿, 张文强. 全局信息引导的真实图像风格迁移 Photorealistic Style Transfer Guided by Global Information 计算机科学, 2022, 49(7): 100-105. https://doi.org/10.11896/jsjkx.210600036 |
[11] | 戴朝霞, 李锦欣, 张向东, 徐旭, 梅林, 张亮. 基于DNGAN的磁共振图像超分辨率重建算法 Super-resolution Reconstruction of MRI Based on DNGAN 计算机科学, 2022, 49(7): 113-119. https://doi.org/10.11896/jsjkx.210600105 |
[12] | 刘月红, 牛少华, 神显豪. 基于卷积神经网络的虚拟现实视频帧内预测编码 Virtual Reality Video Intraframe Prediction Coding Based on Convolutional Neural Network 计算机科学, 2022, 49(7): 127-131. https://doi.org/10.11896/jsjkx.211100179 |
[13] | 徐鸣珂, 张帆. Head Fusion:一种提高语音情绪识别的准确性和鲁棒性的方法 Head Fusion:A Method to Improve Accuracy and Robustness of Speech Emotion Recognition 计算机科学, 2022, 49(7): 132-141. https://doi.org/10.11896/jsjkx.210100085 |
[14] | 孙福权, 崔志清, 邹彭, 张琨. 基于多尺度特征的脑肿瘤分割算法 Brain Tumor Segmentation Algorithm Based on Multi-scale Features 计算机科学, 2022, 49(6A): 12-16. https://doi.org/10.11896/jsjkx.210700217 |
[15] | 杜丽君, 唐玺璐, 周娇, 陈玉兰, 程建. 基于注意力机制和多任务学习的阿尔茨海默症分类 Alzheimer's Disease Classification Method Based on Attention Mechanism and Multi-task Learning 计算机科学, 2022, 49(6A): 60-65. https://doi.org/10.11896/jsjkx.201200072 |
|