Computer Science, 2026, Vol. 53, Issue (6): 193-202. doi: 10.11896/jsjkx.251000093
• High Performance Computing •
LI Fei1, LIU Song1, GUO Songjian1, LIU Jiazheng1, ZHANG Ying1, HONG Longwei2, ZHANG Boxuan2
| [1] | SUN Xiaoxue, JIA Haipeng, ZHANG Yunquan, YU Yue, QIN Pinle. GPU-based Implementation and Optimization of Banded Matrix LU Factorization [J]. Computer Science, 2026, 53(6): 117-127. |
| [2] | LI Jinyou, ZHANG Wenshuai, SHEN Yu, ZHANG Yundong, LI Huimin, LI Jing. Machine Learning-based Parallel Parameter Optimization in High-performance Computing Applications [J]. Computer Science, 2026, 53(6): 153-162. |
| [3] | WU Can, XIAO Haili, WANG Xiaoning, ZHAO Yining, LU Shasha, HE Rong. Workload Analysis and Modeling Method for High-performance Computing [J]. Computer Science, 2026, 53(6): 171-184. |
| [4] | JI Liguang, ZHOU Bei, YANG Hongru, ZHOU Yuchang, CUI Mengqi, XU Jinchen. Parallel Detection Method of Maximum Floating-point Error Based on Gridding Particle Swarm Optimization Algorithm [J]. Computer Science, 2026, 53(2): 124-132. |
| [5] | LIAO Zeming, LIU Guikai, HU Yonghua, XIE Anxing. Research on Efficient Code Generation Techniques for Array Computation for Vector DSPs [J]. Computer Science, 2025, 52(6A): 240300156-7. |
| [6] | ZUO Xianyu, ZHOU Xiaohu, ZHOU Liming, XIE Yi, LIU Cheng. Efficient Remote Sensing Common Product Production Algorithm Based on Product Reuse Model [J]. Computer Science, 2025, 52(6): 316-323. |
| [7] | XIE Zhenjie, LIU Yiming, CAI Ruijie, LUO Youqiang. Performance Optimization Method for Domestic Cryptographic Algorithm SM9 [J]. Computer Science, 2025, 52(6): 390-396. |
| [8] | TAN Zhengyuan, ZHONG Jiaqing, CHEN Juan. AI+HPC: An Overview of Supercomputing System Software and Application Technology Development Driven by "AI+" [J]. Computer Science, 2025, 52(5): 1-10. |
| [9] | LIAO Qiucheng, ZHOU Yang, LIN Xinhua. Metrics and Tools for Evaluating the Deviation in Parallel Timing [J]. Computer Science, 2025, 52(5): 41-49. |
| [10] | HUANG Chenxi, LI Jiahui, YAN Hui, ZHONG Ying, LU Yutong. Investigation on Load Balancing Strategies for Lattice Boltzmann Method with Local Grid Refinement [J]. Computer Science, 2025, 52(5): 101-108. |
| [11] | LI Qing, JIA Haipeng, ZHANG Yunquan, ZHANG Sijia. Input-aware Generalized Matrix-Vector Product Algorithm for Adaptative Performance Optimization of Hygon DCU [J]. Computer Science, 2025, 52(4): 291-300. |
| [12] | ZHANG Manjing, HE Yulin, LI Xu, HUANG Zhexue. Distributed Two-stage Clustering Method Based on Node Sampling [J]. Computer Science, 2025, 52(2): 134-144. |
| [13] | CHEN Yiyang, WANG Xiaoning, YAN Xiaoting, LI Guanlong, ZHAO Yining, LU Shasha, XIAO Haili. Study on High Performance Computing Container Checkpoint Technology Based on CRIU [J]. Computer Science, 2024, 51(9): 40-50. |
| [14] | XU He, ZHOU Tao, LI Peng, QIN Fangfang, JI Yimu. LU Parallel Decomposition Optimization Algorithm Based on Kunpeng Processor [J]. Computer Science, 2024, 51(9): 51-58. |
| [15] | YAN Xiaoting, WANG Xiaoning, DONG Sheng, ZHAO Yining, XIAO Haili. Review on the Development and Application of Checkpointing Technology in High-performance Computing [J]. Computer Science, 2024, 51(9): 1-14. |