Computer Science, 2023, Vol. 50, Issue (11): 348-355. doi: 10.11896/jsjkx.230300171

• Information Security •

USPS: User-space Cross Protocol Proxy System for Efficient Collaboration of Computing Power Resources

XIA Jingxuan, SHEN Guowei, GUO Chun, CUI Yunhe

  1. Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
    State Key Laboratory of Public Big Data, School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Received: 2023-03-21 Revised: 2023-07-29 Online: 2023-11-15 Published: 2023-11-06
  • Corresponding author: SHEN Guowei (gwshen@gzu.edu.cn)
  • About author: XIA Jingxuan (jxuan_xia@163.com), born in 1998, postgraduate. His main research interests include high performance network communication and RDMA. SHEN Guowei, born in 1986, Ph.D., professor, is a member of China Computer Federation. His main research interests include big data, network and information security.
  • Supported by:
    National Natural Science Foundation of China (62062022) and Key Project of the Natural Science Foundation of Guizhou Province, China (Qiankehe Jichu-ZK[2023] Key 011).

Abstract: With the rapid development of computing power networks, computing power resources such as general-purpose computing power, artificial intelligence computing power, and supercomputing are widely distributed. Collaborative service of computing power resources is a key problem in computing power network research. During computing power resource collaboration, the computing power network on the one hand faces highly concurrent requests and low-latency response requirements from massive numbers of terminal computing power services; on the other hand, it struggles to fully exploit the high-throughput and low-latency advantages of data center computing power resources, which makes it difficult to provide efficient computing power services to users. To address these challenges, a user-space proxy system (USPS) based on a user-space protocol stack and remote direct memory access (RDMA) is proposed. USPS uses the user-space protocol stack to respond to clients' highly concurrent computing power requests and, coordinated by a dynamic batch processing strategy, delivers high-throughput, low-latency data center computing power services over RDMA. In terms of communication, USPS implements an efficient remote procedure call (RPC) communication mechanism that makes full use of RDMA NIC bandwidth to provide high-speed message communication. In terms of request processing, a dynamic batch processing scheduling method is proposed that maximizes batch processing efficiency while meeting users' latency requirements. Experimental results show that the service response latency of USPS is only 7.8%~23.1% of that of the traditional kernel-space Nginx proxy system and 17.3%~24.7% of that of other user-space proxy systems. Its throughput is 3.4~8.9 times higher than that of the traditional kernel-space Nginx proxy system and 3.2~4.2 times higher than that of other user-space proxy systems.

Key words: Efficient collaboration of computing power resources, User-space proxy, Remote direct memory access, Data center, Batch processing scheduling

CLC Number:

  • TP393
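
The abstract states that the dynamic batch processing scheduler maximizes batch efficiency while meeting users' latency requirements, but it does not give the algorithm. The following is only a minimal sketch of that general idea, not the authors' method. It assumes a FIFO queue, a per-request deadline, and a linear batch cost model; the names `Request`, `estimate_batch_cost_ms`, and `choose_batch_size`, and all constants, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    arrival_ms: float   # time at which the proxy received the request
    deadline_ms: float  # latest acceptable completion time


def estimate_batch_cost_ms(batch_size: int,
                           fixed_ms: float = 0.05,
                           per_request_ms: float = 0.01) -> float:
    """Assumed linear cost model: fixed per-batch overhead plus per-request cost."""
    return fixed_ms + per_request_ms * batch_size


def choose_batch_size(queue: List[Request], now_ms: float,
                      max_batch: int = 64) -> int:
    """Pick the largest FIFO prefix that can be sent as one batch without
    any request in it missing its deadline under the assumed cost model."""
    if not queue:
        return 0
    best = 0
    earliest_deadline = float("inf")
    for size, req in enumerate(queue[:max_batch], start=1):
        earliest_deadline = min(earliest_deadline, req.deadline_ms)
        if now_ms + estimate_batch_cost_ms(size) <= earliest_deadline:
            best = size
        else:
            # Cost only grows and the earliest deadline only shrinks,
            # so larger prefixes cannot become feasible again.
            break
    # If even a batch of one is already late, send the head alone at once.
    return max(best, 1)
```

Under these assumptions, a larger batch amortizes the fixed per-batch overhead over more requests, while the deadline check keeps the batch from growing past the point where the tightest request in it would be late.
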