Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250900159-7. doi: 10.11896/jsjkx.250900159

• Big Data & Data Science •

Multi-RAG: Distributed Retrieval-augmented Generation Framework for Cross-domain Data

SHEN Jianwei, CHEN Hanlin, CHEN Xing   

  1. College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China
  2. Fujian Key Laboratory of Network Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: SHEN Jianwei, born in 2001, postgraduate. His main research interests include large language models and knowledge graphs.
    CHEN Xing, born in 1985, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 35725M). His main research interests include system software, software self-adaptation and cloud computing.
  • Supported by:
    National Natural Science Foundation of China (62072108), Special Funds for Promoting High-quality Development of Marine and Fishery Industries in Fujian Province (FJHYF-ZH-2023-02), Fujian Key Technological Innovation and Industrialization Projects (2024XQ004) and National Key Laboratory of Data Space Technology and System (QZQC2024007).

Abstract: The growing use of large language models (LLMs) in natural language processing tasks has established retrieval-augmented generation (RAG) as a critical technique for improving factual accuracy. However, cross-domain data is stored in a distributed fashion, which creates significant challenges: barriers to data aggregation, insufficient scalability, and the lack of native distributed coordination. These challenges make traditional centralized RAG frameworks inadequate for cross-domain scenarios. To address this limitation, a distributed RAG framework named Multi-RAG is proposed for handling cross-domain data. The framework lets each node maintain its own embedding model and vector index, tailored to the characteristics of its local data. A query distribution module routes each query in parallel to the relevant nodes. Each node retrieves and returns its locally highest-scoring document segments. A global re-ranking module then re-ranks these segments semantically across all nodes, and the re-ranked context is fed into the LLM for answer generation. Experiments on the MultiHop-RAG dataset in a synthetically constructed distributed environment demonstrate the effectiveness of Multi-RAG. The framework achieves a Hits@10 score of 0.7659, a 72% improvement over single-node retrieval (0.4452), while staying within 3.1% of a centralized approach. Answer generation accuracy with the DeepSeek-R1 model is 48% higher than the single-node baseline. The study indicates that, through its streamlined distributed coordination mechanism and global information fusion strategy, Multi-RAG effectively improves retrieval and generation performance in cross-domain settings without requiring raw data to be consolidated. The framework provides a practical and efficient solution for collaborative knowledge utilization across institutions and domains.
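The retrieval flow described in the abstract (parallel query distribution, local top-k retrieval on each node, then one global re-ranking pass) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Node` class, the `weight` parameter, the `multi_retrieve` function, and the bag-of-words cosine scorer are all hypothetical stand-ins, since the abstract does not specify the embedding models, routing policy, or re-ranker used by Multi-RAG.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import math


def bag(text):
    """Toy 'embedding': a bag-of-words term count (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Node:
    """Hypothetical data node that keeps its documents and scoring local."""

    def __init__(self, name, docs, weight=1.0):
        self.name = name
        self.docs = docs
        # Assumption: `weight` mimics a node-specific model whose score scale
        # differs from other nodes, so raw local scores are not comparable.
        self.weight = weight

    def retrieve(self, query, k):
        q = bag(query)
        scored = [(self.weight * cosine(q, bag(d)), d) for d in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Only segments leave the node, never the full local corpus.
        return [(self.name, d) for s, d in scored[:k] if s > 0]


def multi_retrieve(query, nodes, local_k=2, global_k=3):
    # Query distribution: send the query to every node in parallel.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = list(pool.map(lambda n: n.retrieve(query, local_k), nodes))
    candidates = [c for result in per_node for c in result]
    # Global re-ranking: rescore all candidates with one shared scorer.
    q = bag(query)
    candidates.sort(key=lambda c: cosine(q, bag(c[1])), reverse=True)
    return candidates[:global_k]


finance = Node("finance", ["interest rates rose in march", "bank profits fell"], weight=0.1)
health = Node("health", ["vaccine trial results published in march", "hospital funding rose"], weight=5.0)
top = multi_retrieve("interest rates rose", [finance, health])
print(top)
```

In this toy setup the health node's inflated local scores would place its segment first if the raw scores were merged directly. Rescoring every candidate with a single shared scorer puts the finance segment on top, which is the motivation for a global re-ranking step when nodes use different embedding models.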

Key words: Distributed retrieval-augmented generation, Cross-domain data collaboration, Global re-ranking, Large language model, Information fusion

CLC Number: 

  • TP391