SQL-MARS: Text-to-SQL Structured Data Recommendation System for Ambiguous User Requirements

doi:10.11896/jsjkx.250700096

Abstract

With the maturing of LLM technology, natural language-based database interaction systems (e.g., Chat2DB, ChatExcel) have been widely adopted. However, existing systems generally rely on a “precise query” assumption and struggle to handle the ambiguous requirements that are ubiquitous in real-world scenarios, where users clarify their query needs only while interacting with the system. To address this challenge, this paper proposes SQL-MARS (SQL-oriented Multi-Agent Recommender System), a multi-agent collaborative framework built on a “perception-action-evaluation” closed-loop mechanism that dynamically identifies and adaptively processes ambiguous database query requirements. The system introduces a three-layer metadata architecture to model user requirements and detect ambiguity. On this basis, it implements a data navigation function that provides query recommendations at varying granularities according to the user's ambiguous requirements, progressively guiding the user to clarify their query needs. In addition, the system proposes a mechanism for fusing external knowledge with local data so that valuable information from external sources can be fully exploited. We also construct a dataset, Bird-fuzzy, for ambiguous requirements and implement automated evaluation. Experimental results show that SQL-MARS can effectively identify ambiguous requirements and guide users to clarify their data needs.

Key words: Ambiguous requirements, Multi-agent collaboration, Hierarchical metadata, Data navigation, External knowledge fusion

CLC Number: TP311

XU Jiawen, ZHENG Yungui, ZHOU Wei, XU Yaoqiang, HU Huiqi, ZHOU Xuan. SQL-MARS: Text-to-SQL Structured Data Recommendation System for Ambiguous User Requirements[J]. Computer Science, 2026, 53(3): 52-63.
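
The closed loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the three metadata layers, so the database/table/column levels below are an assumption, and `Request`, `perceive`, `act`, `clarify`, and the `choose` callback are hypothetical names introduced only for illustration. The sketch shows one plausible reading of the loop: detect the coarsest level the user has not yet pinned down, recommend candidates at that granularity, and stop once nothing ambiguous remains.

```python
from dataclasses import dataclass, field

# Assumed granularity levels (hypothetical; the abstract does not name the layers).
LEVELS = ["database", "table", "column"]


@dataclass
class Request:
    text: str
    # Maps a level name to the item the user has settled on at that level.
    resolved: dict = field(default_factory=dict)


def perceive(req):
    """Perception: return the coarsest level still unresolved, or None."""
    for level in LEVELS:
        if level not in req.resolved:
            return level
    return None


def act(level, catalog, req):
    """Action: list recommendation candidates at the given granularity."""
    if level == "database":
        return sorted(catalog)
    db = req.resolved["database"]
    if level == "table":
        return sorted(catalog[db])
    return list(catalog[db][req.resolved["table"]])


def clarify(req, catalog, choose, max_rounds=10):
    """Run the loop until the request is unambiguous or rounds run out."""
    for _ in range(max_rounds):
        level = perceive(req)
        if level is None:  # Evaluation: nothing ambiguous remains.
            break
        options = act(level, catalog, req)
        # `choose` stands in for the user picking one recommendation.
        req.resolved[level] = choose(level, options)
    return req


# Toy catalog: database -> table -> list of columns.
catalog = {"sales": {"orders": ["id", "total"], "customers": ["id", "name"]}}
result = clarify(Request("something about customers"), catalog,
                 lambda level, options: options[0])
print(result.resolved)
```

In this toy run the stand-in user always accepts the first recommendation, so each round narrows the request by one level. A real system would replace `choose` with an interactive step and `act` with metadata-driven ranking.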

