Are Large Language Models Ready for Multi-Turn Tabular Data Analysis?

  • Jinyang Li
  • Nan Huo
  • Yan Gao
  • Jiayi Shi
  • Yingxiu Zhao
  • Ge Qu
  • Bowen Qin
  • Yurong Wu
  • Xiaodong Li
  • Chenhao Ma
  • Jian Guang Lou
  • Reynold Cheng

Research output: Contribution to journal › Conference article › peer-review

Abstract

Conversational Tabular Data Analysis, a collaboration between humans and machines, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic conversational logs for tabular data analysis hinder comprehensive quantitative evaluation of Large Language Models (LLMs) on this task. To mitigate this issue, we introduce COTA, a new benchmark for evaluating LLMs on conversational data analysis. COTA contains 1013 conversations covering four practical scenarios: NORMAL, ACTION, PRIVATE, and PRIVATE ACTION. Notably, COTA is constructed with a multi-agent environment, DECISION COMPANY, which enables efficient and scalable generation of new conversational data. Our comprehensive study, conducted by data analysis experts, demonstrates that DECISION COMPANY produces diverse and high-quality data, laying the groundwork for efficient data annotation. We evaluate popular and advanced LLMs on COTA; the results highlight the challenges of conversational tabular data analysis. Furthermore, we propose Adaptive Conversation Reflection (ACR), a self-generated reflection strategy that guides LLMs to learn from successful histories. Experiments demonstrate that ACR can evolve LLMs into effective conversational tabular data analysis agents, achieving a relative performance improvement of up to 35.14%. Code can be found at https://tapilot-crossing.github.io/
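The abstract describes ACR as prompting an LLM with its own successful interaction histories. A minimal sketch of how such a reflection-augmented prompt could be assembled is below; the function name, history schema, and prompt layout are illustrative assumptions, not the paper's actual ACR implementation:

```python
# Hypothetical sketch of reflection-style prompting for conversational
# tabular analysis. The data structures and prompt format here are
# assumptions for illustration, not the ACR method as published.

def build_reflection_prompt(successful_turns, new_question):
    """Prepend past successful (question, code, reflection) triples to
    the new user turn, so the model can reuse patterns that worked."""
    parts = []
    for turn in successful_turns:
        parts.append(f"Q: {turn['question']}")
        parts.append(f"Code: {turn['code']}")
        parts.append(f"Reflection: {turn['reflection']}")
    # The new turn ends with an open "Code:" slot for the model to fill.
    parts.append(f"Q: {new_question}")
    parts.append("Code:")
    return "\n".join(parts)

# Example: one earlier successful turn from the same conversation.
history = [
    {
        "question": "What is the average price per category?",
        "code": "df.groupby('category')['price'].mean()",
        "reflection": "Group before aggregating; select the target column.",
    }
]

prompt = build_reflection_prompt(history, "Which category has the most rows?")
```

The key design point is that the reflections are self-generated from turns the model already solved, rather than hand-written few-shot examples.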

Original language: English
Pages (from-to): 34795-34835
Number of pages: 41
Journal: Proceedings of Machine Learning Research
Volume: 267
Publication status: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 - 19 Jul 2025
