Querying and Plotting Data with LangChain’s Pandas Agent + OpenAI

Natural language interfaces for data analysis are moving from research into everyday engineering practice.
With LangChain and OpenAI, you can now query a DataFrame directly in English, let the model write and execute the Pandas/Matplotlib code, and even plot results, all inside Python.

In this post, I’ll show you how I built a Pandas DataFrame Agent with LangChain + OpenAI that can:

  • Answer tabular questions about a dataset.

  • Generate Python plotting code automatically.

  • Execute that code locally to produce charts.


🔧 Setup

We need a few packages:

pip install langchain langchain-openai langchain-experimental pandas matplotlib tabulate

And of course, set your OpenAI key:

export OPENAI_API_KEY=...
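
ChatOpenAI picks the key up from the OPENAI_API_KEY environment variable, so a missing key usually only surfaces when the first request is made. A small fail-fast check at the top of the script can make that error more obvious. This is a minimal sketch; the helper name require_api_key is my own and not part of LangChain or OpenAI:

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is missing.

    Hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the script."
        )
    return key


# Typical use at the top of the script:
# require_api_key()
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.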

📊 The Dataset

For demo purposes, let’s mock up some sales data:

import pandas as pd

data = {
    "month": ["2025-01", "2025-02", "2025-03", "2025-04", "2025-05", "2025-06"],
    "region": ["EMEA", "EMEA", "EMEA", "AMER", "AMER", "APAC"],
    "units": [120, 150, 130, 200, 180, 160],
    "price": [10.0, 10.0, 10.0, 12.0, 12.0, 9.0],
}

df = pd.DataFrame(data)
df["revenue"] = df["units"] * df["price"]

🤖 Creating the Agent

The magic comes from create_pandas_dataframe_agent.
It gives the LLM a Python REPL tool with the DataFrame in scope as df, so the model can write Pandas code and run it against your data:

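from langchain_openai import ChatOpenAI
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

agent = create_pandas_dataframe_agent(
    llm=llm,
    df=df,
    agent_type="openai-tools",
    verbose=False,  # cleaner logs
    # Recent langchain-experimental releases require this explicit opt-in,
    # because the agent runs model-generated Python in your own process.
    allow_dangerous_code=True,
)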

🔎 Asking Questions in English

Now you can query the data without touching Pandas:

resp = agent.invoke({
    "input": "Compute total revenue by region as a tidy table sorted descending."
})
print(resp["output"])

Output:

| region | revenue |
|--------|---------|
| AMER   | 4560.0  |
| EMEA   | 4000.0  |
| APAC   | 1440.0  |

The agent generated and executed the Pandas code under the hood.
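
For reference, here is roughly what that generated code would look like if you wrote it by hand. This is a sketch of an equivalent query, not the exact code the model emitted (which can vary from run to run); the variable name by_region is my own:

```python
import pandas as pd

# Same mock sales data as in the dataset section.
df = pd.DataFrame({
    "month": ["2025-01", "2025-02", "2025-03", "2025-04", "2025-05", "2025-06"],
    "region": ["EMEA", "EMEA", "EMEA", "AMER", "AMER", "APAC"],
    "units": [120, 150, 130, 200, 180, 160],
    "price": [10.0, 10.0, 10.0, 12.0, 12.0, 9.0],
})
df["revenue"] = df["units"] * df["price"]

# Total revenue per region, largest first, as a tidy two-column table.
by_region = (
    df.groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)
print(by_region)
```

Running the query by hand like this is a cheap way to sanity-check the agent's answer on a small dataset before trusting it on a larger one.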


📈 Asking for a Plot

We can go one step further and have the agent produce a chart of monthly revenue. Here I pass the plotting code in explicitly and ask the agent to execute it:

# The plotting script is kept as a string so it can be sent to the agent verbatim.
plot_code = """
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

df['month'] = pd.to_datetime(df['month'])
monthly_revenue = df.groupby('month')['revenue'].sum()

plt.figure(figsize=(10, 5))
plt.plot(monthly_revenue.index, monthly_revenue.values, marker='o')
plt.title('Monthly Revenue Over Time')
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.grid()
plt.xticks(rotation=45)
plt.tight_layout()

out = Path('monthly_revenue.png')
plt.savefig(out)
print(f'Chart saved to: {out.resolve()}')
"""

# Ask the agent to run the script as-is in its Python tool, where df is already defined.
agent.invoke({"input": f"Execute this exact code:\n```python\n{plot_code}\n```"})

This produces a line chart of monthly revenue, saved as monthly_revenue.png.



🚀 Why This Matters

This pattern unlocks:

  • Rapid ad-hoc analytics: Ask questions in English, get code + charts.

  • Self-service exploration: Non-technical users can explore data without writing Pandas.

  • RAG + analytics: Instead of only text retrieval, you can augment LLMs with structured data queries.

Of course, for production you’d want guardrails (e.g., AST linting before execution, schema checks, sandboxing). But as a prototyping tool, this workflow is incredibly powerful.
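
To make the AST linting idea concrete, here is a minimal sketch that uses only the standard library. The allow-list, the block-list, and the function name lint_generated_code are assumptions of mine for illustration. A check like this catches obvious problems before execution, but it is not a sandbox and should not be the only line of defense:

```python
import ast

# Hypothetical policy for illustration; tune both sets to your own needs.
ALLOWED_IMPORTS = {"pandas", "numpy", "matplotlib", "pathlib"}
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}


def lint_generated_code(source: str) -> list:
    """Return a list of policy violations found in `source` (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b.c` is judged by its top-level package `a`.
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import from '{module}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name are caught here, e.g. eval(...).
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to '{node.func.id}' is not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attribute access is a common route around name checks.
            problems.append(f"access to '{node.attr}' is not allowed")
    return problems


if __name__ == "__main__":
    print(lint_generated_code("import os\nos.system('ls')"))
```

The plotting script from earlier in this post only imports pandas, matplotlib, and pathlib, so it would pass this particular policy. Anything that returns a non-empty list can be rejected or sent back to the model for another attempt.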


Source code can be found here: JordiCorbilla/langgraph-cookbook
