Build intelligent customer health analysis agents that autonomously discover data schemas, query live enterprise data, and generate executive health briefs using LangGraph + CData Connect AI.
NOTE: While this guide uses Google Sheets as the data source, the same principles apply to any of the 350+ data sources CData Connect AI supports.
```mermaid
flowchart TD
    UP["User Prompt"] --> G
    subgraph G["ReAct Gatherer Agent"]
        direction TB
        LLM["LLM decides next action"] --> TOOLS["MCP Tools<br>get_catalogs, get_schemas<br>get_tables, get_columns<br>query_data"]
        TOOLS --> |"loop until done"| LLM
    end
    TOOLS <--> CDA["CData Connect AI<br>(350+ sources)"]
    G --> A["Analyst Node (LLM)<br>Structured JSON: health_score,<br>signals, recommendations, risks"]
    A --> R["Renderer Node<br>Jinja2 template to HTML<br>Deterministic, no LLM"]
    R --> OUT["output/*.html"]
```
How it works:
- A ReAct Gatherer Agent autonomously discovers schemas via MCP tools and gathers data through iterative tool calls
- An Analyst node makes a single LLM call to produce a structured JSON health assessment (score, signals, recommendations, risks)
- A Renderer node fills a Jinja2 template with the analysis and saves a styled HTML brief (deterministic, no LLM)
- Schema caching avoids redundant discovery on subsequent runs (24h TTL by default)
- Each node can be upgraded to a full agent by adding tools -- the pipeline is multi-agent-ready
The gatherer uses LangGraph's `create_react_agent` -- a ReAct (Reason + Act) loop where the LLM decides which tool to call next based on results so far. This means the agent:
- Discovers schemas dynamically (no hard-coded table names)
- Adapts queries based on what it finds
- Handles any CData-connected data source without code changes
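The ReAct loop itself is simple to sketch. Below is a minimal stdlib-only toy version; the real agent uses LangGraph's `create_react_agent` with live MCP tools, so the stub tools, their return values, and the `policy` function here are invented purely for illustration:

```python
# Toy ReAct loop: a "policy" (standing in for the LLM) picks the next
# tool based on observations gathered so far, until it decides to stop.

def get_tables(_):                       # stub MCP tool (invented data)
    return ["accounts", "tickets"]

def query_data(sql):                     # stub MCP tool (invented data)
    return [{"account": "Acme Corp", "open_tickets": 3}]

TOOLS = {"get_tables": get_tables, "query_data": query_data}

def policy(observations):
    """Stand-in for the LLM: choose the next (tool, argument), or None to stop."""
    if "get_tables" not in observations:
        return ("get_tables", None)       # discover schema first
    if "query_data" not in observations:
        return ("query_data", "SELECT * FROM tickets")
    return None                           # enough information gathered

def react_loop(max_iterations=15):        # mirrors MAX_ITERATIONS
    observations = {}
    for _ in range(max_iterations):
        action = policy(observations)
        if action is None:
            break
        tool, arg = action
        observations[tool] = TOOLS[tool](arg)
    return observations

result = react_loop()
```

The key property this illustrates: nothing is hard-coded except the tool inventory, so swapping the data source changes the observations, not the loop.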
The Analyst is a single structured LLM call (no tools), and the Renderer is fully deterministic (no LLM). Both are designed as extension points -- add tools to either node and it becomes a full agent. This separation keeps the pipeline predictable while giving the gatherer maximum flexibility.
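Because the Analyst returns structured JSON, its output can be validated cheaply before it reaches the Renderer. A hypothetical validation sketch: the four field names come from the pipeline description above, but the 0-100 scoring range and the sample values are assumptions, not the repo's actual schema:

```python
import json

REQUIRED_FIELDS = {"health_score", "signals", "recommendations", "risks"}

def validate_brief(raw: str) -> dict:
    """Parse the analyst's JSON output and enforce the expected shape."""
    brief = json.loads(raw)
    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"analyst output missing fields: {missing}")
    if not 0 <= brief["health_score"] <= 100:   # assumed range
        raise ValueError("health_score must be 0-100")
    return brief

sample = json.dumps({
    "health_score": 72,
    "signals": ["ticket volume rising"],
    "recommendations": ["schedule QBR"],
    "risks": ["renewal in 60 days"],
})
brief = validate_brief(sample)
```

A check like this is what keeps the deterministic Renderer safe to run without its own LLM fallback.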
- Python 3.8+ (Download)
- CData Connect AI account (Free trial)
- OpenAI API key (Get key) or another supported LLM provider
- Git (Download)
```shell
git clone https://github.com/CDataSoftware/langgraph-customer-health-agent.git
cd langgraph-customer-health-agent
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your credentials
```

- Copy the sample Google Sheet (File > Make a copy, name it "demo_organization")
- In CData Connect AI, add a Google Sheets connection pointing to the copied sheet
- Create a Personal Access Token under Settings > Access Tokens
- Update `.env` with your credentials
The easiest way to get started. The interactive runner handles setup, credential configuration, and running the agent:
```shell
python run.py
```

It provides a menu-driven interface to:
- Setup wizard -- configure CData credentials and choose an LLM provider (OpenAI, Gemini, DeepSeek)
- Run health analysis -- analyze a specific account with sample suggestions
- Run open-ended query -- ask any question about your data
- Refresh schema cache -- clear cached schemas for re-discovery
- Check setup -- verify credentials, test MCP connection, check dependencies
`rich` is auto-installed on first run if not already present.
```shell
python src/main.py --account "Premium Auto Group Europe"
```

Runs the full 3-node pipeline: discover schema, gather data, analyze health, render HTML brief.
```shell
python src/main.py "Which industries have the highest average revenue?"
python src/main.py "Show me all accounts with high priority open tickets"
```

Ask any question -- the ReAct agent figures out which tables and queries to run.
```shell
python src/main.py --account "Acme Corp" --verbose          # DEBUG logging
python src/main.py --account "Acme Corp" --refresh-schema   # Force schema re-discovery
```

| Flag | Description |
|---|---|
| `--account NAME` | Shortcut for health analysis of a specific account |
| `--refresh-schema` | Clear cached schema and re-discover on next run |
| `--verbose` | Set log level to DEBUG for detailed output |
Configure the LLM provider via environment variables:
| Provider | `LLM_PROVIDER` | `LLM_MODEL` (example) | Required package |
|---|---|---|---|
| OpenAI | `openai` | `gpt-4o` | `langchain-openai` (included) |
| Anthropic | `anthropic` | `claude-sonnet-4-20250514` | `langchain-anthropic` |
| Google | `google` | `gemini-pro` | `langchain-google-genai` |
| Ollama | `ollama` | `llama3` | `langchain-ollama` |
Any OpenAI-compatible API (DeepSeek, Azure OpenAI, etc.) works with `LLM_PROVIDER=openai` and `OPENAI_API_BASE`.
On first run, the agent discovers schemas via MCP (catalogs, tables, columns). This metadata is cached to `~/.cache/langgraph-health/schema.json` with a 24-hour TTL.
- Cache hit: Agent skips discovery and starts querying immediately
- Cache miss/expired: Agent discovers schema, then caches for next run
- Force refresh: `--refresh-schema` flag or set `SCHEMA_CACHE_TTL=0`
- Demo shortcut: Set `CDATA_CATALOG` in `.env` to skip catalog discovery
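The caching pattern is worth a quick sketch: persist the discovered metadata with a timestamp, and treat anything older than the TTL as a miss. This is a minimal stdlib-only illustration in the spirit of `schema_cache.py`, not its actual code; the function names are hypothetical and it writes to a temp directory rather than `~/.cache`:

```python
import json
import tempfile
import time
from pathlib import Path

CACHE_TTL = 86400  # seconds; mirrors SCHEMA_CACHE_TTL (default 24h)
CACHE_FILE = Path(tempfile.gettempdir()) / "schema_cache_demo.json"

def save_schema(schema: dict) -> None:
    """Persist discovered schema metadata alongside a timestamp."""
    CACHE_FILE.write_text(json.dumps({"saved_at": time.time(), "schema": schema}))

def load_schema(ttl: int = CACHE_TTL):
    """Return the cached schema if still fresh, else None (caller re-discovers)."""
    if not CACHE_FILE.exists():
        return None
    entry = json.loads(CACHE_FILE.read_text())
    if time.time() - entry["saved_at"] > ttl:
        return None  # expired: re-discover via MCP
    return entry["schema"]

save_schema({"tables": ["accounts", "tickets"]})
cached = load_schema()        # fresh hit: discovery is skipped
expired = load_schema(ttl=0)  # SCHEMA_CACHE_TTL=0 behaves like a permanent miss
```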
Add custom agents by appending to the `PIPELINE` list in `src/graph.py`:
```python
PIPELINE = [
    ("gather", gather_node),
    ("analyze", analyze_node),
    ("my_custom_step", my_custom_node),  # Add your agent here
    ("render", render_node),
]
```

Each node receives and returns the shared `AgentState` dict.
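Since every node is just a function over the shared state, a custom step is easy to slot in. A hypothetical sketch -- the `AgentState` keys shown here are illustrative, not the exact TypedDict defined in `src/state.py`:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    # Illustrative fields only; see src/state.py for the real definition.
    account: str
    gathered_data: list
    analysis: dict
    output_path: str

def my_custom_node(state: AgentState) -> AgentState:
    """Example custom step: annotate the analysis before rendering."""
    analysis = dict(state.get("analysis", {}))
    analysis["reviewed"] = True
    return {**state, "analysis": analysis}

state: AgentState = {"account": "Acme Corp", "analysis": {"health_score": 72}}
state = my_custom_node(state)
```

Returning a new dict rather than mutating the input keeps each node easy to test in isolation.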
```
langgraph-customer-health-agent/
├── run.py                # Interactive runner (python run.py)
├── src/
│   ├── config.py         # Configuration & get_llm() factory
│   ├── state.py          # AgentState TypedDict
│   ├── mcp_tools.py      # 5 @tool-decorated MCP wrappers
│   ├── schema_cache.py   # TTL-based schema cache
│   ├── logger.py         # Lightweight logging & stats
│   ├── graph.py          # 3-node LangGraph pipeline
│   ├── main.py           # CLI entry point
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── gatherer.py   # ReAct agent (schema discovery + queries)
│   │   ├── analyst.py    # LLM node: structured JSON health analysis
│   │   └── renderer.py   # Deterministic node: Jinja2 HTML brief renderer
│   └── templates/
│       └── brief.html    # Jinja2 HTML template
├── output/               # Generated briefs
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
```
```shell
# Required
CDATA_EMAIL=your-email@company.com
CDATA_PAT=your-personal-access-token
OPENAI_API_KEY=sk-proj-...

# LLM configuration
LLM_PROVIDER=openai            # openai, anthropic, google, ollama
LLM_MODEL=gpt-4o               # Model name for chosen provider

# Optional
CDATA_CATALOG=ConnectionName   # Skip catalog discovery
SCHEMA_CACHE_TTL=86400         # Cache TTL in seconds (default 24h)
LOG_LEVEL=INFO                 # INFO or DEBUG
MAX_ITERATIONS=15              # Max ReAct tool-call loops
OPENAI_API_BASE=https://...    # Custom API base URL
```

| Error | Solution |
|---|---|
| `Query returned no results` | Verify the connection name and that data exists in CData Connect AI |
| `MCP error` | Check `CDATA_EMAIL` / `CDATA_PAT` in `.env` |
| `Unsupported LLM_PROVIDER` | Use `openai`, `anthropic`, `google`, or `ollama` |
| `ImportError: langchain-...` | Install the package for your chosen LLM provider |
| Agent loops too many times | Reduce `MAX_ITERATIONS` or set `CDATA_CATALOG` to narrow scope |
- LangGraph Documentation
- CData Connect AI
- OpenAI API Documentation
- Model Context Protocol
- CData Community Forums
MIT License - See LICENSE for details.
CData Software - https://www.cdata.com
