Build intelligent customer health analysis agents that autonomously discover data schemas, query live enterprise data, and generate executive health briefs using LangGraph + CData Connect AI.
NOTE: While this guide uses Google Sheets as the data source, the same principles apply to any of the 350+ data sources CData Connect AI supports.
```mermaid
flowchart TD
    UP["User Prompt"] --> G
    subgraph G["ReAct Gatherer Agent"]
        direction TB
        LLM["LLM decides next action"] --> TOOLS["MCP Tools<br>get_catalogs, get_schemas<br>get_tables, get_columns<br>query_data"]
        TOOLS --> |"loop until done"| LLM
    end
    TOOLS <--> CDA["CData Connect AI<br>(350+ sources)"]
    G --> A["Analyst Node (LLM)<br>Structured JSON: health_score,<br>signals, recommendations, risks"]
    A --> R["Renderer Node<br>Jinja2 template to HTML<br>Deterministic, no LLM"]
    R --> OUT["output/*.html"]
```
How it works:
- A ReAct Gatherer Agent autonomously discovers schemas via MCP tools and gathers data through iterative tool calls
- An Analyst node makes a single LLM call to produce a structured JSON health assessment (score, signals, recommendations, risks)
- A Renderer node fills a Jinja2 template with the analysis and saves a styled HTML brief (deterministic, no LLM)
- Schema caching avoids redundant discovery on subsequent runs (24h TTL by default)
- Each node can be upgraded to a full agent by adding tools -- the pipeline is multi-agent-ready
The gatherer uses LangGraph's `create_react_agent` -- a ReAct (Reason + Act) loop where the LLM decides which tool to call next based on results so far. This means the agent:
- Discovers schemas dynamically (no hard-coded table names)
- Adapts queries based on what it finds
- Handles any CData-connected data source without code changes
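The ReAct loop itself is simple to sketch. Below is a minimal stdlib-only toy version; the real agent uses LangGraph's `create_react_agent` with live MCP tools, so the stub tools, their return values, and the `policy` function here are invented purely for illustration:

```python
# Toy ReAct loop: a "policy" (standing in for the LLM) picks the next
# tool based on observations gathered so far, until it decides to stop.

def get_tables(_):                       # stub MCP tool (invented data)
    return ["accounts", "tickets"]

def query_data(sql):                     # stub MCP tool (invented data)
    return [{"account": "Acme Corp", "open_tickets": 3}]

TOOLS = {"get_tables": get_tables, "query_data": query_data}

def policy(observations):
    """Stand-in for the LLM: choose the next (tool, argument), or None to stop."""
    if "get_tables" not in observations:
        return ("get_tables", None)       # discover schema first
    if "query_data" not in observations:
        return ("query_data", "SELECT * FROM tickets")
    return None                           # enough information gathered

def react_loop(max_iterations=15):        # mirrors MAX_ITERATIONS
    observations = {}
    for _ in range(max_iterations):
        action = policy(observations)
        if action is None:
            break
        tool, arg = action
        observations[tool] = TOOLS[tool](arg)
    return observations

result = react_loop()
```

The key property this illustrates: nothing is hard-coded except the tool inventory, so swapping the data source changes the observations, not the loop.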
The Analyst is a single structured LLM call (no tools), and the Renderer is fully deterministic (no LLM). Both are designed as extension points -- add tools to either node and it becomes a full agent. This separation keeps the pipeline predictable while giving the gatherer maximum flexibility.
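Because the Analyst returns structured JSON, its output can be validated cheaply before it reaches the Renderer. A hypothetical validation sketch: the four field names come from the pipeline description above, but the 0-100 scoring range and the sample values are assumptions, not the repo's actual schema:

```python
import json

REQUIRED_FIELDS = {"health_score", "signals", "recommendations", "risks"}

def validate_brief(raw: str) -> dict:
    """Parse the analyst's JSON output and enforce the expected shape."""
    brief = json.loads(raw)
    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"analyst output missing fields: {missing}")
    if not 0 <= brief["health_score"] <= 100:   # assumed range
        raise ValueError("health_score must be 0-100")
    return brief

sample = json.dumps({
    "health_score": 72,
    "signals": ["ticket volume rising"],
    "recommendations": ["schedule QBR"],
    "risks": ["renewal in 60 days"],
})
brief = validate_brief(sample)
```

A check like this is what keeps the deterministic Renderer safe to run without its own LLM fallback.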
- Python 3.8+ (Download)
- CData Connect AI account (Free trial)
- OpenAI API key (Get key) or another supported LLM provider
- Git (Download)
```shell
git clone https://github.com/CDataSoftware/langgraph-customer-health-agent.git
cd langgraph-customer-health-agent
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your credentials
```

- Copy the sample Google Sheet (File > Make a copy, name it "demo_organization")
- In CData Connect AI, add a Google Sheets connection pointing to the copied sheet
- Create a Personal Access Token under Settings > Access Tokens
- Update `.env` with your credentials
The easiest way to get started. The interactive runner handles setup, credential configuration, and running the agent:
```shell
python run.py
```

It provides a menu-driven interface to:
- Setup wizard -- configure CData credentials and choose an LLM provider (OpenAI, Gemini, DeepSeek)
- Run health analysis -- analyze a specific account with sample suggestions
- Run open-ended query -- ask any question about your data
- Refresh schema cache -- clear cached schemas for re-discovery
- Check setup -- verify credentials, test MCP connection, check dependencies
`rich` is auto-installed on first run if not already present.
```shell
python src/main.py --account "Premium Auto Group Europe"
```

Runs the full 3-node pipeline: discover schema, gather data, analyze health, render HTML brief.
```shell
python src/main.py "Which industries have the highest average revenue?"
python src/main.py "Show me all accounts with high priority open tickets"
```

Ask any question -- the ReAct agent figures out which tables and queries to run.
```shell
python src/main.py --account "Acme Corp" --verbose          # DEBUG logging
python src/main.py --account "Acme Corp" --refresh-schema   # Force schema re-discovery
```

| Flag | Description |
|---|---|
| `--account NAME` | Shortcut for health analysis of a specific account |
| `--refresh-schema` | Clear cached schema and re-discover on next run |
| `--verbose` | Set log level to DEBUG for detailed output |
Configure the LLM provider via environment variables:
| Provider | `LLM_PROVIDER` | `LLM_MODEL` (example) | Required package |
|---|---|---|---|
| OpenAI | `openai` | `gpt-4o` | `langchain-openai` (included) |
| Anthropic | `anthropic` | `claude-sonnet-4-20250514` | `langchain-anthropic` |
| Google | `google` | `gemini-pro` | `langchain-google-genai` |
| Ollama | `ollama` | `llama3` | `langchain-ollama` |
Any OpenAI-compatible API (DeepSeek, Azure OpenAI, etc.) works with `LLM_PROVIDER=openai` and `OPENAI_API_BASE`.
On first run, the agent discovers schemas via MCP (catalogs, tables, columns). This metadata is cached to `~/.cache/langgraph-health/schema.json` with a 24-hour TTL.
- Cache hit: Agent skips discovery and starts querying immediately
- Cache miss/expired: Agent discovers schema, then caches for next run
- Force refresh: `--refresh-schema` flag or set `SCHEMA_CACHE_TTL=0`
- Demo shortcut: Set `CDATA_CATALOG` in `.env` to skip catalog discovery
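The caching pattern is worth a quick sketch: persist the discovered metadata with a timestamp, and treat anything older than the TTL as a miss. This is a minimal stdlib-only illustration in the spirit of `schema_cache.py`, not its actual code; the function names are hypothetical and it writes to a temp directory rather than `~/.cache`:

```python
import json
import tempfile
import time
from pathlib import Path

CACHE_TTL = 86400  # seconds; mirrors SCHEMA_CACHE_TTL (default 24h)
CACHE_FILE = Path(tempfile.gettempdir()) / "schema_cache_demo.json"

def save_schema(schema: dict) -> None:
    """Persist discovered schema metadata alongside a timestamp."""
    CACHE_FILE.write_text(json.dumps({"saved_at": time.time(), "schema": schema}))

def load_schema(ttl: int = CACHE_TTL):
    """Return the cached schema if still fresh, else None (caller re-discovers)."""
    if not CACHE_FILE.exists():
        return None
    entry = json.loads(CACHE_FILE.read_text())
    if time.time() - entry["saved_at"] > ttl:
        return None  # expired: re-discover via MCP
    return entry["schema"]

save_schema({"tables": ["accounts", "tickets"]})
cached = load_schema()        # fresh hit: discovery is skipped
expired = load_schema(ttl=0)  # SCHEMA_CACHE_TTL=0 behaves like a permanent miss
```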
Add custom agents by appending to the `PIPELINE` list in `src/graph.py`:
```python
PIPELINE = [
    ("gather", gather_node),
    ("analyze", analyze_node),
    ("my_custom_step", my_custom_node),  # Add your agent here
    ("render", render_node),
]
```

Each node receives and returns the shared `AgentState` dict.
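Since every node is just a function over the shared state, a custom step is easy to slot in. A hypothetical sketch -- the `AgentState` keys shown here are illustrative, not the exact TypedDict defined in `src/state.py`:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    # Illustrative fields only; see src/state.py for the real definition.
    account: str
    gathered_data: list
    analysis: dict
    output_path: str

def my_custom_node(state: AgentState) -> AgentState:
    """Example custom step: annotate the analysis before rendering."""
    analysis = dict(state.get("analysis", {}))
    analysis["reviewed"] = True
    return {**state, "analysis": analysis}

state: AgentState = {"account": "Acme Corp", "analysis": {"health_score": 72}}
state = my_custom_node(state)
```

Returning a new dict rather than mutating the input keeps each node easy to test in isolation.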
```
langgraph-customer-health-agent/
├── run.py                # Interactive runner (python run.py)
├── src/
│   ├── config.py         # Configuration & get_llm() factory
│   ├── state.py          # AgentState TypedDict
│   ├── mcp_tools.py      # 5 @tool-decorated MCP wrappers
│   ├── schema_cache.py   # TTL-based schema cache
│   ├── logger.py         # Lightweight logging & stats
│   ├── graph.py          # 3-node LangGraph pipeline
│   ├── main.py           # CLI entry point
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── gatherer.py   # ReAct agent (schema discovery + queries)
│   │   ├── analyst.py    # LLM node: structured JSON health analysis
│   │   └── renderer.py   # Deterministic node: Jinja2 HTML brief renderer
│   └── templates/
│       └── brief.html    # Jinja2 HTML template
├── output/               # Generated briefs
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
```
```shell
# Required
CDATA_EMAIL=your-email@company.com
CDATA_PAT=your-personal-access-token
OPENAI_API_KEY=sk-proj-...

# LLM configuration
LLM_PROVIDER=openai            # openai, anthropic, google, ollama
LLM_MODEL=gpt-4o               # Model name for chosen provider

# Optional
CDATA_CATALOG=ConnectionName   # Skip catalog discovery
SCHEMA_CACHE_TTL=86400         # Cache TTL in seconds (default 24h)
LOG_LEVEL=INFO                 # INFO or DEBUG
MAX_ITERATIONS=15              # Max ReAct tool-call loops
OPENAI_API_BASE=https://...    # Custom API base URL
```

| Error | Solution |
|---|---|
| `Query returned no results` | Verify the connection name and that data exists in CData Connect AI |
| `MCP error` | Check `CDATA_EMAIL` / `CDATA_PAT` in `.env` |
| `Unsupported LLM_PROVIDER` | Use `openai`, `anthropic`, `google`, or `ollama` |
| `ImportError: langchain-...` | Install the package for your chosen LLM provider |
| Agent loops too many times | Reduce `MAX_ITERATIONS` or set `CDATA_CATALOG` to narrow scope |
- LangGraph Documentation
- CData Connect AI
- OpenAI API Documentation
- Model Context Protocol
- CData Community Forums
MIT License - See LICENSE for details.
CData Software - https://www.cdata.com
