A local AI chat assistant built with Qwen 2.5-1.5B, featuring real-time weather via Apify MCP and semantic search over any website/Confluence using RAG (ChromaDB).
- 💬 AI Chat – Qwen 2.5-1.5B running fully on CPU (Mac Intel)
- 🌤️ Weather Tool – Live weather via Apify MCP (Model Context Protocol)
- 📚 Confluence RAG – Semantic search over any website or Confluence instance
- 📡 Streaming – Token-by-token streaming responses
- 🧠 Pattern 2 Tool Calling – Model decides when to use tools (no hardcoded rules); see the sketch below
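A minimal sketch of how this pattern works. The helpers here (`model_decide`, `model_answer`) and the stub tool are stand-ins invented for illustration; the real logic lives in `app.py` and runs Qwen for both passes:

```python
import json

# Hypothetical stand-ins for the two model passes app.py performs with Qwen.
def model_decide(message: str) -> str:
    """Tool-decision pass (temperature 0.0): returns either a JSON tool
    call or a plain-text answer."""
    if "weather" in message.lower():
        return json.dumps({"tool_call": {"name": "get_weather",
                                         "arguments": {"city": "Paris"}}})
    return "I can answer that directly."

def model_answer(message: str, tool_result: str) -> str:
    """Final-answer pass (temperature 0.7), with the tool result injected."""
    return f"Based on the tool output ({tool_result}): ..."

TOOLS = {"get_weather": lambda args: f"Sunny, 22°C in {args['city']}"}  # stub

def answer(message: str) -> str:
    decision = model_decide(message)
    try:
        call = json.loads(decision)["tool_call"]
    except (ValueError, KeyError):
        return decision                      # no tool call: the text is the answer
    result = TOOLS[call["name"]](call["arguments"])
    return model_answer(message, result)     # model sees the tool result

print(answer("What's the weather in Paris?"))
```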
```
├── app.py                  # Main Flask app (entry point)
├── apify_weather_mcp.py    # Apify Weather MCP client
├── confluence_crawler.py   # Recursive web crawler
├── confluence_rag.py       # Chunking, embedding, ChromaDB search
├── confluence_tool.py      # search_confluence() tool definition
├── ingest.py               # One-time crawl + index script
├── templates/
│   └── index.html          # Chat UI
├── static/                 # CSS, JS assets
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variable template
├── .gitignore
└── README.md
```
```bash
git clone https://github.com/YOUR_USERNAME/ai-chat-assistant.git
cd ai-chat-assistant

python -m venv venv
source venv/bin/activate       # Mac/Linux
# venv\Scripts\activate        # Windows

pip install -r requirements.txt

cp .env.example .env
# Edit .env and add your APIFY_TOKEN

# Crawl and index the default URL
python ingest.py

# Or use your own URL
python ingest.py --url https://your-confluence.atlassian.net/wiki/spaces/SPACE

# Limit pages for a quick test
python ingest.py --max-pages 20

python app.py
```

Open http://localhost:8000 in your browser.
Create a `.env` file (copy from `.env.example`):

```
APIFY_TOKEN=<get your token>
```

Get your Apify token at: https://console.apify.com/account/integrations
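If you load the token with `python-dotenv` (an assumption for illustration; it is not listed among the project dependencies), it would look like:

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()                            # reads .env from the current directory
APIFY_TOKEN = os.environ["APIFY_TOKEN"]  # raises KeyError if the token is missing
```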
All settings are in `app.py` under `CONFIG`:

| Key | Default | Description |
|---|---|---|
| `base_model` | `Qwen/Qwen2.5-1.5B-Instruct` | HuggingFace model |
| `max_length` | `200` | Max tokens in response |
| `temperature` | `0.7` | Response creativity |
| `tool_temperature` | `0.0` | Tool decision (deterministic) |
| `max_tool_rounds` | `3` | Max tool call iterations |
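Reconstructed from the table, the `CONFIG` block presumably looks roughly like this (a hypothetical shape, not a copy of `app.py`):

```python
# Keys and defaults taken from the table above.
CONFIG = {
    "base_model": "Qwen/Qwen2.5-1.5B-Instruct",  # HuggingFace model id
    "max_length": 200,          # max tokens in a response
    "temperature": 0.7,         # sampling temperature for answers
    "tool_temperature": 0.0,    # deterministic tool decisions
    "max_tool_rounds": 3,       # cap on tool-call iterations
}
```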
RAG settings are in `confluence_rag.py`:

| Key | Default | Description |
|---|---|---|
| `CHUNK_SIZE` | `500` | Characters per chunk |
| `CHUNK_OVERLAP` | `100` | Overlap between chunks |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence transformer model |
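A minimal sliding-window chunker matching these defaults (the actual implementation in `confluence_rag.py` may differ in details):

```python
CHUNK_SIZE = 500
CHUNK_OVERLAP = 100

def chunk_text(text: str) -> list[str]:
    # 400-char stride: consecutive chunks share their last/first 100 chars
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE]
            for i in range(0, max(len(text) - CHUNK_OVERLAP, 1), step)]
```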
```bash
# Initial crawl + index
python ingest.py --url https://your-site.com/docs

# Re-index without re-crawling (uses cached crawl_results.json)
python ingest.py --from-file crawl_results.json

# Wipe and re-index from scratch
python ingest.py --reset

# Check how many chunks are indexed
python -c "from confluence_rag import ConfluenceRAG; r = ConfluenceRAG(); print(r.stats())"

# Trigger a re-index over HTTP
curl -X POST http://localhost:8000/api/refresh-kb
```

If you have a real Atlassian Confluence Cloud instance, update `confluence_crawler.py`:
```python
import base64

EMAIL = "your@email.com"
API_TOKEN = "your_confluence_api_token"

credentials = base64.b64encode(f"{EMAIL}:{API_TOKEN}".encode()).decode()
headers = {
    "Authorization": f"Basic {credentials}",
    "User-Agent": "RAG-Crawler/1.0"
}
```

Get your Confluence API token: https://id.atlassian.com/manage-profile/security/api-tokens
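To sanity-check those credentials you could hit the Confluence Cloud REST API with `httpx` (already a project dependency); the site URL below is a placeholder for your own instance:

```python
import base64
import httpx

EMAIL = "your@email.com"
API_TOKEN = "your_confluence_api_token"
credentials = base64.b64encode(f"{EMAIL}:{API_TOKEN}".encode()).decode()

resp = httpx.get(
    "https://your-confluence.atlassian.net/wiki/rest/api/space",
    headers={"Authorization": f"Basic {credentials}"},
    timeout=30,
)
resp.raise_for_status()  # a 401 here usually means a bad email/token pair
print([s["key"] for s in resp.json()["results"]])  # space keys the token can see
```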
```
User Query
    ↓
Flask Backend (app.py)
    ↓
Qwen 2.5-1.5B decides:
    ├── Weather query?   → Apify MCP → Open-Meteo API
    ├── Docs/knowledge?  → ChromaDB semantic search → Top 5 chunks
    └── Direct answer?   → Stream response
    ↓
Final answer streamed token by token
```
```mermaid
graph TB
    subgraph FRONTEND["① FRONTEND – HTML/CSS/JS"]
        UI["💬 Chat UI<br/>index.html"]
        FETCH["📡 Fetch API<br/>POST /api/chat"]
        SSE["📨 SSE Stream Reader<br/>token-by-token"]
        STATUS["📊 Status Check<br/>/api/status"]
    end
    subgraph FLASK["② FLASK BACKEND – app.py"]
        ROUTE["🔀 Route Handler<br/>/api/chat"]
        STREAM["📤 Streaming Response<br/>text/event-stream"]
        DISPATCH["⚙️ Tool Dispatcher<br/>call_tool()"]
        REFRESH["🔄 Refresh KB<br/>/api/refresh-kb"]
    end
    subgraph MODEL["③ AI MODEL – Qwen 2.5-1.5B CPU"]
        QWEN["🧠 Qwen 2.5-1.5B Instruct<br/>~3GB RAM · 2-5 tok/sec"]
        DECIDE["🤔 decide_tool_or_answer()<br/>tool_temperature=0.0"]
        STREAMER["⚡ TextIteratorStreamer<br/>token streaming"]
        TEMPLATE["📝 Chat Template<br/>apply_chat_template()"]
    end
    subgraph TOOLS["④ TOOLS LAYER"]
        subgraph WEATHER_T["🌤️ Weather Tool"]
            WT["get_weather(city)<br/>→ ApifyWeatherMCP<br/>→ new event loop"]
        end
        subgraph RAG_T["📚 Confluence RAG Tool"]
            RT["search_confluence(query)<br/>→ ChromaDB semantic search<br/>→ Top 5 chunks + URLs"]
        end
    end
    subgraph EXTERNAL["⑤ EXTERNAL – APIs & Storage"]
        APIFY["☁️ Apify MCP Server<br/>jiri-spilka/weather-mcp-server<br/>Streamable HTTP · JSON-RPC 2.0"]
        METEO["🌐 Open-Meteo API<br/>open-meteo.com<br/>Free · No key needed"]
        CHROMA["🗄️ ChromaDB Local<br/>./chroma_db/<br/>1433 chunks · cosine similarity"]
        EMBED["🔢 Sentence Transformers<br/>all-MiniLM-L6-v2<br/>80MB · 384 dimensions"]
    end

    UI -->|user message| FETCH
    FETCH -->|HTTP POST| ROUTE
    ROUTE --> STREAM
    STREAM -->|SSE tokens| SSE
    STATUS -.->|health check| ROUTE
    ROUTE --> DISPATCH
    DISPATCH --> DECIDE
    DECIDE --> TEMPLATE
    TEMPLATE --> QWEN
    QWEN --> STREAMER
    STREAMER -->|stream tokens| STREAM
    DECIDE -->|tool call JSON| DISPATCH
    DISPATCH -->|get_weather| WT
    DISPATCH -->|search_confluence| RT
    REFRESH -.->|re-index| RT
    WT -->|async MCP call| APIFY
    APIFY -->|forwards| METEO
    METEO -->|weather data| APIFY
    APIFY -->|SSE response| WT
    RT -->|embed query| EMBED
    EMBED -->|query vector| CHROMA
    CHROMA -->|top 5 chunks + URLs| RT
```
```mermaid
sequenceDiagram
    actor User
    participant UI as Chat UI
    participant Flask as Flask Backend
    participant Model as Qwen 2.5-1.5B
    participant Tool as Tool Layer
    participant Ext as External API/DB

    User->>UI: "How do I connect Confluence with Slack?"
    UI->>Flask: POST /api/chat {message, stream:true}
    Flask->>Model: build_chat_prompt(messages + tool defs)
    Model-->>Flask: {"tool_call": {"name": "search_confluence", "arguments": {"query": "..."}}}
    Flask-->>UI: SSE: "Let me search the knowledge base..."
    Note over Flask,UI: Stream acknowledgment token by token
    Flask->>Tool: call_tool("search_confluence", {query})
    Tool->>Ext: embed(query) → ChromaDB.query(top_k=5)
    Ext-->>Tool: [{text, url, title, score}, ...]
    Tool-->>Flask: formatted context + source URLs
    Flask->>Model: inject tool result into context
    Model-->>Flask: final answer tokens (streaming)
    Flask-->>UI: SSE: answer with source links
    UI-->>User: Rendered response + links
```
```mermaid
flowchart LR
    A["🌐 Base URL<br/>confluence/resources"] --> B["🕷️ confluence_crawler.py<br/>Recursive BFS crawl<br/>same-domain · path-scoped"]
    B --> C["📄 Raw HTML Pages<br/>title + text + url"]
    C --> D["✂️ Chunker<br/>500 chars · 100 overlap"]
    D --> E["🔢 SentenceTransformer<br/>all-MiniLM-L6-v2<br/>→ 384-dim vectors"]
    E --> F["🗄️ ChromaDB<br/>cosine similarity index<br/>1433 chunks stored"]
    G["💾 crawl_results.json<br/>cache"] -.->|skip re-crawl| D
    H["ingest.py --reset"] -.->|wipe + reindex| F
```
```mermaid
flowchart TD
    Q["User Query"] --> E1["Embed query<br/>all-MiniLM-L6-v2"]
    E1 --> VS["ChromaDB<br/>cosine similarity search"]
    VS --> R["Top 5 chunks<br/>with scores + URLs"]
    R --> CTX["format_context()<br/>chunk text + source links"]
    CTX --> INJ["Inject into<br/>model context"]
    INJ --> ANS["Qwen generates<br/>answer with citations"]
```
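In code, the query path is only a few lines. A sketch assuming a persistent collection named `confluence` whose chunks carry a `url` in their metadata (the actual names in `confluence_rag.py` may differ):

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("confluence")  # assumed collection name

def search(query: str, top_k: int = 5) -> list[tuple[str, str]]:
    vec = model.encode(query).tolist()  # 384-dim query vector
    hits = collection.query(query_embeddings=[vec], n_results=top_k)
    # pair each chunk with the source URL stored in its metadata
    return list(zip(hits["documents"][0],
                    [m["url"] for m in hits["metadatas"][0]]))
```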
```mermaid
flowchart LR
    REQ["Incoming Request"] --> CK1{"model_loaded?"}
    CK1 -->|No| LM["load_model_once()<br/>Qwen 2.5-1.5B"]
    CK1 -->|Yes| USE["Use cached model"]
    LM --> USE
    USE --> CK2{"tool needed?"}
    CK2 -->|weather| CK3{"weather_mcp?"}
    CK2 -->|confluence| CK4{"confluence_rag?"}
    CK3 -->|No| LW["get_weather_mcp()<br/>ApifyWeatherMCP()"]
    CK3 -->|Yes| UW["Use cached MCP"]
    LW --> UW
    CK4 -->|No| LR["get_confluence_rag()<br/>ConfluenceRAG()"]
    CK4 -->|Yes| UR["Use cached RAG"]
    LR --> UR
```
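All three branches use the same lazy-singleton pattern, shown here for the model (names mirror the diagram; the body is paraphrased, not copied from `app.py`):

```python
_model = None

def load_model_once():
    """Load Qwen on the first request, then reuse the cached instance."""
    global _model
    if _model is None:
        from transformers import AutoModelForCausalLM, AutoTokenizer
        tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
        mdl = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
        _model = (tok, mdl)
    return _model
```

This keeps the ~3GB load off import time and pays it only once per process.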
| Component | Spec |
|---|---|
| Model | Qwen 2.5-1.5B (~3GB RAM) |
| Device | CPU (Mac Intel) |
| Inference speed | 2β5 tokens/sec |
| Weather tool latency | ~2β3 seconds |
| RAG search latency | <100ms |
| Total response time | 10β30 seconds |
```json
{
  "message": "How do I integrate Confluence with Slack?",
  "stream": true
}
```

Streaming response (`stream: true`): `text/event-stream`

```
data: {"token": "Let"}
data: {"token": " me"}
data: {"token": " search..."}
data: {"done": true}
```
Non-streaming response (`stream: false`):

```json
{
  "response": "To integrate Confluence with Slack...",
  "model": "Qwen 2.5-1.5B"
}
```

`GET /api/status` response:

```json
{
  "model_loaded": true,
  "model_name": "Qwen/Qwen2.5-1.5B-Instruct",
  "device": "CPU (Mac Intel)",
  "streaming_supported": true,
  "confluence_chunks": 1433
}
```

`POST /api/refresh-kb` triggers a fresh crawl and re-index of the knowledge base.
- Multi-turn conversation memory
- More MCP servers (news, calendar, stocks)
- Cloud deployment (Render, Fly.io)
- Better UI with chat history
- Support for PDF / file uploads
- Scheduled KB refresh (cron)
- `flask` – Web framework
- `torch` + `transformers` – Qwen model inference
- `sentence-transformers` – Text embeddings
- `chromadb` – Local vector database
- `httpx` + `beautifulsoup4` – Web crawling
- `peft` – Model adapter support
MIT License – feel free to use and modify.


