A local AI chat assistant built with Qwen 2.5-1.5B, featuring real-time weather via Apify MCP and semantic search over any website/Confluence using RAG (ChromaDB).
- 💬 AI Chat – Qwen 2.5-1.5B running fully on CPU (Mac Intel)
- 🌤️ Weather Tool – Live weather via Apify MCP (Model Context Protocol)
- 📚 Confluence RAG – Semantic search over any website or Confluence instance
- 📡 Streaming – Token-by-token streaming responses
- 🧠 Pattern 2 Tool Calling – Model decides when to use tools (no hardcoded rules); see the sketch below
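A minimal sketch of how this pattern works. The helpers here (`model_decide`, `model_answer`) and the stub tool are stand-ins invented for illustration; the real logic lives in `app.py` and runs Qwen for both passes:

```python
import json

# Hypothetical stand-ins for the two model passes app.py performs with Qwen.
def model_decide(message: str) -> str:
    """Tool-decision pass (temperature 0.0): returns either a JSON tool
    call or a plain-text answer."""
    if "weather" in message.lower():
        return json.dumps({"tool_call": {"name": "get_weather",
                                         "arguments": {"city": "Paris"}}})
    return "I can answer that directly."

def model_answer(message: str, tool_result: str) -> str:
    """Final-answer pass (temperature 0.7), with the tool result injected."""
    return f"Based on the tool output ({tool_result}): ..."

TOOLS = {"get_weather": lambda args: f"Sunny, 22°C in {args['city']}"}  # stub

def answer(message: str) -> str:
    decision = model_decide(message)
    try:
        call = json.loads(decision)["tool_call"]
    except (ValueError, KeyError):
        return decision                      # no tool call: the text is the answer
    result = TOOLS[call["name"]](call["arguments"])
    return model_answer(message, result)     # model sees the tool result

print(answer("What's the weather in Paris?"))
```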
```
├── app.py                  # Main Flask app (entry point)
├── apify_weather_mcp.py    # Apify Weather MCP client
├── confluence_crawler.py   # Recursive web crawler
├── confluence_rag.py       # Chunking, embedding, ChromaDB search
├── confluence_tool.py      # search_confluence() tool definition
├── ingest.py               # One-time crawl + index script
├── templates/
│   └── index.html          # Chat UI
├── static/                 # CSS, JS assets
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variable template
├── .gitignore
└── README.md
```
```bash
git clone https://github.com/YOUR_USERNAME/ai-chat-assistant.git
cd ai-chat-assistant

python -m venv venv
source venv/bin/activate       # Mac/Linux
# venv\Scripts\activate        # Windows

pip install -r requirements.txt

cp .env.example .env
# Edit .env and add your APIFY_TOKEN

# Crawl and index the default URL
python ingest.py

# Or use your own URL
python ingest.py --url https://your-confluence.atlassian.net/wiki/spaces/SPACE

# Limit pages for a quick test
python ingest.py --max-pages 20

python app.py
```

Open http://localhost:8000 in your browser.
Create a `.env` file (copy from `.env.example`):

```
APIFY_TOKEN=<get your token>
```

Get your Apify token at: https://console.apify.com/account/integrations
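If you load the token with `python-dotenv` (an assumption for illustration; it is not listed among the project dependencies), it would look like:

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()                            # reads .env from the current directory
APIFY_TOKEN = os.environ["APIFY_TOKEN"]  # raises KeyError if the token is missing
```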
All settings are in `app.py` under `CONFIG`:

| Key | Default | Description |
|---|---|---|
| `base_model` | `Qwen/Qwen2.5-1.5B-Instruct` | HuggingFace model |
| `max_length` | `200` | Max tokens in response |
| `temperature` | `0.7` | Response creativity |
| `tool_temperature` | `0.0` | Tool decision (deterministic) |
| `max_tool_rounds` | `3` | Max tool call iterations |
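Reconstructed from the table, the `CONFIG` block presumably looks roughly like this (a hypothetical shape, not a copy of `app.py`):

```python
# Keys and defaults taken from the table above.
CONFIG = {
    "base_model": "Qwen/Qwen2.5-1.5B-Instruct",  # HuggingFace model id
    "max_length": 200,          # max tokens in a response
    "temperature": 0.7,         # sampling temperature for answers
    "tool_temperature": 0.0,    # deterministic tool decisions
    "max_tool_rounds": 3,       # cap on tool-call iterations
}
```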
RAG settings are in `confluence_rag.py`:

| Key | Default | Description |
|---|---|---|
| `CHUNK_SIZE` | `500` | Characters per chunk |
| `CHUNK_OVERLAP` | `100` | Overlap between chunks |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence transformer model |
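A minimal sliding-window chunker matching these defaults (the actual implementation in `confluence_rag.py` may differ in details):

```python
CHUNK_SIZE = 500
CHUNK_OVERLAP = 100

def chunk_text(text: str) -> list[str]:
    # 400-char stride: consecutive chunks share their last/first 100 chars
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE]
            for i in range(0, max(len(text) - CHUNK_OVERLAP, 1), step)]
```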
```bash
# Initial crawl + index
python ingest.py --url https://your-site.com/docs

# Re-index without re-crawling (uses cached crawl_results.json)
python ingest.py --from-file crawl_results.json

# Wipe and re-index from scratch
python ingest.py --reset

# Check how many chunks are indexed
python -c "from confluence_rag import ConfluenceRAG; r = ConfluenceRAG(); print(r.stats())"

# Trigger a re-index over HTTP
curl -X POST http://localhost:8000/api/refresh-kb
```

If you have a real Atlassian Confluence Cloud instance, update `confluence_crawler.py`:
```python
import base64

EMAIL = "your@email.com"
API_TOKEN = "your_confluence_api_token"

credentials = base64.b64encode(f"{EMAIL}:{API_TOKEN}".encode()).decode()
headers = {
    "Authorization": f"Basic {credentials}",
    "User-Agent": "RAG-Crawler/1.0"
}
```

Get your Confluence API token: https://id.atlassian.com/manage-profile/security/api-tokens
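To sanity-check those credentials you could hit the Confluence Cloud REST API with `httpx` (already a project dependency); the site URL below is a placeholder for your own instance:

```python
import base64
import httpx

EMAIL = "your@email.com"
API_TOKEN = "your_confluence_api_token"
credentials = base64.b64encode(f"{EMAIL}:{API_TOKEN}".encode()).decode()

resp = httpx.get(
    "https://your-confluence.atlassian.net/wiki/rest/api/space",
    headers={"Authorization": f"Basic {credentials}"},
    timeout=30,
)
resp.raise_for_status()  # a 401 here usually means a bad email/token pair
print([s["key"] for s in resp.json()["results"]])  # space keys the token can see
```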
```
User Query
    ↓
Flask Backend (app.py)
    ↓
Qwen 2.5-1.5B decides:
    ├── Weather query?   → Apify MCP → Open-Meteo API
    ├── Docs/knowledge?  → ChromaDB semantic search → Top 5 chunks
    └── Direct answer?   → Stream response
    ↓
Final answer streamed token by token
```
```mermaid
graph TB
    subgraph FRONTEND["① FRONTEND – HTML/CSS/JS"]
        UI["💬 Chat UI<br/>index.html"]
        FETCH["📡 Fetch API<br/>POST /api/chat"]
        SSE["📨 SSE Stream Reader<br/>token-by-token"]
        STATUS["📊 Status Check<br/>/api/status"]
    end
    subgraph FLASK["② FLASK BACKEND – app.py"]
        ROUTE["🔀 Route Handler<br/>/api/chat"]
        STREAM["📤 Streaming Response<br/>text/event-stream"]
        DISPATCH["⚙️ Tool Dispatcher<br/>call_tool()"]
        REFRESH["🔄 Refresh KB<br/>/api/refresh-kb"]
    end
    subgraph MODEL["③ AI MODEL – Qwen 2.5-1.5B CPU"]
        QWEN["🧠 Qwen 2.5-1.5B Instruct<br/>~3GB RAM · 2-5 tok/sec"]
        DECIDE["🤔 decide_tool_or_answer()<br/>tool_temperature=0.0"]
        STREAMER["⚡ TextIteratorStreamer<br/>token streaming"]
        TEMPLATE["📝 Chat Template<br/>apply_chat_template()"]
    end
    subgraph TOOLS["④ TOOLS LAYER"]
        subgraph WEATHER_T["🌤️ Weather Tool"]
            WT["get_weather(city)<br/>→ ApifyWeatherMCP<br/>→ new event loop"]
        end
        subgraph RAG_T["📚 Confluence RAG Tool"]
            RT["search_confluence(query)<br/>→ ChromaDB semantic search<br/>→ Top 5 chunks + URLs"]
        end
    end
    subgraph EXTERNAL["⑤ EXTERNAL – APIs & Storage"]
        APIFY["☁️ Apify MCP Server<br/>jiri-spilka/weather-mcp-server<br/>Streamable HTTP · JSON-RPC 2.0"]
        METEO["🌐 Open-Meteo API<br/>open-meteo.com<br/>Free · No key needed"]
        CHROMA["🗄️ ChromaDB Local<br/>./chroma_db/<br/>1433 chunks · cosine similarity"]
        EMBED["🔢 Sentence Transformers<br/>all-MiniLM-L6-v2<br/>80MB · 384 dimensions"]
    end

    UI -->|user message| FETCH
    FETCH -->|HTTP POST| ROUTE
    ROUTE --> STREAM
    STREAM -->|SSE tokens| SSE
    STATUS -.->|health check| ROUTE
    ROUTE --> DISPATCH
    DISPATCH --> DECIDE
    DECIDE --> TEMPLATE
    TEMPLATE --> QWEN
    QWEN --> STREAMER
    STREAMER -->|stream tokens| STREAM
    DECIDE -->|tool call JSON| DISPATCH
    DISPATCH -->|get_weather| WT
    DISPATCH -->|search_confluence| RT
    REFRESH -.->|re-index| RT
    WT -->|async MCP call| APIFY
    APIFY -->|forwards| METEO
    METEO -->|weather data| APIFY
    APIFY -->|SSE response| WT
    RT -->|embed query| EMBED
    EMBED -->|query vector| CHROMA
    CHROMA -->|top 5 chunks + URLs| RT
```
```mermaid
sequenceDiagram
    actor User
    participant UI as Chat UI
    participant Flask as Flask Backend
    participant Model as Qwen 2.5-1.5B
    participant Tool as Tool Layer
    participant Ext as External API/DB

    User->>UI: "How do I connect Confluence with Slack?"
    UI->>Flask: POST /api/chat {message, stream:true}
    Flask->>Model: build_chat_prompt(messages + tool defs)
    Model-->>Flask: {"tool_call": {"name": "search_confluence", "arguments": {"query": "..."}}}
    Flask-->>UI: SSE: "Let me search the knowledge base..."
    Note over Flask,UI: Stream acknowledgment token by token
    Flask->>Tool: call_tool("search_confluence", {query})
    Tool->>Ext: embed(query) → ChromaDB.query(top_k=5)
    Ext-->>Tool: [{text, url, title, score}, ...]
    Tool-->>Flask: formatted context + source URLs
    Flask->>Model: inject tool result into context
    Model-->>Flask: final answer tokens (streaming)
    Flask-->>UI: SSE: answer with source links
    UI-->>User: Rendered response + links
```
```mermaid
flowchart LR
    A["🌐 Base URL<br/>confluence/resources"] --> B["🕷️ confluence_crawler.py<br/>Recursive BFS crawl<br/>same-domain · path-scoped"]
    B --> C["📄 Raw HTML Pages<br/>title + text + url"]
    C --> D["✂️ Chunker<br/>500 chars · 100 overlap"]
    D --> E["🔢 SentenceTransformer<br/>all-MiniLM-L6-v2<br/>→ 384-dim vectors"]
    E --> F["🗄️ ChromaDB<br/>cosine similarity index<br/>1433 chunks stored"]
    G["💾 crawl_results.json<br/>cache"] -.->|skip re-crawl| D
    H["ingest.py --reset"] -.->|wipe + reindex| F
```
```mermaid
flowchart TD
    Q["User Query"] --> E1["Embed query<br/>all-MiniLM-L6-v2"]
    E1 --> VS["ChromaDB<br/>cosine similarity search"]
    VS --> R["Top 5 chunks<br/>with scores + URLs"]
    R --> CTX["format_context()<br/>chunk text + source links"]
    CTX --> INJ["Inject into<br/>model context"]
    INJ --> ANS["Qwen generates<br/>answer with citations"]
```
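In code, the query path is only a few lines. A sketch assuming a persistent collection named `confluence` whose chunks carry a `url` in their metadata (the actual names in `confluence_rag.py` may differ):

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("confluence")  # assumed collection name

def search(query: str, top_k: int = 5) -> list[tuple[str, str]]:
    vec = model.encode(query).tolist()  # 384-dim query vector
    hits = collection.query(query_embeddings=[vec], n_results=top_k)
    # pair each chunk with the source URL stored in its metadata
    return list(zip(hits["documents"][0],
                    [m["url"] for m in hits["metadatas"][0]]))
```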
```mermaid
flowchart LR
    REQ["Incoming Request"] --> CK1{"model_loaded?"}
    CK1 -->|No| LM["load_model_once()<br/>Qwen 2.5-1.5B"]
    CK1 -->|Yes| USE["Use cached model"]
    LM --> USE
    USE --> CK2{"tool needed?"}
    CK2 -->|weather| CK3{"weather_mcp?"}
    CK2 -->|confluence| CK4{"confluence_rag?"}
    CK3 -->|No| LW["get_weather_mcp()<br/>ApifyWeatherMCP()"]
    CK3 -->|Yes| UW["Use cached MCP"]
    LW --> UW
    CK4 -->|No| LR["get_confluence_rag()<br/>ConfluenceRAG()"]
    CK4 -->|Yes| UR["Use cached RAG"]
    LR --> UR
```
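All three branches use the same lazy-singleton pattern, shown here for the model (names mirror the diagram; the body is paraphrased, not copied from `app.py`):

```python
_model = None

def load_model_once():
    """Load Qwen on the first request, then reuse the cached instance."""
    global _model
    if _model is None:
        from transformers import AutoModelForCausalLM, AutoTokenizer
        tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
        mdl = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
        _model = (tok, mdl)
    return _model
```

This keeps the ~3GB load off import time and pays it only once per process.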
| Component | Spec |
|---|---|
| Model | Qwen 2.5-1.5B (~3GB RAM) |
| Device | CPU (Mac Intel) |
| Inference speed | 2β5 tokens/sec |
| Weather tool latency | ~2β3 seconds |
| RAG search latency | <100ms |
| Total response time | 10β30 seconds |
```json
{
  "message": "How do I integrate Confluence with Slack?",
  "stream": true
}
```

Streaming response (`stream: true`): `text/event-stream`

```
data: {"token": "Let"}
data: {"token": " me"}
data: {"token": " search..."}
data: {"done": true}
```
Non-streaming response (`stream: false`):

```json
{
  "response": "To integrate Confluence with Slack...",
  "model": "Qwen 2.5-1.5B"
}
```

`GET /api/status` response:

```json
{
  "model_loaded": true,
  "model_name": "Qwen/Qwen2.5-1.5B-Instruct",
  "device": "CPU (Mac Intel)",
  "streaming_supported": true,
  "confluence_chunks": 1433
}
```

`POST /api/refresh-kb` triggers a fresh crawl and re-index of the knowledge base.
- Multi-turn conversation memory
- More MCP servers (news, calendar, stocks)
- Cloud deployment (Render, Fly.io)
- Better UI with chat history
- Support for PDF / file uploads
- Scheduled KB refresh (cron)
- `flask` – Web framework
- `torch` + `transformers` – Qwen model inference
- `sentence-transformers` – Text embeddings
- `chromadb` – Local vector database
- `httpx` + `beautifulsoup4` – Web crawling
- `peft` – Model adapter support
MIT License – feel free to use and modify.


