86 changes: 86 additions & 0 deletions .dockerignore
@@ -0,0 +1,86 @@
# Keep the build context small and the image lean.
#
# Two things matter:
# 1. Anything COPYed (directly or via a directory/wildcard) is sent to the
# builder. The legacy builder uploads the whole context regardless, so list
# huge dev-only trees here even if no stage uses them.
# 2. Stray files inside a COPYed directory (e.g. `__pycache__/` under
# `COPY src/ ./src/`) still change that layer's cache key and force every
# later layer to rebuild.

# Source-control + IDE
.git
.gitignore
.gitattributes
.github
.vscode
.idea
.cursor
.claude
.continue

# Python build/runtime junk
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
**/*.egg-info
.venv
venv
env
.python-version
.pytest_cache
.ruff_cache
.mypy_cache
.tox
htmlcov
.coverage
.coverage.*
*.cover

# Node — the frontend stage installs its own deps from package-lock.
frontend/node_modules
frontend/dist
**/.npm
**/.yarn
**/yarn-error.log
**/.parcel-cache

# Tests & local fixtures — runtime image doesn't need them.
tests/
*.db
*.sqlite
*.sqlite3

# Captures / screenshots / video produced by capture_*.py scripts.
# These live at the repo root and are megabytes of binary noise.
capture_*.py
assets/
video/
docs/
*.png
*.jpg
*.jpeg
*.gif
*.mp4

# Local secrets: compose's `env_file` passes .env's variables to the
# container at runtime, so never bake the file into the image.
.env
.env.*
!.env.example

# OS + editor cruft
**/.DS_Store
**/Thumbs.db
**/*.swp
**/*~

# Docker itself
Dockerfile
docker-compose.yml
.dockerignore

# Big planning docs that don't ship to prod. README.md is re-included
# because the runtime stage COPYs it next to pyproject.toml.
*.md
!README.md
!frontend/package*.json
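Docker resolves `.dockerignore` by letting the last matching line win, with a leading `!` re-including a path; that is why `!README.md` has to come after `*.md` and `!.env.example` after `.env.*`. Patterns are matched against the full path from the context root, so bare globs such as `__pycache__` or `*.md` only cover top-level names unless prefixed with `**/`. A minimal bash sketch of the last-match rule, assuming top-level file names only (no `/`, no `**`), which is enough to sanity-check the root-level rules above; `is_ignored` is a hypothetical helper, not a Docker command:

```bash
#!/usr/bin/env bash
# Hypothetical helper: is_ignored NAME RULE...
# Returns 0 if NAME would be excluded from the build context, 1 if kept.
# Assumption: NAME is a top-level file name (no "/") and RULEs are plain
# globs; "**" and directory patterns are out of scope for this sketch.
is_ignored() {
  local name=$1 rule verdict=1   # 1 = kept until some rule says otherwise
  shift
  for rule in "$@"; do
    # Skip blank lines and comments.
    [[ -z $rule || $rule == '#'* ]] && continue
    if [[ $rule == '!'* ]]; then
      # Exception rule: re-include on match (unquoted RHS is a glob).
      [[ $name == ${rule:1} ]] && verdict=1
    else
      [[ $name == $rule ]] && verdict=0
    fi
  done
  # Later matches overwrite $verdict, so the last matching rule wins.
  return "$verdict"
}

rules=('*.md' '!README.md' '.env' '.env.*' '!.env.example')
for f in README.md CLAUDE.md .env .env.prod .env.example pyproject.toml; do
  if is_ignored "$f" "${rules[@]}"; then echo "excluded  $f"; else echo "included  $f"; fi
done
```

Swapping `!README.md` above `*.md` in `rules` flips README.md to excluded, which would break the runtime stage's `COPY pyproject.toml README.md ./`.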
50 changes: 34 additions & 16 deletions CLAUDE.md
@@ -127,41 +127,59 @@ When optimizing report generation or other LLM-heavy workflows:
- **Nginx security headers**: `Strict-Transport-Security` + `X-Frame-Options: DENY` configured in `aidcmo.conf`.
- **Port allocation**: Do not assume production app port is `8080`. `8080` is occupied by `sub2api` on `newyork`; OpenCMO uses `8081`.
- **BWG role**: `BWG` is no longer the primary OpenCMO host. Treat it as a lightweight box, temporary reverse proxy, or fallback node unless explicitly re-promoted.
- **Browser-backed scans**: SEO/context fallback paths use `crawl4ai`/Playwright. Fresh servers need browser binaries installed, or scans will fail with `BrowserType.launch` executable errors.
- **Browser-backed scans**: SEO/context fallback paths use `crawl4ai`/Playwright. The Docker image installs the Chromium binary during build (`playwright install --with-deps chromium`), so containers work out of the box. For the legacy systemd path, fresh servers need `playwright install chromium` manually or scans fail with `BrowserType.launch` executable errors.
- **Deployment**: Production runs in Docker. The host runs `docker compose up -d` from `/opt/OpenCMO`; the legacy systemd unit (`opencmo.service`) is disabled but left in place for rollback. See `docs/DOCKER.md` for the full migration story.
- **Local testing**: Don't `npm run dev` / `opencmo-web` locally for verification. The user's laptop is not the deployment target and the local environment doesn't mirror production. Verify on `newyork` via SSH + curl. Local `npm run build` and `pytest` are fine (no runtime dependency on prod topology).

## Deployment (newyork — aidcmo.com)

### Deploy frontend assets to New York
### Default path: Docker

```bash
cd frontend && npm run build # Build locally (avoid server-side frontend builds)
rsync -avz --delete frontend/dist/ root@192.3.16.77:/opt/OpenCMO/frontend/dist/
./deploy/docker-deploy.sh # rsync + docker compose build + up -d + health probe
./deploy/docker-deploy.sh --no-build # config-only change (skip image rebuild)
./deploy/docker-deploy.sh --logs # tail container logs after deploy
```

### Deploy backend code to New York
Manual equivalent:

```bash
# Sync source. data/ and .env live only on the server; node_modules and frontend/dist are built inside the image.
rsync -avz --delete \
--exclude '.git' \
--exclude 'frontend/node_modules' \
--exclude 'frontend/dist' \
--exclude '.venv' \
--exclude '.git' --exclude '.venv' \
--exclude 'frontend/node_modules' --exclude 'frontend/dist' \
--exclude 'data/' --exclude '.env' --exclude '.env.*' \
./ root@192.3.16.77:/opt/OpenCMO/
ssh newyork "cd /opt/OpenCMO && source .venv/bin/activate && pip install -e . -q && systemctl restart opencmo"

ssh newyork "cd /opt/OpenCMO && docker compose up -d --build"
```

### Health + log checks

```bash
ssh newyork "cd /opt/OpenCMO && docker compose ps" # container + (healthy) status
ssh newyork "cd /opt/OpenCMO && docker compose logs -f --tail=100 opencmo"
ssh newyork "ss -ltnp | grep -E ':80|:443|:8081'" # nginx + container ports
curl -sL -o /dev/null -w '%{http_code}\n' https://www.aidcmo.com/app/
```
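`docker compose ps` shows the health state at a single instant. A sketch of a polling helper to run on the server (for example inside an `ssh newyork` session), assuming the container carries the image's HEALTHCHECK; `wait_healthy` is a hypothetical name, while `docker inspect --format '{{.State.Health.Status}}'` is the standard way to read that status:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_healthy CONTAINER [CHECKS] [INTERVAL_SECONDS]
# Polls Docker's health status (fed by the image's HEALTHCHECK) until it is
# "healthy" (return 0), "unhealthy" (return 1) or the checks run out (return 1).
wait_healthy() {
  local container=$1 checks=${2:-30} interval=${3:-2} status="" i
  for ((i = 1; i <= checks; i++)); do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)
    case $status in
      healthy)   echo "$container: healthy after $i check(s)"; return 0 ;;
      unhealthy) echo "$container: unhealthy" >&2; return 1 ;;
    esac
    # "starting" (or empty if the container is missing): wait, then re-check.
    if ((i < checks)); then sleep "$interval"; fi
  done
  echo "$container: still '${status:-unknown}' after $checks check(s)" >&2
  return 1
}

# Usage on the server; "opencmo" is the service name used by the log commands:
#   cd /opt/OpenCMO && wait_healthy "$(docker compose ps -q opencmo)"
```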

### New York service / runtime checks
### Rollback to systemd (emergency only)

```bash
ssh newyork "systemctl status opencmo --no-pager"
ssh newyork "journalctl -u opencmo -n 200 --no-pager"
ssh newyork "ss -ltnp | grep -E ':80|:443|:8081'"
ssh newyork "cd /opt/OpenCMO && docker compose down"
# DB: container reads /opt/OpenCMO/data/data.db; systemd reads ~/.opencmo/data.db.
# If the schema migrated forward under Docker, or to keep writes made since the cutover,
# copy the DB back now, before systemd starts:
#   ssh newyork "cp /opt/OpenCMO/data/data.db ~/.opencmo/data.db"
ssh newyork "systemctl enable --now opencmo"
# Nginx upstream is unchanged (127.0.0.1:8081), so traffic resumes as soon as the unit is listening.
```
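The rollback only preserves data if the DB copy happens after the container stops and before systemd starts. A hypothetical helper that encodes that order, assuming it runs as root on newyork, that you want to keep the writes made under Docker, and that the compose file is `/opt/OpenCMO/docker-compose.yml`; setting `DRY_RUN=1` prints the steps instead of running them:

```bash
#!/usr/bin/env bash
# Hypothetical helper: rollback_to_systemd [APP_DIR] [LEGACY_DB]
# Assumes root on the host; defaults mirror the paths in the notes above.

# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

rollback_to_systemd() {
  local app_dir=${1:-/opt/OpenCMO}
  local legacy_db=${2:-$HOME/.opencmo/data.db}
  # 1. Stop the container so nothing writes to the DB during the copy.
  run docker compose -f "$app_dir/docker-compose.yml" down || return 1
  # 2. Carry Docker-era writes (and any forward schema migration) across.
  run cp "$app_dir/data/data.db" "$legacy_db" || return 1
  # 3. Start the legacy unit; nginx still proxies to 127.0.0.1:8081.
  run systemctl enable --now opencmo
}

# Preview first, then run for real:
#   DRY_RUN=1 rollback_to_systemd
#   rollback_to_systemd
```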

### Install Playwright browsers (when scan workers need them)
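### Legacy frontend-only deploy (superseded by the Docker build)

The Docker image builds the frontend itself (Dockerfile stage 1), and `.dockerignore` excludes `frontend/dist`, so a bundle rsynced into `frontend/dist/` never reaches the image. These commands also don't sync the `frontend/` source, so the rebuild compiles whatever source is already on the server. Ship UI changes, however small, with `./deploy/docker-deploy.sh`; the commands are kept for reference only: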

```bash
ssh newyork "cd /opt/OpenCMO && .venv/bin/playwright install chromium"
cd frontend && npm run build
rsync -avz --delete frontend/dist/ root@192.3.16.77:/opt/OpenCMO/frontend/dist/
ssh newyork "cd /opt/OpenCMO && docker compose up -d --build"
```

### BWG (optional fallback / proxy only)
98 changes: 83 additions & 15 deletions Dockerfile
@@ -1,26 +1,94 @@
# Stage 1: Build frontend
FROM node:20-slim AS frontend-builder
# syntax=docker/dockerfile:1.7

# ────────────────────────────────────────────────────────────────────────────
# Stage 1 — build the frontend bundle (Node 20, alpine for size)
# ────────────────────────────────────────────────────────────────────────────
FROM node:20-alpine AS frontend-builder
WORKDIR /app/frontend

# Cache dependency layer separately from source. Use `npm ci` (lockfile-aware,
# reproducible) when the lockfile exists; fall back to `npm install` otherwise,
# so the repo still builds when no lockfile is committed.
COPY frontend/package*.json ./
RUN npm ci --silent
RUN if [ -f package-lock.json ]; then npm ci --silent; else npm install --silent; fi

COPY frontend/ ./
RUN npm run build

# Stage 2: Python runtime
FROM python:3.11-slim
WORKDIR /app
# ────────────────────────────────────────────────────────────────────────────
# Stage 2 — Python runtime + Playwright Chromium baked in
# ────────────────────────────────────────────────────────────────────────────
# Use a slim base. Playwright's Python package downloads its own browser
# binaries; `playwright install --with-deps` below adds the OS-level libraries
# Chromium needs and downloads Chromium in a single step.
FROM python:3.12-slim AS runtime

# System packages: certificates for outbound TLS, curl for HEALTHCHECK and
# tini for proper PID-1 signal forwarding. The shared libraries Chromium
# needs at runtime are added later by `playwright install --with-deps`.
ENV DEBIAN_FRONTEND=noninteractive \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
RUN apt-get update && apt-get install -y --no-install-recommends \
curl ca-certificates \
ca-certificates \
curl \
tini \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy project metadata, then source, then install. `pip install -e` needs
# the package tree, so any change under src/ invalidates the install layer
# and every layer after it, including the Chromium download below.
COPY pyproject.toml README.md ./
COPY src/ ./src/
RUN pip install --no-cache-dir -e ".[all]"
# Install system deps for Chromium (used by crawl4ai), then set up crawl4ai
RUN playwright install-deps chromium \
&& crawl4ai-setup || true

# Install the app with EVERY runtime extra plus the `browser` extra so that
# playwright is present for crawl4ai's scan fallback paths. The previous
# Dockerfile installed `[all]` which excludes `browser` — scans then crashed
# with "BrowserType.launch executable doesn't exist".
RUN pip install --no-cache-dir -e ".[all,browser]"

# Install Chromium + its OS-level deps in one go. `--with-deps` runs apt
# under the hood to add the missing shared libs (libnss3, libatk-1.0-0, …).
# This pulls ~280 MB but means every scan path works on a fresh container.
RUN playwright install --with-deps chromium \
&& rm -rf /var/lib/apt/lists/*

# Run crawl4ai's post-install (downloads its own models/templates). It
# previously had `|| true` which masked real failures — let it fail loudly.
RUN crawl4ai-setup

# Pull the pre-built frontend bundle from stage 1 into the location the
# FastAPI app serves static files from.
COPY --from=frontend-builder /app/frontend/dist ./frontend/dist

# Non-root user. `playwright install` above ran as root, so Chromium sits in
# /root/.cache/ms-playwright; copy it into the user's home, where
# PLAYWRIGHT_BROWSERS_PATH points (set below). useradd does not create
# ~/.cache, so make it first, and let a failed copy break the build instead
# of surfacing later as "BrowserType.launch executable doesn't exist".
RUN useradd --create-home --shell /bin/bash --uid 1000 opencmo \
    && mkdir -p /data /home/opencmo/.cache \
    && cp -r /root/.cache/ms-playwright /home/opencmo/.cache/ms-playwright \
    && chown -R opencmo:opencmo /app /data /home/opencmo/.cache
USER opencmo

# Persistent state lives under /data. The default DB path matches.
VOLUME ["/data"]
ENV OPENCMO_DB_PATH=/data/data.db
ENV OPENCMO_WEB_HOST=0.0.0.0
EXPOSE 8080
CMD ["opencmo-web"]
ENV OPENCMO_DB_PATH=/data/data.db \
OPENCMO_WEB_HOST=0.0.0.0 \
OPENCMO_WEB_PORT=8081 \
PLAYWRIGHT_BROWSERS_PATH=/home/opencmo/.cache/ms-playwright

EXPOSE 8081

# Healthcheck — same endpoint nginx-on-host probes when it hands off traffic.
HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
CMD curl -fsS "http://127.0.0.1:${OPENCMO_WEB_PORT}/api/v1/health" || exit 1

# tini handles SIGTERM properly so `docker compose down` is fast and clean.
ENTRYPOINT ["/usr/bin/tini", "--"]

# Run uvicorn directly so the port is configurable via env. The `opencmo-web`
# console script hard-codes port=8080 in its signature, which is why we
# bypass it here. `exec` replaces the shell with uvicorn, so the SIGTERM
# that tini forwards to its child reaches the server itself.
CMD ["sh", "-c", "exec uvicorn opencmo.web.app:app --host ${OPENCMO_WEB_HOST} --port ${OPENCMO_WEB_PORT}"]
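The non-root user step copies Chromium from root's Playwright cache into `/home/opencmo/.cache/ms-playwright`, where `PLAYWRIGHT_BROWSERS_PATH` points. `cp -r SRC DEST` creates DEST but not its missing parents, and a home created by `useradd --create-home` from the default Debian `/etc/skel` has no `.cache` directory, so the target's parent must exist before the copy. A throwaway bash check of that behaviour in a temp directory; the layout mirrors the Dockerfile, and `chromium-stub` is a made-up placeholder, not a real Playwright directory name:

```bash
#!/usr/bin/env bash
# Reproduce the browser-cache copy in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/root/.cache/ms-playwright/chromium-stub"   # stand-in for Chromium
touch "$tmp/root/.cache/ms-playwright/chromium-stub/chrome"
mkdir -p "$tmp/home/opencmo"                              # fresh home, no .cache

# Without the parent directory, the copy fails.
if cp -r "$tmp/root/.cache/ms-playwright" "$tmp/home/opencmo/.cache/ms-playwright" 2>/dev/null; then
  without_parent=copied
else
  without_parent=failed
fi

# Creating ~/.cache first makes the same copy succeed.
mkdir -p "$tmp/home/opencmo/.cache"
cp -r "$tmp/root/.cache/ms-playwright" "$tmp/home/opencmo/.cache/ms-playwright"
if [ -f "$tmp/home/opencmo/.cache/ms-playwright/chromium-stub/chrome" ]; then
  with_parent=copied
else
  with_parent=failed
fi

echo "without ~/.cache: $without_parent; with ~/.cache: $with_parent"
rm -rf "$tmp"
```

Appending `2>/dev/null || true` to the first `cp` would turn that failure into a silent no-op, leaving `PLAYWRIGHT_BROWSERS_PATH` pointing at an empty location.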
97 changes: 97 additions & 0 deletions deploy/docker-deploy.sh
@@ -0,0 +1,97 @@
#!/usr/bin/env bash
# deploy/docker-deploy.sh — push the current working tree to newyork and
# (re)build the Docker container in place.
#
# Usage:
# ./deploy/docker-deploy.sh # build + restart
# ./deploy/docker-deploy.sh --no-build # restart only (config-only change)
# ./deploy/docker-deploy.sh --logs # tail logs after deploy
#
# Pre-requisites on the server (one-time):
# - Docker + Docker Compose plugin installed
# - /opt/OpenCMO/.env populated with provider keys (copy from the legacy
# systemd setup if migrating)
# - /opt/OpenCMO/data/ exists (or symlinks to a backup location) and is writable by uid 1000, the container's non-root user
#
# Pre-requisites locally:
# - `ssh newyork ...` works (key auth, the alias resolves to root@192.3.16.77)

set -euo pipefail

HOST="${OPENCMO_DEPLOY_HOST:-newyork}"
REMOTE_DIR="${OPENCMO_DEPLOY_DIR:-/opt/OpenCMO}"
DO_BUILD=1
TAIL_LOGS=0

for arg in "$@"; do
case "$arg" in
--no-build) DO_BUILD=0 ;;
--logs) TAIL_LOGS=1 ;;
-h|--help)
sed -n '2,18p' "$0" | sed 's/^# \{0,1\}//'
exit 0
;;
*) echo "Unknown flag: $arg" >&2; exit 2 ;;
esac
done

cd "$(dirname "$0")/.."
ROOT="$(pwd)"
echo "→ syncing $ROOT → $HOST:$REMOTE_DIR"

# Exclude everything the container builds for itself, plus host-only state
# (venv, node_modules, the host's ./data dir which we never want to overwrite),
# plus the on-server secrets (`.env`) which are gitignored locally and would
# otherwise be wiped by `--delete`.
rsync -avz --delete \
--exclude '.git/' \
--exclude '.venv/' \
--exclude 'frontend/node_modules/' \
--exclude 'frontend/dist/' \
--exclude 'data/' \
--exclude '.env' \
--exclude '.env.*' \
--exclude '__pycache__/' \
--exclude '*.pyc' \
--exclude '.pytest_cache/' \
--exclude '.ruff_cache/' \
--exclude '.mypy_cache/' \
--exclude '.DS_Store' \
--exclude '.claude/' \
./ "$HOST:$REMOTE_DIR/"

REMOTE_CMD=""
if [ "$DO_BUILD" -eq 1 ]; then
echo "→ building image on $HOST"
REMOTE_CMD="cd $REMOTE_DIR && docker compose build && docker compose up -d"
else
echo "→ restarting container on $HOST (no rebuild)"
REMOTE_CMD="cd $REMOTE_DIR && docker compose up -d"
fi

# `docker compose up -d` is idempotent: it recreates the container only if
# config/image changed. So a `--no-build` invocation after a config-only
# change is the fastest path back to a clean container.
ssh "$HOST" "$REMOTE_CMD"

echo "→ verifying health"
# Give uvicorn a beat to bind + serve. Retry rather than sleep-and-pray.
for i in 1 2 3 4 5 6 7 8 9 10; do
if ssh "$HOST" "curl -fsS http://127.0.0.1:8081/api/v1/health >/dev/null"; then
echo "✓ healthy after ${i} attempt(s)"
break
fi
if [ "$i" -eq 10 ]; then
echo "✗ never became healthy — dumping last 60 log lines:" >&2
ssh "$HOST" "cd $REMOTE_DIR && docker compose logs --tail=60 opencmo" >&2 || true
exit 1
fi
sleep 2
done

echo "→ done. Public probe:"
curl -sS -o /dev/null -w " https://www.aidcmo.com/app/ → HTTP %{http_code}\n" -L https://www.aidcmo.com/app/

if [ "$TAIL_LOGS" -eq 1 ]; then
ssh "$HOST" "cd $REMOTE_DIR && docker compose logs -f --tail=50 opencmo"
fi
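The health probe in the script is a fixed ten-try loop with the attempt count and delay written inline. The same pattern as a reusable bash function, a sketch only; `retry` and its argument order are hypothetical, not part of the script:

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0, sleeping DELAY between tries (not after the last).
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      echo "✓ succeeded after ${i} attempt(s)"
      return 0
    fi
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "✗ gave up after ${attempts} attempt(s): $*" >&2
  return 1
}

# Equivalent of the script's probe (HOST as defined there):
#   retry 10 2 ssh "$HOST" "curl -fsS http://127.0.0.1:8081/api/v1/health >/dev/null"
```

Skipping the sleep after the final attempt means a failed deploy reports `✗` as soon as the last probe fails.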