# AGENTS.md

## Cursor Cloud specific instructions

### Overview

This is a Python Streamlit application for video understanding and chat. It extracts frames from uploaded videos, captions them using vision models (BLIP / Qwen2-VL), and allows users to chat about video content using local LLMs (Flan-T5) or Google Gemini API.
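The frame-extraction step can be sketched roughly as follows. This is a minimal illustration of the sampling logic only; `sample_frame_indices` is a hypothetical helper for this sketch, not a function from `app.py`:

```python
def sample_frame_indices(total_frames: int, fps: float,
                         every_n_seconds: float = 2.0) -> list[int]:
    """Pick one frame index per `every_n_seconds` of video.

    Hypothetical helper: the real app may use a different sampling
    strategy, but the idea is the same -- caption a sparse subset of
    frames rather than every frame.
    """
    # Convert the time interval into a frame stride (at least 1).
    step = max(1, round(fps * every_n_seconds))
    return list(range(0, total_frames, step))


# Example: a 10-second clip at 30 fps, sampled every 2 seconds.
indices = sample_frame_indices(total_frames=300, fps=30.0)
print(indices)  # → [0, 60, 120, 180, 240]
```

Each sampled frame would then be decoded (e.g. with OpenCV) and passed to the captioning model; the captions form the context the chat model answers from.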

### Running the app

```bash
source .venv/bin/activate
export HF_HOME=/workspace/.cache/huggingface
streamlit run app.py --server.port 8501 --server.headless true --browser.gatherUsageStats false
```

The app serves at `http://localhost:8501`.

### Key caveats

- **HF_HOME must be set before running.** The `.env` file and `app.py` hardcode a macOS SSD path (`/Volumes/PortableSSD/...`) that does not exist on Linux. Always `export HF_HOME=/workspace/.cache/huggingface` before launching, or the app will crash trying to `os.makedirs` on a nonexistent mount.
- **The first run downloads over 1 GB of model weights** (the BLIP vision model and the Flan-T5 chat model) from the Hugging Face Hub. Subsequent runs use the cached models.
- **The Gemini chat model requires the `GEMINI_API_KEY` environment variable.** Use `google/flan-t5-base` (local) for testing without an API key.
- **No linter or test suite is configured** in this repository. There are no `pytest`, `flake8`, `mypy`, or similar configurations.
- Standard setup/run commands are documented in `README.md`.
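The HF_HOME caveat above could also be handled defensively in code. A hedged sketch, assuming a writable `/workspace` fallback (the `ensure_hf_home` helper is hypothetical, not part of `app.py`):

```python
import os


def ensure_hf_home(default: str = "/workspace/.cache/huggingface") -> str:
    """Fall back to a writable cache dir when HF_HOME is unset or points
    at a mount that does not exist (e.g. a hardcoded macOS SSD path
    under /Volumes that is absent on Linux). Hypothetical helper."""
    hf_home = os.environ.get("HF_HOME", "")
    # Reject an empty value or a path whose parent mount is missing.
    if not hf_home or not os.path.isdir(os.path.dirname(hf_home) or "/"):
        hf_home = default
        os.environ["HF_HOME"] = hf_home
    # Create the cache dir up front so later os.makedirs calls cannot
    # crash on a nonexistent mount.
    os.makedirs(hf_home, exist_ok=True)
    return hf_home
```

Calling such a helper before any Hugging Face import would make the `export HF_HOME=...` step optional rather than mandatory.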