# good-listener

Real-time audio transcription, screen capture, and AI-assisted conversation analysis.
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                     Go Platform (port 8000)                     │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ Audio Capture│  │ Screen Grab  │  │ WebSocket/HTTP Server │  │
│  │   (malgo)    │  │ (screenshot) │  │   (coder/websocket)   │  │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬───────────┘  │
│         │                 │                      │              │
│         ▼                 ▼                      │              │
│  ┌──────────────────────────────────────┐        │              │
│  │ Orchestrator (channels, backpressure)│◄───────┘              │
│  └──────────────────┬───────────────────┘                       │
└─────────────────────┼───────────────────────────────────────────┘
                      │ gRPC (port 50051)
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Python Inference Services                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ Whisper STT │  │ OCR Service │  │ LLM Service (streaming) │  │
│  │ + Silero VAD│  │  (RapidOCR) │  │     (Gemini/Ollama)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                  Memory Service (ChromaDB)                  │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
## Project Structure

```
good-listener/
├── proto/                 # Protobuf definitions
│   └── cognition.proto
├── inference/             # Python ML services
│   ├── app/
│   │   ├── core/          # Logging, utilities
│   │   ├── services/      # Transcription, VAD, OCR, LLM, Memory
│   │   ├── pb/            # Generated protobuf code
│   │   └── grpc_server.py # gRPC server entry point
│   ├── tests/
│   └── requirements.txt
├── platform/              # Go orchestration layer
│   ├── cmd/server/        # Main entry point
│   ├── internal/
│   │   ├── audio/         # Audio capture with backpressure
│   │   ├── screen/        # Screen capture
│   │   ├── orchestrator/  # Service coordination
│   │   ├── server/        # HTTP/WebSocket handlers
│   │   ├── grpcclient/    # gRPC client to Python
│   │   └── config/        # Configuration
│   ├── pkg/pb/            # Generated protobuf code
│   └── go.mod
├── frontend/              # Electron + React UI
└── Makefile
```
## Prerequisites

- Go 1.22+
- Python 3.11+
- Node.js 18+
- protoc (Protocol Buffers compiler)
## Quick Start

```bash
# Install all dependencies
make install

# Generate protobuf files (requires protoc)
make proto

# Start all services (inference + platform + frontend)
make dev

# Or start individually:
make inference  # Python gRPC server on :50051
make platform   # Go server on :8000
make frontend   # React dev server
```

## Configuration

Create a `.env` file in the project root:
```
# LLM Configuration
GOOGLE_API_KEY=your-api-key
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.0-flash

# Platform Configuration
HTTP_ADDR=:8000
INFERENCE_ADDR=localhost:50051
SAMPLE_RATE=16000
VAD_THRESHOLD=0.5
CAPTURE_SYSTEM_AUDIO=true
AUTO_ANSWER_ENABLED=true
```

## Testing

```bash
make test            # All tests
make inference-test  # Python tests only
make platform-test   # Go tests only
```

## Regenerating Protobuf Code

After modifying `proto/cognition.proto`:

```bash
make proto
```
## Design Decisions

**Go for orchestration.** Native goroutines and channels provide:
- Proper backpressure (bounded channels)
- Graceful cancellation (context)
- Efficient concurrency without the GIL
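The bounded-channel backpressure idea can be sketched as follows. A drop-oldest policy is assumed here for illustration; the actual `internal/audio` package may choose a different strategy.

```go
// Minimal sketch of bounded-channel backpressure with a drop-oldest policy.
package main

import "fmt"

// pushFrame offers a frame to a bounded channel without ever blocking the
// capture path: if the buffer is full, the oldest frame is evicted first.
func pushFrame(ch chan int, frame int) (dropped bool) {
	select {
	case ch <- frame:
		return false
	default:
		select {
		case <-ch: // evict oldest
		default:
		}
		ch <- frame
		return true
	}
}

func main() {
	frames := make(chan int, 4) // bounded buffer: capture never blocks
	drops := 0
	for i := 0; i < 20; i++ {
		if pushFrame(frames, i) {
			drops++
		}
	}
	close(frames)
	fmt.Println("dropped:", drops) // dropped: 16
	for f := range frames {
		fmt.Print(f, " ") // 16 17 18 19 (only the most recent survive)
	}
	fmt.Println()
}
```

In the real pipeline this pairs with `context.Context` so a slow or cancelled consumer never stalls the audio device callback.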
**Python for ML inference.** Keeps the ML ecosystem:
- PyTorch/faster-whisper for transcription
- LangChain for LLM abstraction
- ChromaDB for vector storage
**gRPC for communication:**
- Type-safe API contracts
- Streaming support for audio/LLM
- Language-agnostic
## License

MIT