Self-hosted job-board crawler that ingests offers from seven French/EU sources, enriches them with an LLM, scores them against a personal preference profile, and serves the result through a typed GraphQL API to a React SPA.
Built to replace the noisy "scroll five job sites every morning" routine with a single dashboard that ranks offers by how well they match my actual stack and remote-work preferences.
┌──────────────────────────────────────────────────────────────────────────┐
│                            Sourcing pipeline                             │
│                                                                          │
│  Discovery  ──►   Fetch    ──►  Analyze   ──►   Enrich   ──►   Score     │
│ (Playwright)    (HTTP/JS)     (parse HTML)      (LLM)        (profile)   │
└──────────────────────────────────────────────────────────────────────────┘
       │              │              │              │              │
       ▼              ▼              ▼              ▼              ▼
               JobOffer row (Postgres + jsonb step metadata)
Each stage is an idempotent Sidekiq job that writes a steps_details.<step>
JSONB key with { at, version } so re-runs are safe and version-aware. The
Score step is profile-driven (see data/scoring_profile.json)
and re-runnable independently of the rest of the pipeline.
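A minimal sketch of the version-aware idempotency described above, in plain Ruby. The Offer struct and the run_step helper are hypothetical stand-ins for illustration, not the project's actual job code; only the { at, version } shape under steps_details comes from this README.

```ruby
require "time"

# Hypothetical stand-in for a JobOffer row; the real model is an ActiveRecord class.
Offer = Struct.new(:steps_details)

# Runs the block unless this step is already recorded at the given version,
# then stamps steps_details[step] with { at, version }.
def run_step(offer, step, version:, now: Time.now.utc)
  recorded = offer.steps_details[step]
  return :skipped if recorded && recorded["version"] == version

  yield offer
  offer.steps_details[step] = { "at" => now.iso8601, "version" => version }
  :ran
end

offer = Offer.new({})
run_step(offer, "fetch", version: 1) { |o| } # first run does the work
run_step(offer, "fetch", version: 1) { |o| } # same version: skipped
run_step(offer, "fetch", version: 2) { |o| } # version bump: runs again
```

Comparing the stored version instead of only checking for the key is what lets a step be replayed after its logic changes, while plain retries stay no-ops.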
| Layer | Tech |
|---|---|
| Backend | Rails 8.1 (API-only) + GraphQL-Ruby 2.3 + Sidekiq 8 |
| Database | PostgreSQL 16 |
| LLM | RubyLLM (provider-agnostic; OpenAI by default) |
| Scraping | Playwright (Ruby client) for JS-rendered providers |
| Storage | ActiveStorage on RustFS (S3-compatible, self-hosted) |
| Frontend | React 19 + Apollo Client 4 + React Router 7 |
| Build | Vite 8 + vite-ruby + Tailwind v4 + Biome |
| Auth/UI | shadcn-ui + Radix UI + Lucide |
| Testing | RSpec, FactoryBot, SimpleCov, Vitest, Storybook + Playwright |
| Deploy | Kamal + Thruster on Docker |
| CI | GitHub Actions (Brakeman, bundler-audit, RuboCop, RSpec, tsc, Vitest, Storybook) |
| Dev tools | mise (Ruby + Node versions), lefthook (pre-commit), foreman |
Seven providers behind a uniform four-step contract
(Sourcing::DiscoveryStep / FetchStep / AnalyzeStep / EnrichStep):
| Provider | Notes |
|---|---|
| APEC | French executive job board |
| Cadremploi | French job board, session-based crawling |
| France Travail | French public employment service |
| Hellowork | General French job board |
| Indeed | Aggregator, Cloudflare-protected, optional session |
|  | Public guest endpoints over plain HTTP (no auth) |
| Welcome to the Jungle | Tech-leaning French board |
Adding a new provider = drop four files under app/services/sourcing/providers/<name>/
and register the key in Sourcing::Providers.
See app/services/sourcing/README.md for pipeline internals and the technology canonicalization system.
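To illustrate the "four steps plus a registry key" idea, here is a hedged sketch of a registry that refuses providers missing a step. ProviderRegistry, its methods, and the :example provider are hypothetical; the real Sourcing::Providers API is not shown in this README and may look different.

```ruby
# Hypothetical registry for illustration; not the project's Sourcing::Providers.
module ProviderRegistry
  STEPS = %i[discovery fetch analyze enrich].freeze
  @providers = {}

  # Registers a provider key with one callable per pipeline step.
  def self.register(key, steps)
    missing = STEPS - steps.keys
    unless missing.empty?
      raise ArgumentError, "#{key} is missing steps: #{missing.join(', ')}"
    end

    @providers[key] = steps
  end

  def self.fetch(key)
    @providers.fetch(key)
  end

  def self.keys
    @providers.keys
  end
end

# A made-up provider whose steps are trivial lambdas.
ProviderRegistry.register(
  :example,
  discovery: ->(query) { ["https://example.test/jobs/1"] },
  fetch:     ->(url) { "<html>...</html>" },
  analyze:   ->(html) { { title: "Example offer" } },
  enrich:    ->(attrs) { attrs.merge(technologies: []) }
)
```

Validating the step set at registration time surfaces an incomplete provider at boot, before any job is enqueued for it.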
The Sourcing::ScoringJob reads data/scoring_profile.json
and writes a 0-100 score plus a score_breakdown JSONB explaining each
component. The profile is editable through a GraphQL mutation
(updateScoringProfile), so the React UI can tune scoring without redeploy.
Current scoring axes:
- Technology: primary (strong weight) and secondary (light weight) matches; malus when an offer requires a primary tech outside the profile.
- Remote/hybrid: weighted preference ranking (remote/hybrid/on_site); for hybrid offers, allowed cities and minimum remote-days-per-week.
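The axes above could combine into a 0-100 score with a breakdown roughly as follows. This is a sketch under stated assumptions: the profile shape, the weights (25, 5, -15, 40), and the offer keys are all invented for illustration and are not taken from data/scoring_profile.json. The hybrid-specific city and remote-days checks are left out to keep it short.

```ruby
# Hypothetical profile shape and weights; the real scoring profile may differ.
PROFILE = {
  primary: %w[ruby rails],
  secondary: %w[react postgresql],
  remote_ranking: { "remote" => 1.0, "hybrid" => 0.6, "on_site" => 0.0 }
}.freeze

# Returns a clamped 0-100 score plus a per-component breakdown.
def score_offer(offer, profile = PROFILE)
  techs = offer.fetch(:technologies)
  primary_hits = (techs & profile[:primary]).size
  secondary_hits = (techs & profile[:secondary]).size
  # Primary technologies the offer requires that the profile does not list.
  foreign_primary = (offer.fetch(:primary_technologies, []) - profile[:primary]).size

  breakdown = {
    technology_primary: [primary_hits * 25, 50].min,     # strong weight, capped
    technology_secondary: [secondary_hits * 5, 10].min,  # light weight, capped
    technology_malus: -15 * foreign_primary,
    remote: (40 * profile[:remote_ranking].fetch(offer.fetch(:work_mode), 0.0)).round
  }
  { score: breakdown.values.sum.clamp(0, 100), breakdown: breakdown }
end

result = score_offer(
  technologies: %w[ruby rails react],
  primary_technologies: %w[ruby],
  work_mode: "remote"
)
# result[:score] is 95 with these made-up weights (50 + 5 + 0 + 40)
```

Keeping the breakdown next to the total is what makes a score explainable in the UI: each component can be rendered as its own line.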
Prerequisites:
- mise (pins Ruby 4.0 + Node 22 — see mise.toml)
- Docker + Docker Compose (for Postgres, Redis, RustFS)
- An OpenAI-compatible API key (OPENAI_API_KEY or LLM_API_KEY)
# infra services
docker compose up -d postgres redis rustfs
# Ruby + Node deps
bundle install
bundle exec lefthook install -f
npm ci
# database
bin/rails db:create db:migrate
# everything: Rails + Sidekiq + Vite + GraphQL codegen watchers
bin/dev
Open http://localhost:3000.
Procfile.dev starts:
- web: Rails server (port 3000)
- sidekiq: background pipeline
- vite: Vite dev server
- gql-schema: watches Ruby GraphQL files → regenerates tmp/schema.graphql
- gql-types: watches frontend TS/TSX → regenerates app/frontend/graphql/generated.ts
Copy .env.example to .env and fill in API keys.
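For reference, a small sketch of how the two key names from the prerequisites might be resolved at boot. The helper name and the precedence (OPENAI_API_KEY first, then LLM_API_KEY) are assumptions; this README does not say which variable wins when both are set.

```ruby
# Hypothetical helper. Assumed precedence: OPENAI_API_KEY, then LLM_API_KEY.
def resolve_llm_api_key(env = ENV)
  key = [env["OPENAI_API_KEY"], env["LLM_API_KEY"]].find do |value|
    value && !value.strip.empty?
  end
  raise KeyError, "set OPENAI_API_KEY or LLM_API_KEY in .env" if key.nil?

  key
end
```

Failing fast with a clear message is friendlier than letting the first Enrich job die on a missing credential.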
bundle exec rspec # backend (RSpec + FactoryBot)
COVERAGE=true bundle exec rspec # with SimpleCov branch coverage report
npm run test:unit # frontend (Vitest, jsdom)
npm run test:storybook # frontend (Vitest browser + Playwright)
npx tsc --noEmit # type check
CI runs all of the above in parallel jobs plus Brakeman, bundler-audit, and RuboCop. Coverage report is uploaded as a 14-day-retention artifact.
app/
├── channels/ # ActionCable (live sourcing status)
├── controllers/ # GraphQL endpoint + SPA shell
├── frontend/ # React app (Vite-mounted)
│ ├── app.tsx # router
│ ├── components/{ui,layout}/ # shared components + shadcn primitives
│ ├── features/{offers,profile,sourcing}/ # feature-scoped UI
│ ├── graphql/ # codegen output + queries
│ └── pages/ # route entry points
├── graphql/
│ ├── mutations/ # launchDiscovery, recomputeOfferScores, updateScoringProfile
│ ├── subscriptions/ # sourcingStatus
│ └── types/queries/ # jobOffers, jobOffer, dashboardMetrics, providers, scoringProfile, technologies
├── jobs/sourcing/ # ActiveJob: Discovery / Fetch / Analyze / Enrich / Scoring / LaunchDiscovery
├── models/job_offer.rb # single domain model (dry-schema validated jsonb)
├── services/sourcing/ # pipeline contract + 7 providers
└── subscribers/sourcing/ # ActiveSupport::Notifications hooks (offer_discovered, offer_fetched, …)
db/migrate/ # schema migrations (9)
spec/ # RSpec, FactoryBot, Vitest stories share the storybook runner
PolyForm Noncommercial 1.0.0 — source-available for noncommercial use (research, learning, personal projects, public-interest organizations). Commercial use requires permission.