5 changes: 4 additions & 1 deletion .agents/skills/sourcing-provider-creation/SKILL.md
@@ -139,6 +139,7 @@ If using Playwright crawling, assess authentication needs:
 2. Infer missing fields from the stripped description text.
 3. Keep strict output schema and defaults for unknown values.
 4. Fail loudly with actionable details when LLM keys are missing (`OPENAI_API_KEY` or `LLM_API_KEY`) or quota is hit.
+5. Include `#{canonical_technologies_prompt}` in the user prompt; the base `EnrichStep` provides this method. It injects the canonical technology list from `TechnologyStore` (Redis) when available, guiding the LLM toward consistent tech names. It returns an empty string before the first `DedupTechnologiesJob` run, so it is always safe to interpolate.

 ## Output Format (for review with user)

@@ -192,7 +193,8 @@ Before discovery implementation, always include:
 13. **Provider-specific README created** and linked from root README:
     - Document extraction strategy (JSON-LD first, then selectors, fallback heuristics).
     - Document known limitations and provider-specific behavior.
-    - Include rebuild instructions and test command examples.
+    - Include provider-specific verification commands (targeted specs and manual probes).
+    - Do NOT repeat content already owned by `app/services/sourcing/README.md`: analyze output field list, LLM credentials, Playwright fr-FR locale default, and "run all sourcing specs" command.
 14. Provider-specific specs exist for integration behavior.
 15. Fetch was validated against a real page state, not just fixtures.
 16. Failure modes for auth walls, challenge pages, quota errors, and invalid HTML were checked explicitly.
@@ -355,6 +357,7 @@ end

 ## Related Documentation

+- See `app/services/sourcing/README.md` for step contracts (analyze output fields, LLM credentials, Playwright defaults, technology canonicalization system).
 - See `app/services/sourcing/providers/cadremploi/README.md` for a complete example of discovery/fetch/analyze/enrich with session management and JSON-LD extraction.
 - See `app/services/sourcing/providers/linkedin/README.md` for resilient selector patterns and performance optimization examples.
 - See `AGENTS.md` for provider-specific guardrails (e.g., LinkedIn crawling brittleness, need to update provider docs on behavior changes).
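
Item 5 of the first SKILL.md hunk describes `canonical_technologies_prompt` as safe to interpolate because it returns an empty string until a canonical list exists. A minimal sketch of that contract follows. `FakeTechnologyStore`, `SketchEnrichStep`, `canonical_names`, and the prompt wording are hypothetical stand-ins; the real `EnrichStep` and the Redis-backed `TechnologyStore` are not shown in this diff and may look different.

```ruby
# Hypothetical stand-in for TechnologyStore (the real one is Redis-backed).
class FakeTechnologyStore
  def initialize(names = [])
    @names = names
  end

  # Canonical technology names known so far; empty before any dedup run.
  def canonical_names
    @names
  end
end

# Hypothetical stand-in for the base EnrichStep.
class SketchEnrichStep
  def initialize(store)
    @store = store
  end

  # Returns "" when no canonical list exists yet, so interpolation never fails.
  def canonical_technologies_prompt
    names = @store.canonical_names
    return "" if names.empty?

    "Prefer these canonical technology names when they apply: #{names.join(', ')}."
  end

  # Builds the user prompt sent to the LLM (wording is illustrative only).
  def user_prompt(description)
    <<~PROMPT.strip
      Extract the missing fields from this job description.
      #{canonical_technologies_prompt}
      Description: #{description}
    PROMPT
  end
end

before_dedup = SketchEnrichStep.new(FakeTechnologyStore.new)
after_dedup  = SketchEnrichStep.new(FakeTechnologyStore.new(%w[Ruby PostgreSQL]))

puts before_dedup.user_prompt("Senior Rails developer")
puts after_dedup.user_prompt("Senior Rails developer")
```

Returning an empty string rather than `nil` keeps the interpolation branch-free in every provider, at the cost of a blank line in the prompt when the list is missing.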
2 changes: 2 additions & 0 deletions Gemfile
@@ -72,3 +72,5 @@ group :test do
   gem "shoulda-matchers"
   gem "simplecov", require: false
 end
+
+gem "sidekiq-scheduler", "~> 6.0"
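
The visible part of this diff does not show how the new gem is configured. As a rough sketch, assuming the `DedupTechnologiesJob` named in SKILL.md is what gets scheduled, the snippet below generates a schedule entry in the shape the sidekiq-scheduler README documents for `config/sidekiq.yml` (a `:scheduler:` / `:schedule:` section with `cron`, `class`, and `queue` keys). The entry name, cron expression, and queue are assumptions, not values taken from this PR.

```ruby
require "yaml"

# Assumed values: entry name, cron expression, and queue are illustrative only.
schedule = {
  scheduler: {
    schedule: {
      "dedup_technologies" => {
        "cron" => "0 3 * * *",               # minute 0, hour 3: once a day
        "class" => "DedupTechnologiesJob",   # job name taken from SKILL.md
        "queue" => "default"
      }
    }
  }
}

# Psych writes symbol keys as `:scheduler:` and `:schedule:`,
# which is the key style used in sidekiq.yml.
puts schedule.to_yaml
```

The printed YAML could be pasted into the scheduler section of the Sidekiq config; whether this project configures the schedule there or in an initializer is not visible here.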
58 changes: 37 additions & 21 deletions Gemfile.lock
@@ -156,13 +156,15 @@ GEM
ed25519 (1.4.0)
erb (6.0.4)
erubi (1.13.1)
et-orbi (1.4.0)
tzinfo
event_stream_parser (1.0.0)
factory_bot (6.6.0)
activesupport (>= 6.1.0)
factory_bot_rails (6.5.1)
factory_bot (~> 6.5)
railties (>= 6.1.0)
faraday (2.14.1)
faraday (2.14.2)
faraday-net_http (>= 2.0, < 3.5)
json
logger
@@ -171,13 +173,16 @@ GEM
http-cookie (>= 1.0.0)
faraday-multipart (1.2.0)
multipart-post (~> 2.0)
faraday-net_http (3.4.2)
faraday-net_http (3.4.3)
net-http (~> 0.5)
faraday-retry (2.4.0)
faraday (~> 2.0)
fiber-storage (1.0.1)
foreman (0.90.0)
thor (~> 1.4)
fugit (1.12.2)
et-orbi (~> 1.4)
raabro (~> 1.4)
globalid (1.3.0)
activesupport (>= 6.1)
graphql (2.6.1)
@@ -195,7 +200,7 @@ GEM
rdoc (>= 4.0.0)
reline (>= 0.4.2)
jmespath (1.6.2)
json (2.19.4)
json (2.19.7)
json-schema (6.2.0)
addressable (~> 2.8)
bigdecimal (>= 3.1, < 5)
@@ -223,15 +228,15 @@ GEM
net-imap
net-pop
net-smtp
marcel (1.1.0)
marcel (1.2.1)
mcp (0.14.0)
json-schema (>= 4.1)
mime-types (3.7.0)
logger
mime-types-data (~> 3.2025, >= 3.2025.0507)
mime-types-data (3.2026.0414)
mini_mime (1.1.5)
minitest (6.0.5)
minitest (6.0.6)
drb (~> 2.0)
prism (~> 1.5)
msgpack (1.8.0)
@@ -293,9 +298,10 @@ GEM
public_suffix (7.0.5)
puma (8.0.1)
nio4r (~> 2.0)
raabro (1.4.0)
racc (1.8.1)
rack (3.2.6)
rack-proxy (0.7.7)
rack-proxy (0.8.2)
rack
rack-session (2.1.2)
base64 (>= 0.1.0)
@@ -406,7 +412,7 @@ GEM
rubocop-performance (>= 1.24)
rubocop-rails (>= 2.30)
ruby-progressbar (1.13.0)
ruby_llm (1.14.1)
ruby_llm (1.15.0)
base64
event_stream_parser (~> 1)
faraday (>= 1.10.0)
@@ -416,7 +422,9 @@ GEM
marcel (~> 1)
ruby_llm-schema (~> 0)
zeitwerk (~> 2)
ruby_llm-schema (0.3.0)
ruby_llm-schema (0.4.0)
rufus-scheduler (3.9.2)
fugit (~> 1.1, >= 1.11.1)
securerandom (0.4.1)
shoulda-matchers (7.0.1)
activesupport (>= 7.1)
@@ -426,6 +434,9 @@ GEM
logger (>= 1.7.0)
rack (>= 3.2.0)
redis-client (>= 0.26.0)
sidekiq-scheduler (6.0.2)
rufus-scheduler (~> 3.2)
sidekiq (>= 7.3, < 9)
sidekiq-throttled (2.1.0)
concurrent-ruby (>= 1.2.0)
redis-prescription (~> 2.2)
@@ -467,7 +478,7 @@ GEM
unicode-emoji (4.2.0)
uri (1.1.1)
useragent (0.16.11)
vite_rails (3.10.0)
vite_rails (3.11.0)
railties (>= 5.1, < 9)
vite_ruby (~> 3.0, >= 3.2.2)
vite_ruby (3.10.2)
@@ -480,7 +491,7 @@ GEM
base64
websocket-extensions (>= 0.1.0)
websocket-extensions (0.1.5)
zeitwerk (2.7.5)
zeitwerk (2.8.2)

PLATFORMS
aarch64-linux
@@ -519,6 +530,7 @@ DEPENDENCIES
ruby_llm
shoulda-matchers
sidekiq
sidekiq-scheduler (~> 6.0)
sidekiq-throttled
simplecov
solid_cable
@@ -527,7 +539,6 @@ DEPENDENCIES
tzinfo-data
vite_rails

CHECKSUMS
action_text-trix (2.1.18) sha256=3fdb83f8bff4145d098be283cdd47ac41caf5110bfa6df4695ed7127d7fb3642
actioncable (8.1.3) sha256=e5bc7f75e44e6a22de29c4f43176927c3a9ce4824464b74ed18d8226e75a80f0
actionmailbox (8.1.3) sha256=df7da474eaa0e70df4ed5a6fef66eb3b3b0f2dbf7f14518deee8d77f1b4aae59
@@ -577,24 +588,26 @@ CHECKSUMS
ed25519 (1.4.0) sha256=16e97f5198689a154247169f3453ef4cfd3f7a47481fde0ae33206cdfdcac506
erb (6.0.4) sha256=38e3803694be357fe2bfe312487c74beaf9fb4e5beb3e22498952fe1645b95d9
erubi (1.13.1) sha256=a082103b0885dbc5ecf1172fede897f9ebdb745a4b97a5e8dc63953db1ee4ad9
et-orbi (1.4.0) sha256=6c7e3c90779821f9e3b324c5e96fda9767f72995d6ae435b96678a4f3e2de8bc
event_stream_parser (1.0.0) sha256=a2683bab70126286f8184dc88f7968ffc4028f813161fb073ec90d171f7de3c8
factory_bot (6.6.0) sha256=1fc1b3b5620ec980a6a27aec1b6ec8c250ca82962e970e8a40f93e8d388d4b89
factory_bot_rails (6.5.1) sha256=d3cc4851eae4dea8a665ec4a4516895045e710554d2b5ac9e68b94d351bc6d68
faraday (2.14.1) sha256=a43cceedc1e39d188f4d2cdd360a8aaa6a11da0c407052e426ba8d3fb42ef61c
faraday (2.14.2) sha256=73ccb9994a9e8648f010e32eca2ae82e41c57860aa10932cda29418b9e0223ad
faraday-cookie_jar (0.0.8) sha256=0140605823f8cc63c7028fccee486aaed8e54835c360cffc1f7c8c07c4299dbb
faraday-multipart (1.2.0) sha256=7d89a949693714176f612323ca13746a2ded204031a6ba528adee788694ef757
faraday-net_http (3.4.2) sha256=f147758260d3526939bf57ecf911682f94926a3666502e24c69992765875906c
faraday-net_http (3.4.3) sha256=9db13becec9312f345a769eeeecf9049c9287d54c0ae053d7235228993a4eec1
faraday-retry (2.4.0) sha256=7b79c48fb7e56526faf247b12d94a680071ff40c9fda7cf1ec1549439ad11ebe
fiber-storage (1.0.1) sha256=f48e5b6d8b0be96dac486332b55cee82240057065dc761c1ea692b2e719240e1
foreman (0.90.0) sha256=ff675e2d47b607ac58714a6d4ac3e1ee8f06f41d8db084531c31961e2c3f117c
fugit (1.12.2) sha256=643f2bf28db263bd400cbf8e0dd8b76b2c9b94bdb130e12d2394de04d9c20e5e
globalid (1.3.0) sha256=05c639ad6eb4594522a0b07983022f04aa7254626ab69445a0e493aa3786ff11
graphql (2.6.1) sha256=e4b4290dda309fcfba2df38cacf4ee67e2194e06f09b9efbd7738f1df7e41967
http-cookie (1.1.6) sha256=ba4b82be64de61dc281243dac70e3c382c45142f20268ed9276a3670c93feaa9
i18n (1.14.8) sha256=285778639134865c5e0f6269e0b818256017e8cde89993fdfcbfb64d088824a5
io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc
irb (1.18.0) sha256=de9454a0703a54704b9811a5ef31a60c86949fbf4013fcf244fabc7c775248e3
jmespath (1.6.2) sha256=238d774a58723d6c090494c8879b5e9918c19485f7e840f2c1c7532cf84ebcb1
json (2.19.4) sha256=670a7d333fb3b18ca5b29cb255eb7bef099e40d88c02c80bd42a3f30fe5239ac
json (2.19.7) sha256=fe432c8639f6efff69f9d73b518a3705d9581ab93156f981ea72806e1e5bcc3e
json-schema (6.2.0) sha256=e8bff46ed845a22c1ab2bd0d7eccf831c01fe23bb3920caa4c74db4306813666
kamal (2.11.0) sha256=1408864425e0dec7e0a14d712a3b13f614e9f3a425b7661d3f9d287a51d7dd75
language_server-protocol (3.17.0.5) sha256=fd1e39a51a28bf3eec959379985a72e296e9f9acfce46f6a79d31ca8760803cc
@@ -603,12 +616,12 @@ CHECKSUMS
logger (1.7.0) sha256=196edec7cc44b66cfb40f9755ce11b392f21f7967696af15d274dde7edff0203
loofah (2.25.1) sha256=d436c73dbd0c1147b16c4a41db097942d217303e1f7728704b37e4df9f6d2e04
mail (2.9.0) sha256=6fa6673ecd71c60c2d996260f9ee3dd387d4673b8169b502134659ece6d34941
marcel (1.1.0) sha256=fdcfcfa33cc52e93c4308d40e4090a5d4ea279e160a7f6af988260fa970e0bee
marcel (1.2.1) sha256=1678e9360e32f9eafa917c80029e2f6d10b2715c66a4b87b6d0da9b9cd1f859f
mcp (0.14.0) sha256=9e3ca2e6b5e568739e8c07090982829896f2e4d884ffbb668d06f0fe758489e1
mime-types (3.7.0) sha256=dcebf61c246f08e15a4de34e386ebe8233791e868564a470c3fe77c00eed5e56
mime-types-data (3.2026.0414) sha256=461c4c655373a44bd6c5fe54bcf5b7776026ea96e808144b1ec465c4b99148cc
mini_mime (1.1.5) sha256=8681b7e2e4215f2a159f9400b5816d85e9d8c6c6b491e96a12797e798f8bccef
minitest (6.0.5) sha256=f007d7246bf4feea549502842cd7c6aba8851cdc9c90ba06de9c476c0d01155c
minitest (6.0.6) sha256=153ea36d1d987a62942382b61075745042a2b3123b1cd48f4c3675af9cc7d6f1
msgpack (1.8.0) sha256=e64ce0212000d016809f5048b48eb3a65ffb169db22238fb4b72472fecb2d732
multipart-post (2.4.1) sha256=9872d03a8e552020ca096adadbf5e3cb1cd1cdd6acd3c161136b8a5737cdb4a8
mutex_m (0.3.0) sha256=cfcb04ac16b69c4813777022fdceda24e9f798e48092a2b817eb4c0a782b0751
@@ -644,9 +657,10 @@ CHECKSUMS
psych (5.3.1) sha256=eb7a57cef10c9d70173ff74e739d843ac3b2c019a003de48447b2963d81b1974
public_suffix (7.0.5) sha256=1a8bb08f1bbea19228d3bed6e5ed908d1cb4f7c2726d18bd9cadf60bc676f623
puma (8.0.1) sha256=7b94e50c07655718c1fb8ae41a11fc06c7d61293208b3aa608ff71a46d3ad37c
raabro (1.4.0) sha256=d4fa9ff5172391edb92b242eed8be802d1934b1464061ae5e70d80962c5da882
racc (1.8.1) sha256=4a7f6929691dbec8b5209a0b373bc2614882b55fc5d2e447a21aaa691303d62f
rack (3.2.6) sha256=5ed78e1f73b2e25679bec7d45ee2d4483cc4146eb1be0264fc4d94cb5ef212c2
rack-proxy (0.7.7) sha256=446a4b57001022145d5c3ba73b775f66a2260eaf7420c6907483141900395c8a
rack-proxy (0.8.2) sha256=d3086e865cbe74e627d41d2c6cd24521206c68d9e0308576760ea06d68e11ab0
rack-session (2.1.2) sha256=595434f8c0c3473ae7d7ac56ecda6cc6dfd9d37c0b2b5255330aa1576967ffe8
rack-test (2.2.0) sha256=005a36692c306ac0b4a9350355ee080fd09ddef1148a5f8b2ac636c720f5c463
rackup (2.3.1) sha256=6c79c26753778e90983761d677a48937ee3192b3ffef6bc963c0950f94688868
@@ -675,11 +689,13 @@ CHECKSUMS
rubocop-rails (2.34.3) sha256=10d37989024865ecda8199f311f3faca990143fbac967de943f88aca11eb9ad2
rubocop-rails-omakase (1.1.0) sha256=2af73ac8ee5852de2919abbd2618af9c15c19b512c4cfc1f9a5d3b6ef009109d
ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
ruby_llm (1.14.1) sha256=7487d0f0bb9e86836d9233d656e10637370a6b22eb2555343c0a4c179ce7c500
ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
ruby_llm (1.15.0) sha256=ca207465bca1cca007010a79fce500d4012b1fbe188b025040cd37b884fb98af
ruby_llm-schema (0.4.0) sha256=e930f5a5316f9301bff3fb7fe572e44727d05bb8e50621001bbb49a47d63b8da
rufus-scheduler (3.9.2) sha256=55fa9e4db0ff69d7f38c804f17baba0c9bce5cba39984ae3c5cf6c039d1323b9
securerandom (0.4.1) sha256=cc5193d414a4341b6e225f0cb4446aceca8e50d5e1888743fac16987638ea0b1
shoulda-matchers (7.0.1) sha256=b4bfd8744c10e0a36c8ac1a687f921ee7e25ed529e50488d61b79a8688749c77
sidekiq (8.1.3) sha256=a42f51aca3705d21cb50f37f5ec07e69de8708e126be4cf94b45cf15b84b3762
sidekiq-scheduler (6.0.2) sha256=1065f1a989e750849de1d23d1f34338c104ed6ec5a6dc22a2620c5af54a2d6fc
sidekiq-throttled (2.1.0) sha256=430d4684b781b37f7f22b9b4a26dadf438b725718ecb821ae8bb8ea66f78fc4c
simplecov (0.22.0) sha256=fe2622c7834ff23b98066bb0a854284b2729a569ac659f82621fc22ef36213a5
simplecov-html (0.13.2) sha256=bd0b8e54e7c2d7685927e8d6286466359b6f16b18cb0df47b508e8d73c777246
@@ -700,11 +716,11 @@ CHECKSUMS
unicode-emoji (4.2.0) sha256=519e69150f75652e40bf736106cfbc8f0f73aa3fb6a65afe62fefa7f80b0f80f
uri (1.1.1) sha256=379fa58d27ffb1387eaada68c749d1426738bd0f654d812fcc07e7568f5c57c6
useragent (0.16.11) sha256=700e6413ad4bb954bb63547fa098dddf7b0ebe75b40cc6f93b8d54255b173844
vite_rails (3.10.0) sha256=8c4471554870c043e8d9ff7d379302a01d147ade2cd61476ddf020fed32a107a
vite_rails (3.11.0) sha256=6e89b11aa1dd7c3fbe6f54a5b9753fd6c4ea88aba4695dcd2092ed467cfea855
vite_ruby (3.10.2) sha256=db465451eb180abbdc81f596ba63671d2b2552e2bb7bf3c1f7b79f9d58ca9a86
websocket-driver (0.8.0) sha256=ed0dba4b943c22f17f9a734817e808bc84cdce6a7e22045f5315aa57676d4962
websocket-extensions (0.1.5) sha256=1c6ba63092cda343eb53fc657110c71c754c56484aad42578495227d717a8241
zeitwerk (2.7.5) sha256=d8da92128c09ea6ec62c949011b00ed4a20242b255293dd66bf41545398f73dd
zeitwerk (2.8.2) sha256=7212a61311083c604184b1ea2574b9aa05cd14f855a0841c06985cabe9181d12

BUNDLED WITH
4.0.3
1 change: 1 addition & 0 deletions README.md
@@ -68,6 +68,7 @@ Seven providers behind a uniform four-step contract

 Adding a new provider = drop four files under `app/services/sourcing/providers/<name>/`
 and register the key in [Sourcing::Providers](app/services/sourcing/providers.rb).
+See [app/services/sourcing/README.md](app/services/sourcing/README.md) for pipeline internals and the technology canonicalization system.

 ## Profile-driven scoring

@@ -8,8 +8,8 @@ import {
 } from '@/features/sourcing/queries/documents'
 import type {
   LaunchDiscoveryMutation,
+  PrimaryTechnologiesQuery,
   ProvidersQuery,
-  TechnologiesQuery,
 } from '@/graphql/generated'
 import { LaunchDiscoveryView } from './launch-discovery-view'

@@ -25,7 +25,7 @@ export function LaunchDiscovery({ onSuccess }: LaunchDiscoveryProps) {
   const { data: providersData, loading: providersLoading } =
     useQuery<ProvidersQuery>(PROVIDERS_QUERY)
   const { data: technologiesData, loading: technologiesLoading } =
-    useQuery<TechnologiesQuery>(TECHNOLOGIES_QUERY)
+    useQuery<PrimaryTechnologiesQuery>(TECHNOLOGIES_QUERY)

   const [launchDiscovery, { loading, error, data }] =
     useMutation<LaunchDiscoveryMutation>(LAUNCH_DISCOVERY_MUTATION)
4 changes: 2 additions & 2 deletions app/frontend/features/sourcing/queries/documents.ts
@@ -7,8 +7,8 @@ export const PROVIDERS_QUERY = gql`
 `

 export const TECHNOLOGIES_QUERY = gql`
-  query Technologies {
-    technologies
+  query PrimaryTechnologies {
+    technologies(includePrimary: true)
   }
 `
