Persist metadata for NoUsableRelease repos in backfill #11

Merged
rainxchzed merged 1 commit into main from backfill-metadata-only
May 4, 2026

@rainxchzed
Member

Summary

Running the first `/v1/internal/backfill-stale` on prod exposed a real bug: most curated rows return `RefreshResult.NoUsableRelease` from `searchClient.refreshRepo`. That covers libraries, dotfiles, repos that release via tags only, and anything else without a non-draft, non-prerelease release on GitHub. The previous backfill loop persisted only on `Ok`, so for those repos:

  1. Metadata was fetched from GitHub (pool-token cost incurred).
  2. The `open_issues` and `license_*` columns stayed at the migration default (0 / NULL).
  3. The row re-appeared in every subsequent backfill SELECT (since the filter is `license_spdx_id IS NULL`).
  4. The fetch cost was paid again on every run, and the fetched values were never written.

Five minutes into the initial run, only 3 rows had been populated, which is consistent with the share of repos that return `NoUsableRelease`.

Fix

On the `NoUsableRelease` branch in `runBackfill`, do a metadata-only UPDATE on the existing row. The new helper `upsertMetadataOnly(GitHubRepo)` writes:

  • `stars`, `forks` (drift-prone)
  • `open_issues` (the V14 column)
  • `license_spdx_id`, `license_name` (the V15 columns)
  • `description` (also drift-prone)
  • `indexed_at` (so future runs don't re-process the same row)

Release-related columns are NOT touched -- they're either correct from the last successful Ok-path refresh, or they're at schema defaults because the repo has never had a release. Both cases are correct.
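
To illustrate which fields the metadata-only write touches and which it leaves alone, here is a minimal in-memory sketch. It is not the code from `InternalRoutes.kt`: the real `upsertMetadataOnly(GitHubRepo)` issues a SQL UPDATE, whereas this model copies fields on a data class. The `GitHubRepo` field names, the `RepoRow` type, `applyMetadataOnly`, and the `latestReleaseTag` column are all assumptions made for the example.

```kotlin
import java.time.Instant

// Hypothetical stand-in for the GitHub metadata payload; field names are assumptions.
data class GitHubRepo(
    val stars: Int,
    val forks: Int,
    val openIssues: Int,
    val licenseSpdxId: String?,
    val licenseName: String?,
    val description: String?,
)

// Hypothetical stand-in for a stored row. latestReleaseTag represents the
// release-related columns, whose real names the PR description does not list.
data class RepoRow(
    val stars: Int = 0,
    val forks: Int = 0,
    val openIssues: Int = 0,          // migration default
    val licenseSpdxId: String? = null, // migration default
    val licenseName: String? = null,   // migration default
    val description: String? = null,
    val indexedAt: Instant? = null,
    val latestReleaseTag: String? = null,
)

// Copies only the metadata fields named in the PR description and stamps
// indexedAt. Release-related fields pass through unchanged.
fun applyMetadataOnly(row: RepoRow, repo: GitHubRepo, now: Instant): RepoRow =
    row.copy(
        stars = repo.stars,
        forks = repo.forks,
        openIssues = repo.openIssues,
        licenseSpdxId = repo.licenseSpdxId,
        licenseName = repo.licenseName,
        description = repo.description,
        indexedAt = now,
    )
```

Because `copy` only overrides the named arguments, a row that already carries release data from an earlier Ok-path refresh keeps it, and a row that never had a release keeps its defaults.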

The final log line now distinguishes `ok` (full Ok-path persist) from `metadata-only` (NoUsableRelease but metadata written) so an operator can see the ratio.
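
The split between `ok` and `metadata-only` can be sketched as a tally over refresh results. This is an illustration only: the payload of each `RefreshResult` variant, the `Failed` variant, the `BackfillTally` type, and the exact log format are assumptions, since the PR description names only `Ok` and `NoUsableRelease`.

```kotlin
// Hypothetical shape of the result type; only the names Ok and
// NoUsableRelease come from the PR description.
sealed interface RefreshResult {
    data class Ok(val repoId: Long) : RefreshResult
    data class NoUsableRelease(val repoId: Long) : RefreshResult
    data class Failed(val repoId: Long, val reason: String) : RefreshResult // assumed variant
}

// Hypothetical counters for the final log line.
data class BackfillTally(
    val ok: Int = 0,
    val metadataOnly: Int = 0,
    val failed: Int = 0,
) {
    // The format is an assumption; the real log line may differ.
    fun summary(): String =
        "backfill done: ok=$ok metadata-only=$metadataOnly failed=$failed"
}

// Counts each outcome separately so the ok / metadata-only ratio is visible.
fun tally(results: List<RefreshResult>): BackfillTally =
    results.fold(BackfillTally()) { acc, result ->
        when (result) {
            is RefreshResult.Ok -> acc.copy(ok = acc.ok + 1)
            is RefreshResult.NoUsableRelease -> acc.copy(metadataOnly = acc.metadataOnly + 1)
            is RefreshResult.Failed -> acc.copy(failed = acc.failed + 1)
        }
    }
```

Using a sealed type keeps the `when` exhaustive, so adding a new outcome later forces the tally to account for it.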

Test plan

@coderabbitai
Contributor

coderabbitai Bot commented May 4, 2026

Warning

Rate limit exceeded

@rainxchzed has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 48 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72a44735-c34c-4127-ba66-0640126c59f5

📥 Commits

Reviewing files that changed from the base of the PR and between 69a8043 and 0b51d6a.

📒 Files selected for processing (1)
  • src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt

@rainxchzed merged commit 7791188 into main on May 4, 2026
2 checks passed