Merged
3 changes: 2 additions & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
@@ -50,7 +50,7 @@ All under `/v1/`:
| `GET /search/explore?q=&platform=&page=` | User-triggered deep GitHub search, paginated, ingests into index. Also reads `X-GitHub-Token`. Cold-path latency is 10–30s — clients must use a 30s timeout. |
| `GET /categories/{trending\|new-releases\|most-popular}/{android\|windows\|macos\|linux}` | Pre-ranked repo lists. Sort order is `search_score DESC NULLS LAST, rank ASC` — static `rank` is only the tie-breaker once behavioral signals exist. |
| `GET /topics/{privacy\|media\|productivity\|networking\|dev-tools}/{platform}` | Topic-bucketed repos. Same dynamic ordering as categories. |
| `GET /repo/{owner}/{name}` | Single repo detail. Curated DB hit on the fast path; on miss, lazy-fetches metadata from GitHub via `GitHubResourceClient` and reads optional `X-GitHub-Token`. Response includes `openIssuesCount` (mirrors GitHub's `open_issues_count`, which counts open issues + open PRs together — same value the GitHub website's Issues tab shows). |
| `GET /repo/{owner}/{name}` | Single repo detail. Curated DB hit on the fast path; on miss, lazy-fetches metadata from GitHub via `GitHubResourceClient` and reads optional `X-GitHub-Token`. Response includes `openIssuesCount` (mirrors GitHub's `open_issues_count`, which counts open issues + open PRs together — same value the GitHub website's Issues tab shows) and `licenseSpdxId` / `licenseName` (GitHub-detected license; null when no LICENSE file or unrecognised). |
| `POST /repo/{owner}/{name}/refresh` | User-triggered refetch of a repo's metadata + latest release. Re-fetches from GitHub via `RepoRefreshCoordinator`, upserts Postgres + pushes Meili, returns the same shape as the GET. Per-repo cooldown 30s + global hourly budget 1000 prevent pool-token torch from spam clicks. Reads `X-GitHub-Token`. Response is `Cache-Control: no-store`; the GET path's CDN cache catches up via its own TTL (~5 min on `s-maxage=300`). |
| `GET /releases/{owner}/{name}?page=&per_page=` | Proxied list of GitHub releases. Reads optional `X-GitHub-Token`. Cached server-side for 1h. |
| `GET /readme/{owner}/{name}` | Proxied README JSON (base64-encoded content + metadata, GitHub's shape). Reads optional `X-GitHub-Token`. Cached 24h. |
@@ -62,6 +62,7 @@ All under `/v1/`:
| `POST /auth/device/start` | Stateless proxy for `github.com/login/device/code`. Client used to call GitHub directly; some user networks (documented in OpenHub-Store/GitHub-Store#433, #395) can't reach GitHub reliably. Backend adds `client_id`, forwards GitHub's body verbatim. 10 req/hr/IP. |
| `POST /auth/device/poll` | Stateless proxy for `github.com/login/oauth/access_token`. Reads `device_code` from form body, adds `client_id` + `grant_type`, forwards GitHub's body verbatim (including tokens on success). The backend never logs, caches, or persists the token. 200 req/hr/IP. |
| `GET /internal/metrics` | Operator-only. Gated by `X-Admin-Token` matching the `ADMIN_TOKEN` env var (open if unset, for local dev). Returns per-source search counters, P-latency, worker queue depth, and top 20 misses (8-char `query_hash` prefix only) in last 7 days. |
| `POST /internal/backfill-stale?limit=N` | Operator-only. Spawns a paced background job that refreshes every curated row whose new metadata columns are still at their migration defaults (currently keyed on `license_spdx_id IS NULL`). One concurrent run; returns 409 on re-trigger. Uses `searchClient.refreshRepo` + persist; respects the quiet window so the daily fetcher's pool stays free. Run after a column-add deploy; no-ops afterwards once the filter no longer matches. |
| `GET /badge/...` | M3-styled SVG badges. Per-repo: `/badge/{owner}/{name}/{kind}/{style}/{variant}` for kind ∈ {release, stars, downloads}. Global: `/badge/{kind}/{style}/{variant}` for kind ∈ {users, fdroid}. Static: `/badge/static/{style}/{variant}?label=&icon=`. Style 1-12 hue, variant 1-3 shade. Vectorized glyph rendering — no font dependency at SVG embed time. |

Client-facing API contract and migration history live in `internal/` (gitignored, operator-only). The client repo at `OpenHub-Store/GitHub-Store` is the public source of truth for client behavior.
132 changes: 132 additions & 0 deletions docs/client/license-info.md
@@ -0,0 +1,132 @@
# Client Integration — `licenseSpdxId` / `licenseName`

**Audience:** client coding agent (KMP / Compose Multiplatform).
**Goal:** surface a repo's license on the details screen using the new `licenseSpdxId` + `licenseName` fields on `RepoResponse`. No new endpoint, no extra fetch — the values ride on the existing GET response.

---

## 1. What changed

`RepoResponse` now carries two new fields:

```kotlin
val licenseSpdxId: String? = null, // e.g. "MIT", "GPL-3.0", "Apache-2.0"
val licenseName: String? = null, // e.g. "MIT License", "GNU General Public License v3.0"
```

Both nullable — not every repo has a license. Old clients that don't know the fields parse cleanly via `ignoreUnknownKeys = true`. Same back-compat story as every other additive `RepoResponse` field.

---

## 2. What the values mean

The values come from GitHub's `license` object on `/repos/{owner}/{name}`. The backend persists only two fields from GitHub's full payload:

- `licenseSpdxId` — the SPDX short tag. Stable, machine-readable, suitable for icon mapping or filter chips. Example: `"MIT"`, `"GPL-3.0"`, `"Apache-2.0"`, `"BSD-3-Clause"`, `"AGPL-3.0"`, `"MPL-2.0"`, `"Unlicense"`.
- `licenseName` — the full human name. Use for tooltips, accessibility labels, "About" sections. Example: `"MIT License"`, `"GNU General Public License v3.0"`.

### When both are null

GitHub returns `license: null` if:
- The repo has no `LICENSE` / `LICENSE.txt` / `LICENSE.md` file at the root.
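- GitHub's classifier couldn't recognise the file's content (rare; usually means a custom or modified license). For some such repos GitHub's API may return a placeholder object (`spdx_id` of `"NOASSERTION"`, name `"Other"`) rather than `null`, so clients may want to treat that value as absent too.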
- The repo is private + you don't have access (not a concern here — backend always uses authenticated calls).

Show "No license" or hide the chip entirely when both are null. **Do NOT** assume "unlicensed" means "free to use": a repo without a `LICENSE` file is still under default copyright, with all rights reserved by its authors. Do NOT ship UI that implies otherwise.

### When one is set but the other is null

Should not happen — backend writes both columns from the same GitHub object atomically. If you see it, it's a row written before V15 deployed and not yet refreshed. Treat as if both are null until refreshed.
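
The null rules above can be collapsed into one small helper. This is a minimal sketch, not the client's actual code: `LicenseInfo` and `resolveLicense` are hypothetical names, and treating blank strings as absent is an extra assumption that the backend contract does not state.

```kotlin
// Hypothetical client-side model; not part of the backend contract.
data class LicenseInfo(val spdxId: String, val name: String)

// Returns a LicenseInfo only when both fields are present and non-blank.
// A half-populated pair (a row written before V15 and not yet refreshed)
// is treated the same as "no license".
fun resolveLicense(spdxId: String?, name: String?): LicenseInfo? {
    val id = spdxId?.trim().orEmpty()
    val fullName = name?.trim().orEmpty()
    if (id.isEmpty() || fullName.isEmpty()) return null
    return LicenseInfo(id, fullName)
}

fun main() {
    println(resolveLicense("MIT", "MIT License")) // both set
    println(resolveLicense("MIT", null))          // half-populated row
    println(resolveLicense(null, null))           // no license detected
}
```

Routing every call site through one function like this keeps the "one set, one null" edge case out of UI code.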

---

## 3. Where the fields appear

Every `RepoResponse`-shaped payload, identical surface to `openIssuesCount`:

| Endpoint | Behaviour |
|----------|-----------|
| `GET /v1/repo/{owner}/{name}` | DB-hit and lazy-fetch paths both fill it. |
| `POST /v1/repo/{owner}/{name}/refresh` | Fresh from GitHub. |
| `GET /v1/categories/.../...` | DB value. |
| `GET /v1/topics/.../...` | DB value. |
| `GET /v1/search?q=...` | Meilisearch index value. |

Existing curated rows have `null` license fields until refreshed — backend writes them on:
1. Search-passthrough ingest
2. Refresh button
3. Hourly worker
4. Daily Python fetcher (after fetcher repo is updated)

---

## 4. Display recommendations

- **Where:** details screen, in the "facts" row alongside language, stars, forks, open issues. Or in an info panel.
- **Chip text:** show `licenseSpdxId` ("MIT", "GPL-3.0"). Short, scannable.
- **Tooltip / long-press:** show `licenseName` ("MIT License").
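- **Tap behaviour:** open `https://github.com/{owner}/{name}/blob/HEAD/LICENSE` in a browser. Almost every licensed repo has a top-level `LICENSE` file. When the file is actually named `LICENSE.md` or `COPYING`, this exact path may not resolve (a 404 is more likely than a redirect), so consider falling back to the repo's main page, where GitHub links the detected license.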
- **Icon:** generic license / scale glyph. Some clients map specific licenses to specific icons (MIT = open lock, GPL = copyleft symbol). Optional polish — `licenseSpdxId` is the key.
- **Color:** don't color-code by permissive vs copyleft vs proprietary. That implies a value judgement and tends to be controversial. Neutral chip styling.
- **Null handling:** hide the chip cleanly. Don't render "Unknown license" — that's misleading.
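
The tap target and the two labels recommended above can be derived with plain functions, kept separate from any UI code. This is a sketch under assumptions: `licenseFileUrl` and `longLabel` are hypothetical helper names, and `owner` / `repo` are assumed to be plain GitHub path segments that need no URL encoding.

```kotlin
// Builds the tap target described above. `owner` and `repo` are assumed to be
// plain GitHub path segments (no slashes), so no extra encoding is applied.
fun licenseFileUrl(owner: String, repo: String): String =
    "https://github.com/$owner/$repo/blob/HEAD/LICENSE"

// Tooltip / accessibility text: prefer the full license name and fall back
// to a generic phrase built from the SPDX tag when the name is missing.
fun longLabel(spdxId: String, name: String?): String =
    name ?: "Licensed under $spdxId"

fun main() {
    println(licenseFileUrl("octocat", "Hello-World"))
    println(longLabel("MIT", "MIT License"))
    println(longLabel("MIT", null))
}
```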

---

## 5. Pseudo-code

```kotlin
@Composable
fun LicenseChip(repo: RepoResponse) {
    // Chip, Icons.License and openInBrowser are assumed placeholders for the client's own components.
    val spdx = repo.licenseSpdxId ?: return // hide when absent
    Chip(
        leadingIcon = { Icon(Icons.License, contentDescription = null) },
        label = { Text(spdx) },
        modifier = Modifier.semantics {
            // Use the full name for accessibility narration.
            contentDescription = repo.licenseName ?: "Licensed under $spdx"
        },
        onClick = { openInBrowser("https://github.com/${repo.fullName}/blob/HEAD/LICENSE") },
    )
}
```

---

## 6. Filter / search use cases (out of scope for this PR but FYI)

`licenseSpdxId` is now indexed in Meilisearch via the `license_spdx_id` field on the search document. If you want to add "filter by license" to the search screen later, it's already there — call `/v1/search` with a Meilisearch filter expression. Not implementing that here; just noting the data is available.

Common useful filter sets:
- "Permissive only": `MIT`, `Apache-2.0`, `BSD-2-Clause`, `BSD-3-Clause`, `Unlicense`, `0BSD`, `ISC`.
- "Copyleft only": `GPL-2.0`, `GPL-3.0`, `AGPL-3.0`, `LGPL-2.1`, `LGPL-3.0`, `MPL-2.0` (weak, file-level copyleft).
- "Permissive or copyleft (anything but proprietary)": null exclusion + filter list.
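
If a license filter is added later, grouping could be done with a small lookup over caller-supplied sets. This is only a sketch: `LicenseGroup` and `groupOf` are hypothetical names, and the sets in `main` are a deliberately small subset of the lists above. Unknown SPDX tags fall through to `OTHER` rather than being rejected, which stays consistent with not validating IDs client-side.

```kotlin
enum class LicenseGroup { PERMISSIVE, COPYLEFT, OTHER, NONE }

// Classifies an SPDX tag against caller-supplied sets. Tags that appear in
// neither set are OTHER; a null tag (no detected license) is NONE.
fun groupOf(
    spdxId: String?,
    permissive: Set<String>,
    copyleft: Set<String>,
): LicenseGroup = when {
    spdxId == null -> LicenseGroup.NONE
    spdxId in permissive -> LicenseGroup.PERMISSIVE
    spdxId in copyleft -> LicenseGroup.COPYLEFT
    else -> LicenseGroup.OTHER
}

fun main() {
    // Illustrative subsets only, not a complete classification.
    val permissive = setOf("MIT", "Apache-2.0", "ISC")
    val copyleft = setOf("GPL-3.0", "AGPL-3.0")
    for (id in listOf("MIT", "GPL-3.0", "EUPL-1.2", null)) {
        println("$id -> ${groupOf(id, permissive, copyleft)}")
    }
}
```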

Comment on lines +95 to +103
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Section 6 overstates how ready license filtering is.

"call `/v1/search` with a Meilisearch filter expression" implies the backend already exposes that capability. It doesn't: the `/search` handler in `SearchRoutes.kt` only accepts `q`, `platform`, `sort`, `limit`, and `offset`; there is no path to pass arbitrary Meilisearch filter strings through to the index.

Additionally, storing license_spdx_id in documents doesn't automatically make it filterable in Meilisearch — it also needs to be added to the index's filterableAttributes settings before filter expressions work.

Suggest rewording to prevent a future developer from wiring a broken client filter without the necessary backend changes:

📝 Suggested rewording
-`licenseSpdxId` is now indexed in Meilisearch via the `license_spdx_id` field on the search document. If you want to add "filter by license" to the search screen later, it's already there — call `/v1/search` with a Meilisearch filter expression. Not implementing that here; just noting the data is available.
+`licenseSpdxId` is stored in every Meilisearch document via the `license_spdx_id` field, so the raw data is available for future filtering. To actually enable "filter by license" you would need two backend changes first:
+1. Add `license_spdx_id` to the index's `filterableAttributes` in Meilisearch settings.
+2. Expose a `license` (or similar) query parameter in `GET /v1/search` wired into `MeilisearchClient.search()`.
+Neither is in scope here; just noting the data is in the index.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/client/license-info.md` around lines 95 - 103, The docs overstate that
license filtering is available: update the text in docs/client/license-info.md
to remove the implication that calling /v1/search will accept arbitrary
Meilisearch filter expressions and instead note that the backend currently only
accepts q, platform, sort, limit, and offset (SearchRoutes.kt's /search handler)
and that enabling filters requires two backend changes — (1) extend the /search
handler (SearchRoutes.kt) to accept a filter parameter and safely passthrough or
translate it to the Meilisearch client, and (2) ensure the search index settings
include license_spdx_id in filterableAttributes so filter expressions work;
reword the section to mention these prerequisites and that license_spdx_id is
indexed but not yet filterable until those changes are made.
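
One way the second prerequisite could look on the backend is a helper that turns a validated list of SPDX tags into a Meilisearch filter string, instead of passing client input through verbatim. This is a hedged sketch, not the project's code: `licenseFilter` is a hypothetical function, the `license` query parameter it would serve does not exist yet, and the expression only works once `license_spdx_id` is listed in the index's `filterableAttributes`.

```kotlin
// SPDX short identifiers use only letters, digits, '.' and '-'. Anything
// else is dropped rather than escaped, so client input never reaches the
// filter expression unvalidated.
private val SPDX_ID = Regex("^[A-Za-z0-9.-]+$")

// Builds an OR-chained equality filter over license_spdx_id, or returns
// null when no valid tag was supplied (meaning: apply no license filter).
fun licenseFilter(spdxIds: List<String>): String? {
    val valid = spdxIds.filter { SPDX_ID.matches(it) }.distinct()
    if (valid.isEmpty()) return null
    return valid.joinToString(" OR ") { "license_spdx_id = \"$it\"" }
}

fun main() {
    println(licenseFilter(listOf("MIT", "Apache-2.0")))
    println(licenseFilter(listOf("MIT\" OR stars > 0")))
}
```

Translating a narrow parameter into the filter server-side keeps the public API independent of Meilisearch's filter grammar.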

---

## 7. What you do NOT need to do

- **No separate license fetch.** Don't call `/repos/{o}/{n}/license` against GitHub or any equivalent backend route — the value is on the repo response.
- **No license-text rendering.** We don't ship the full LICENSE text in the response (it can be hundreds of lines + GitHub already does this beautifully on their site). Tap the chip to open GitHub.
- **No license validation client-side.** Don't try to verify the SPDX ID against a list — backend trusts whatever GitHub returns. New SPDX tags appear over time; whitelisting client-side would create silent breakage.

---

## 8. Acceptance criteria

- [ ] `RepoResponse` deserializes with `licenseSpdxId` + `licenseName` on every call site.
- [ ] Details screen renders a license chip when `licenseSpdxId != null`, hides cleanly otherwise.
- [ ] Chip tap opens the LICENSE file on GitHub in an external browser.
- [ ] Tooltip / accessibility label uses `licenseName` when available.
- [ ] No crash when the fields are absent (older server response during rollout).

---

## 9. Authoritative reference

Backend definitions:
- `model/RepoResponse.kt` — `licenseSpdxId` + `licenseName` fields.
- `db/migration/V15__license_info.sql` — the columns.
- `ingest/GitHubSearchClient.kt` — `GitHubLicense` DTO + ingest writes.
- `routes/RepoRoutes.kt`, `routes/SearchRoutes.kt`, `db/RepoRepository.kt` — mappers.

If client and server disagree, backend wins; file an issue on the backend repo.
@@ -76,6 +76,7 @@ object DatabaseFactory {
// only for V13 to drop it seconds later.
"V13__drop_telemetry_events.sql",
"V14__open_issues_count.sql",
"V15__license_info.sql",
)
for (migration in migrations) {
val rawSql = this::class.java.classLoader
@@ -128,6 +128,8 @@ data class MeiliRepoHit(
val stars: Int = 0,
val forks: Int = 0,
val open_issues: Int = 0,
val license_spdx_id: String? = null,
val license_name: String? = null,
val language: String? = null,
val latest_release_date: String? = null,
val latest_release_tag: String? = null,
2 changes: 2 additions & 0 deletions src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt
@@ -73,6 +73,8 @@ class RepoRepository {
stargazersCount = this[Repos.stars],
forksCount = this[Repos.forks],
openIssuesCount = this[Repos.openIssues],
licenseSpdxId = this[Repos.licenseSpdxId],
licenseName = this[Repos.licenseName],
language = this[Repos.language],
topics = this[Repos.topics],
releasesUrl = "${this[Repos.htmlUrl]}/releases",
2 changes: 2 additions & 0 deletions src/main/kotlin/zed/rainxch/githubstore/db/Tables.kt
@@ -17,6 +17,8 @@ object Repos : Table("repos") {
val stars = integer("stars").default(0)
val forks = integer("forks").default(0)
val openIssues = integer("open_issues").default(0)
val licenseSpdxId = text("license_spdx_id").nullable()
val licenseName = text("license_name").nullable()
val language = text("language").nullable()
val topics = array<String>("topics", TextColumnType())
val latestReleaseDate = timestampWithTimeZone("latest_release_date").nullable()
@@ -521,6 +521,8 @@ class GitHubSearchClient(
it[stars] = repo.stargazersCount
it[forks] = repo.forksCount
it[openIssues] = repo.openIssuesCount
it[licenseSpdxId] = repo.license?.spdxId
it[licenseName] = repo.license?.name
it[language] = repo.language
it[topics] = repo.topics
it[latestReleaseDate] = releaseDate
@@ -557,6 +559,8 @@ class GitHubSearchClient(
stars = r.repo.stargazersCount,
forks = r.repo.forksCount,
open_issues = r.repo.openIssuesCount,
license_spdx_id = r.repo.license?.spdxId,
license_name = r.repo.license?.name,
language = r.repo.language,
topics = r.repo.topics,
latest_release_date = r.release.publishedAt,
@@ -604,6 +608,8 @@ class GitHubSearchClient(
stargazersCount = repo.stargazersCount,
forksCount = repo.forksCount,
openIssuesCount = repo.openIssuesCount,
licenseSpdxId = repo.license?.spdxId,
licenseName = repo.license?.name,
language = repo.language,
topics = repo.topics,
releasesUrl = "${repo.htmlUrl}/releases",
@@ -660,6 +666,9 @@ data class GitHubRepo(
// Includes open PRs (GitHub treats PRs as issues). Same number GitHub
// website's Issues tab shows.
@SerialName("open_issues_count") val openIssuesCount: Int = 0,
// GitHub-detected license. Null on unlicensed repos or when GitHub's
// classifier didn't recognise the LICENSE file.
val license: GitHubLicense? = null,
val language: String? = null,
val topics: List<String> = emptyList(),
val archived: Boolean = false,
@@ -689,3 +698,11 @@ data class GitHubAsset(
val size: Long = 0,
@SerialName("download_count") val downloadCount: Long = 0,
)

// GitHub's license object on /repos/{o}/{n}. We persist `spdx_id` + `name`
// only; the upstream `key`, `url`, and `node_id` aren't surfaced.
@Serializable
data class GitHubLicense(
@SerialName("spdx_id") val spdxId: String? = null,
val name: String? = null,
)
6 changes: 6 additions & 0 deletions src/main/kotlin/zed/rainxch/githubstore/model/RepoResponse.kt
@@ -23,6 +23,12 @@ data class RepoResponse(
// open PRs (GitHub treats PRs as a kind of issue). Same value as the
// GitHub website's Issues tab badge.
val openIssuesCount: Int = 0,
// GitHub-detected license. Null when the repo has no LICENSE file or
// when GitHub couldn't classify it. spdxId is the short tag for chip
// display ("MIT", "GPL-3.0", "Apache-2.0"); name is the human-readable
// version ("MIT License").
val licenseSpdxId: String? = null,
val licenseName: String? = null,
val language: String?,
val topics: List<String>,
val releasesUrl: String?,