
feat: add github/openssf scrapers #1875

Merged
moshloop merged 2 commits into main from security-scrapers
Feb 26, 2026

Conversation

@moshloop
Member

@moshloop moshloop commented Jan 23, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added GitHub repository scraper with security alerts detection (Dependabot, Code Scanning, Secret Scanning).
    • Integrated OpenSSF Scorecard analysis for repository security assessment.
  • Improvements

    • Added rate-limit handling to gracefully manage API throttling with reset timing (see the sketch after these notes).
    • Implemented incremental scraping to optimize repeated repository scans.
  • Tests

    • Added comprehensive test coverage for GitHub scraper functionality.
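
The reset-timing behavior mentioned above is only summarized here. As a rough, hedged illustration (the repository names, the go-github module version in the import path, and the surrounding flow are assumptions, not the PR's actual code), waiting for the rate-limit reset with go-github can look like this:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/google/go-github/v55/github"
)

func main() {
	ctx := context.Background()
	client := github.NewClient(nil) // unauthenticated client, for illustration only

	alerts, _, err := client.Dependabot.ListRepoAlerts(ctx, "example-org", "example-repo", nil)
	var rateErr *github.RateLimitError
	if errors.As(err, &rateErr) {
		// Wait until the limit resets, then the caller can retry the request.
		wait := time.Until(rateErr.Rate.Reset.Time)
		fmt.Printf("rate limited, retrying in %s\n", wait)
		time.Sleep(wait)
		return
	}
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("fetched %d dependabot alerts\n", len(alerts))
}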

@coderabbitai
Contributor

coderabbitai Bot commented Jan 23, 2026

Warning

Rate limit exceeded

@moshloop has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 38 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8f8ab18 and 73cb168.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • Makefile
  • api/v1/github.go
  • api/v1/interface.go
  • api/v1/interface_rate_limit_test.go
  • api/v1/scrapeconfig_types.go
  • api/v1/types.go
  • cmd/run.go
  • fixtures/github.yaml
  • go.mod
  • scrapers/common.go
  • scrapers/cron.go
  • scrapers/github/client.go
  • scrapers/github/openssf.go
  • scrapers/github/scraper.go
  • scrapers/github/scraper_e2e_test.go
  • scrapers/github/scraper_test.go
  • scrapers/github/security.go
  • scrapers/run.go

Walkthrough

Adds two new security scrapers and supporting types: a GitHub Security scraper (alerts: Dependabot, Code Scanning, Secret Scanning) and an OpenSSF Scorecard scraper, plus API clients, fixtures, deepcopy generation, registry registration, and small context/AWS tweaks.

Changes

  • API Types & Config (api/v1/github_security.go, api/v1/openssf.go, api/v1/types.go, api/v1/scrapeconfig_types.go)
    Introduce GitHubSecurity and OpenSSFScorecard types (with repository and filter subtypes), add fields to ScraperSpec, and update ScrapeConfig.Type() to detect the new types.
  • DeepCopy Generation (api/v1/zz_generated.deepcopy.go)
    Generated DeepCopyInto/DeepCopy methods for the new GitHubSecurity/OpenSSF types and extended ScraperSpec deep-copy handling.
  • GitHub Security Scraper & Client (scrapers/github/security.go, scrapers/github/security_client.go, scrapers/github/testdata/security/*.json)
    Add GithubSecurityScraper with incremental per-repo alert scraping, rate-limit handling, and health calculation; add GitHubSecurityClient with methods to fetch Dependabot, Code Scanning, and Secret Scanning alerts and check rate limits; include test fixtures.
  • OpenSSF Scorecard Scraper & Client (scrapers/openssf/scorecard.go, scrapers/openssf/client.go, scrapers/openssf/compliance.go, scrapers/openssf/testdata/*.json)
    Add OpenSSFScorecardScraper, a ScorecardClient with retry/backoff and JSON parsing, compliance mappings for checks, caching/TTL logic, and scorecard test fixtures.
  • Scraper Registry & Common (scrapers/common.go)
    Register the new GithubSecurityScraper and OpenSSFScorecardScraper and add the openssf import (see the sketch after this list).
  • Fixtures (fixtures/github-security.yaml, fixtures/openssf.yaml)
    Add example ScrapeConfig fixtures for GitHub security and OpenSSF scorecard scans.
  • Context & Minor Changes (api/context.go, scrapers/aws/aws.go)
    Add a ScrapeContext.IsDebug() method; adjust ECS log verbosity and Route53 AliasTarget handling.
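
As a loose sketch of the registry change described in the Scraper Registry & Common entry above (the registry variable, the api.Scraper interface name, and the exact scraper type names are assumptions; scrapers/common.go is the authoritative source):

package scrapers

import (
	"github.com/flanksource/config-db/api"
	"github.com/flanksource/config-db/scrapers/github"
	"github.com/flanksource/config-db/scrapers/openssf"
)

// All is assumed here to be the existing scraper registry; the PR appends the two
// new scrapers alongside the ones already registered.
var All = []api.Scraper{
	// ...existing scrapers...
	github.GithubSecurityScraper{},
	openssf.OpenSSFScorecardScraper{},
}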

Sequence Diagram(s)

sequenceDiagram
    participant ScrapeCtx as Scrape Context
    participant GHS as GitHubSecurityScraper
    participant GSC as GitHubSecurityClient
    participant GHApi as GitHub API
    participant Results as ScrapeResults

    ScrapeCtx->>GHS: Scrape(ctx)
    GHS->>GHS: CanScrape(spec)
    GHS->>GSC: NewGitHubSecurityClient(config)
    loop For each repository
        GHS->>GSC: fetchAllAlerts(ctx)
        GSC->>GHApi: GetDependabotAlerts()
        GHApi-->>GSC: alerts
        GSC->>GHApi: GetCodeScanningAlerts()
        GHApi-->>GSC: alerts
        GSC->>GHApi: GetSecretScanningAlerts()
        GHApi-->>GSC: alerts
        GSC->>GSC: compute max updated time
        GHS->>GHS: calculateHealthStatus(counts)
        GHS->>Results: append ScrapeResult
    end
    GHS-->>ScrapeCtx: results
sequenceDiagram
    participant ScrapeCtx as Scrape Context
    participant OSS as OpenSSFScorecardScraper
    participant SC as ScorecardClient
    participant OpenSSFApi as OpenSSF API
    participant Results as ScrapeResults

    ScrapeCtx->>OSS: Scrape(ctx)
    OSS->>OSS: CanScrape(spec)
    OSS->>SC: NewScorecardClient(ctx)
    loop For each repository
        OSS->>OSS: check cache & TTL
        OSS->>SC: GetRepositoryScorecard(owner, repo)
        loop Retry up to 3 times
            SC->>OpenSSFApi: HTTP GET scorecard
            OpenSSFApi-->>SC: response or error
            SC->>SC: isRetryable(err)?
        end
        SC-->>OSS: ScorecardResponse
        OSS->>OSS: filter by MinScore & build analyses
        OSS->>Results: append ScrapeResult
    end
    OSS-->>ScrapeCtx: results

Possibly related PRs

Suggested reviewers

  • adityathebe
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 19.23%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly summarizes the main change, adding GitHub and OpenSSF security scrapers. The changeset introduces new scraper types for GitHubSecurity and OpenSSFScorecard with supporting infrastructure across multiple files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch security-scrapers

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/v1/types.go (1)

130-136: ApplyPlugin skips GitHubSecurity/OpenSSFScorecard.

New scraper specs won’t receive BaseScraper plugin settings, so plugin labels/transforms won’t apply.

🛠️ Proposed fix
 	for i := range spec.GithubActions {
 		spec.GithubActions[i].BaseScraper = spec.GithubActions[i].BaseScraper.ApplyPlugins(plugins...)
 	}
+
+	for i := range spec.GitHubSecurity {
+		spec.GitHubSecurity[i].BaseScraper = spec.GitHubSecurity[i].BaseScraper.ApplyPlugins(plugins...)
+	}
+
+	for i := range spec.OpenSSFScorecard {
+		spec.OpenSSFScorecard[i].BaseScraper = spec.OpenSSFScorecard[i].BaseScraper.ApplyPlugins(plugins...)
+	}
 
 	for i := range spec.Azure {
 		spec.Azure[i].BaseScraper = spec.Azure[i].BaseScraper.ApplyPlugins(plugins...)
 	}
🤖 Fix all issues with AI agents
In `@api/context.go`:
- Around line 161-167: IsDebug() and IsTrace() may dereference a nil
scrapeConfig; update both methods to guard against nil (and nil Spec if Spec is
a pointer) similarly to ScraperID(): check if ctx.scrapeConfig == nil (and
ctx.scrapeConfig.Spec == nil when applicable) and return false if nil, otherwise
call and return ctx.scrapeConfig.Spec.IsDebug() / IsTrace(); this prevents
panics when WithScrapeConfig() hasn't been called.

In `@api/v1/github_security.go`:
- Around line 43-52: ParseMaxAge currently calls time.ParseDuration which
doesn't accept the "d" day suffix; update GitHubSecurityFilters.ParseMaxAge to
detect a trailing "d" (case-insensitive), parse the numeric portion as an
integer or float representing days, convert to a time.Duration via days * 24 *
time.Hour, and return that; for all other inputs keep using time.ParseDuration
and propagate its error. Ensure the implementation handles empty MaxAge (already
returns 0,nil), returns a clear error on invalid numeric parsing, and references
the MaxAge field and ParseMaxAge method when making the change.

In `@api/v1/openssf.go`:
- Around line 10-11: The MinScore field currently uses *float64 which breaks
controller-gen; change MinScore to a string pointer (e.g., *string) and add
validation/tagging as needed in the struct, then update the scraper logic to
parse the string into a float64 (handle nil/parse errors) wherever MinScore is
consumed; alternatively, if you must keep a float, add the kubebuilder CRD
marker // +kubebuilder:validation:Type=number and ensure controller-gen is
invoked with crd:allowDangerousTypes=true—locate and update the MinScore field
in the api/v1/openssf.go struct for these changes (see the sketch after this list).

In `@scrapers/github/security_client.go`:
- Around line 121-133: GetSecurityAdvisories currently omits pagination so
results are limited to the API default; update the request options in
GetSecurityAdvisories to copy AlertListOptions.Page and AlertListOptions.PerPage
into the github.ListRepositorySecurityAdvisoriesOptions (alongside State) before
calling c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories so callers
can control Page and PerPage like the other methods (GetDependabotAlerts,
GetCodeScanningAlerts, GetSecretScanningAlerts).

In `@scrapers/github/security.go`:
- Around line 94-106: ScrapeResult.CreatedAt is typed as *time.Time but the code
passes a time.Time from repo.CreatedAt.GetTime(); fix by creating a local
variable to hold the returned time (e.g., tm := repo.CreatedAt.GetTime()) and
assign CreatedAt to its address (CreatedAt: &tm) when constructing the
v1.ScrapeResult; update the construction in the block that builds the
ScrapeResult (v1.ScrapeResult / CreatedAt) so it uses the pointer to the local
time variable.

In `@scrapers/openssf/compliance.go`:
- Around line 10-11: The file-level comment above ComplianceMappings incorrectly
cites "plugins/vite-plugin-security.js" as the data source; update the comment
to reference the correct source (e.g., the OpenSSF Scorecard documentation or
the actual mapping source used in this repo) so it accurately documents where
the compliance mapping originates; locate the top comment near the
ComplianceMappings declaration in scrapers/openssf/compliance.go and replace the
incorrect reference with the proper source name/URL (or remove the irrelevant
Vite/Vue reference) while keeping the explanatory text about mapping OpenSSF
Scorecard checks to compliance framework controls.

In `@scrapers/openssf/scorecard.go`:
- Around line 274-276: The function scorecardCheckNameToKebab is a no-op (it
replaces "-" with "-" and is unused); either remove the function completely or
implement proper kebab-case conversion and use it where needed—e.g., transform
spaces/underscores to dashes and lowercase (apply to scorecardCheckNameToKebab),
or delete the scorecardCheckNameToKebab declaration and any references to it if
no conversion is required.
🧹 Nitpick comments (4)
scrapers/github/security.go (1)

291-296: createConfigInsights is still a FIXME.

Please implement or track via a follow-up issue so the alert details aren’t silently dropped.

scrapers/github/security_client.go (1)

59-79: Empty state string passed to API unconditionally.

When opts.State is empty, you're passing &opts.State (pointer to empty string) to the API. This may cause unexpected filtering behavior depending on how the GitHub API handles empty state values.

Suggested fix to conditionally set State
 func (c *GitHubSecurityClient) GetDependabotAlerts(ctx context.Context, opts AlertListOptions) ([]*github.DependabotAlert, *github.Response, error) {
 	reqOpts := &github.ListAlertsOptions{
-		State: &opts.State,
 		ListOptions: github.ListOptions{
 			Page:    opts.Page,
 			PerPage: opts.PerPage,
 		},
 	}
 
+	if opts.State != "" {
+		reqOpts.State = &opts.State
+	}
+
 	if opts.Severity != "" {
 		reqOpts.Severity = &opts.Severity
 	}
scrapers/openssf/scorecard.go (2)

74-84: Type assertion without safety check could panic.

Line 76 uses a direct type assertion lastScrape.(time.Time) without the comma-ok pattern. If a non-time.Time value is ever stored in the map (due to a bug or future change), this will panic.

Suggested fix for safer type assertion
 func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
 	if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
-		lastTime := lastScrape.(time.Time)
+		lastTime, ok := lastScrape.(time.Time)
+		if !ok {
+			ctx.Logger.V(3).Infof("invalid cache entry for %s, will refetch", repoFullName)
+			return false, ""
+		}
 		timeSinceLastScrape := time.Since(lastTime)

139-143: Consider if score < 10 is too strict for "failing" status.

A score of 9/10 is marked as "failing", which may be overly strict. Most scoring systems consider 7+ or 8+ as passing. This creates inconsistency with createRepositoryProperties where score >= 7 counts as passing.

Consider aligning with the passing threshold used elsewhere
-		if check.Score < 10 {
+		if check.Score < 7 {
 			analysis.Status = "failing"
 		} else {
 			analysis.Status = "passing"
 		}

Comment thread api/context.go Outdated
Comment on lines +161 to +167
func (ctx ScrapeContext) IsTrace() bool {
	return ctx.scrapeConfig.Spec.IsTrace()
}

func (ctx ScrapeContext) IsDebug() bool {
	return ctx.scrapeConfig.Spec.IsDebug()
}
Contributor

⚠️ Potential issue | 🟠 Major

Potential nil pointer dereference in IsDebug() and IsTrace().

Both methods access ctx.scrapeConfig.Spec without checking if scrapeConfig is nil. Other methods like ScraperID() (lines 144-149) perform this nil check. If IsDebug() or IsTrace() is called before WithScrapeConfig(), this will panic.

Suggested fix for IsDebug (same pattern should be applied to IsTrace)
 func (ctx ScrapeContext) IsDebug() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsDebug()
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (ctx ScrapeContext) IsTrace() bool {
 	return ctx.scrapeConfig.Spec.IsTrace()
 }
 
-func (ctx ScrapeContext) IsDebug() bool {
-	return ctx.scrapeConfig.Spec.IsDebug()
-}
+func (ctx ScrapeContext) IsDebug() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
+	return ctx.scrapeConfig.Spec.IsDebug()
+}
🤖 Prompt for AI Agents
In `@api/context.go` around lines 161 - 167, IsDebug() and IsTrace() may
dereference a nil scrapeConfig; update both methods to guard against nil (and
nil Spec if Spec is a pointer) similarly to ScraperID(): check if
ctx.scrapeConfig == nil (and ctx.scrapeConfig.Spec == nil when applicable) and
return false if nil, otherwise call and return ctx.scrapeConfig.Spec.IsDebug() /
IsTrace(); this prevents panics when WithScrapeConfig() hasn't been called.

Comment thread api/v1/github_security.go Outdated
Comment on lines +43 to +52
	// MaxAge filters alerts by age (e.g., "90d", "30d")
	MaxAge string `yaml:"maxAge,omitempty" json:"maxAge,omitempty"`
}

// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
	if f.MaxAge == "" {
		return 0, nil
	}
	return time.ParseDuration(f.MaxAge)
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Go time.ParseDuration supported time units documentation

💡 Result:

In Go’s standard library, time.ParseDuration accepts these unit suffixes (as documented):

  • ns
  • us (or µs)
  • ms
  • s
  • m
  • h (hours)

Docs: time.ParseDuration on pkg.go.dev/time (see “Valid time units…”). [1]

Examples: 300ms, -1.5h, 2h45m, 1µs / 1us. [1]

Sources

[1] https://pkg.go.dev/time#ParseDuration


🏁 Script executed:

# Search for MaxAge documentation examples in the codebase
rg -A 5 -B 5 "MaxAge" --type go | head -100

Repository: flanksource/config-db

Length of output: 186


🏁 Script executed:

# Specifically look for "90d" or "30d" examples in comments or strings
rg "(90d|30d)"

Repository: flanksource/config-db

Length of output: 186


🏁 Script executed:

fd github_security.go --exec cat {}

Repository: flanksource/config-db

Length of output: 186


ParseMaxAge cannot handle day units—time.ParseDuration only accepts ns, us, ms, s, m, h.

If the documentation or code examples suggest using "90d" or "30d", they will fail. Either update examples to hours (e.g., 2160h for 90 days) or implement custom day parsing.

🛠️ Proposed fix (support "d" suffix)
 import (
+	"strconv"
+	"strings"
 	"time"
 
 	"github.com/flanksource/duty/types"
 )
@@
 func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
 	if f.MaxAge == "" {
 		return 0, nil
 	}
+	if strings.HasSuffix(f.MaxAge, "d") {
+		daysStr := strings.TrimSuffix(f.MaxAge, "d")
+		days, err := strconv.Atoi(daysStr)
+		if err != nil {
+			return 0, err
+		}
+		return time.Hour * 24 * time.Duration(days), nil
+	}
 	return time.ParseDuration(f.MaxAge)
 }
🤖 Prompt for AI Agents
In `@api/v1/github_security.go` around lines 43 - 52, ParseMaxAge currently calls
time.ParseDuration which doesn't accept the "d" day suffix; update
GitHubSecurityFilters.ParseMaxAge to detect a trailing "d" (case-insensitive),
parse the numeric portion as an integer or float representing days, convert to a
time.Duration via days * 24 * time.Hour, and return that; for all other inputs
keep using time.ParseDuration and propagate its error. Ensure the implementation
handles empty MaxAge (already returns 0,nil), returns a clear error on invalid
numeric parsing, and references the MaxAge field and ParseMaxAge method when
making the change.

Comment thread api/v1/openssf.go Outdated
Comment thread scrapers/github/security_client.go Outdated
Comment on lines +121 to +133
// GetSecurityAdvisories fetches repository security advisories
func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
		State: opts.State,
	}

	advisories, resp, err := c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, c.owner, c.repo, reqOpts)
	if err != nil {
		return nil, resp, fmt.Errorf("failed to fetch security advisories: %w", err)
	}

	return advisories, resp, nil
}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

go-github ListRepositorySecurityAdvisoriesOptions pagination support

💡 Result:

go-github’s ListRepositorySecurityAdvisoriesOptions supports offset-based pagination the same way most “List…” calls do: via embedded github.ListOptions (page / per_page). ListOptions is the standard pagination struct in go-github and defines Page and PerPage. [1]

So you paginate like:

  • set opt.ListOptions.PerPage (and opt.ListOptions.Page starting at 1)
  • call ListRepositorySecurityAdvisories(...)
  • read resp.NextPage
  • loop until resp.NextPage == 0

This is the documented go-github pagination pattern in general. [2]
And ListRepositorySecurityAdvisoriesOptions is present as an options type alongside the advisory-listing methods. [3]

Sources
[1] (pkg.go.dev)
[2] (sources.debian.org)
[3] (pkg.go.dev)



🏁 Script executed:

# First, let's examine the file to see the full context and compare with other alert methods
find . -name "security_client.go" -path "*/github/*" | head -5

Repository: flanksource/config-db

Length of output: 101


🏁 Script executed:

# Let's also check if there's a go.mod to see which version of go-github is being used
find . -name "go.mod" | head -1 | xargs cat | grep github

Repository: flanksource/config-db

Length of output: 22311


🏁 Script executed:

cat -n ./scrapers/github/security_client.go

Repository: flanksource/config-db

Length of output: 6316


🏁 Script executed:

# Also search for other alert-related methods to compare pagination patterns
grep -n "func.*Alert\|func.*Advisory" ./scrapers/github/security_client.go

Repository: flanksource/config-db

Length of output: 681


Add pagination support to GetSecurityAdvisories.

GetSecurityAdvisories is the only alert method missing ListOptions pagination. All other methods (GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts) include Page and PerPage from AlertListOptions. Without pagination, results will be limited to the API's default page size.

Suggested fix
 func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 // GetSecurityAdvisories fetches repository security advisories
 func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}
 
 	advisories, resp, err := c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, c.owner, c.repo, reqOpts)
 	if err != nil {
 		return nil, resp, fmt.Errorf("failed to fetch security advisories: %w", err)
 	}
 
 	return advisories, resp, nil
 }
🤖 Prompt for AI Agents
In `@scrapers/github/security_client.go` around lines 121 - 133,
GetSecurityAdvisories currently omits pagination so results are limited to the
API default; update the request options in GetSecurityAdvisories to copy
AlertListOptions.Page and AlertListOptions.PerPage into the
github.ListRepositorySecurityAdvisoriesOptions (alongside State) before calling
c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories so callers can
control Page and PerPage like the other methods (GetDependabotAlerts,
GetCodeScanningAlerts, GetSecretScanningAlerts).

Comment thread scrapers/github/security.go Outdated
Comment on lines +94 to +106
result := v1.ScrapeResult{
	Type:        ConfigTypeGitHubSecurityRepo,
	ID:          fmt.Sprintf("github-security/%s", repoFullName),
	Name:        repoFullName,
	ConfigClass: "Security",
	Config:      repo,
	Tags: v1.Tags{
		{Name: "owner", Value: repoConfig.Owner},
		{Name: "repo", Value: repoConfig.Repo},
		{Name: "security", Value: "true"},
	},
	CreatedAt:  repo.CreatedAt.GetTime(),
	Properties: properties,
Contributor

⚠️ Potential issue | 🔴 Critical

Fix CreatedAt type mismatch.

ScrapeResult.CreatedAt expects *time.Time, but repo.CreatedAt.GetTime() returns time.Time.

🐛 Proposed fix
-	result := v1.ScrapeResult{
+	createdAt := repo.GetCreatedAt()
+	result := v1.ScrapeResult{
 		Type:        ConfigTypeGitHubSecurityRepo,
 		ID:          fmt.Sprintf("github-security/%s", repoFullName),
 		Name:        repoFullName,
 		ConfigClass: "Security",
 		Config:      repo,
 		Tags: v1.Tags{
 			{Name: "owner", Value: repoConfig.Owner},
 			{Name: "repo", Value: repoConfig.Repo},
 			{Name: "security", Value: "true"},
 		},
-		CreatedAt:  repo.CreatedAt.GetTime(),
+		CreatedAt:  &createdAt,
 		Properties: properties,
 	}
🤖 Prompt for AI Agents
In `@scrapers/github/security.go` around lines 94 - 106, ScrapeResult.CreatedAt is
typed as *time.Time but the code passes a time.Time from
repo.CreatedAt.GetTime(); fix by creating a local variable to hold the returned
time (e.g., tm := repo.CreatedAt.GetTime()) and assign CreatedAt to its address
(CreatedAt: &tm) when constructing the v1.ScrapeResult; update the construction
in the block that builds the ScrapeResult (v1.ScrapeResult / CreatedAt) so it
uses the pointer to the local time variable.

Comment thread scrapers/openssf/compliance.go Outdated
Comment on lines +10 to +11
// ComplianceMappings maps each OpenSSF Scorecard check to compliance framework controls
// Based on DataSheet reference: plugins/vite-plugin-security.js
Contributor

⚠️ Potential issue | 🟡 Minor

Incorrect source reference in documentation comment.

The comment references "plugins/vite-plugin-security.js" which appears to be from a different project (JavaScript/Vite). This should reference the actual data source for the compliance mappings.

Suggested fix
 // ComplianceMappings maps each OpenSSF Scorecard check to compliance framework controls
-// Based on DataSheet reference: plugins/vite-plugin-security.js
+// Based on OpenSSF Scorecard checks documentation
 var ComplianceMappings = map[string]ComplianceMapping{
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 // ComplianceMappings maps each OpenSSF Scorecard check to compliance framework controls
-// Based on DataSheet reference: plugins/vite-plugin-security.js
+// Based on OpenSSF Scorecard checks documentation
 var ComplianceMappings = map[string]ComplianceMapping{
🤖 Prompt for AI Agents
In `@scrapers/openssf/compliance.go` around lines 10 - 11, The file-level comment
above ComplianceMappings incorrectly cites "plugins/vite-plugin-security.js" as
the data source; update the comment to reference the correct source (e.g., the
OpenSSF Scorecard documentation or the actual mapping source used in this repo)
so it accurately documents where the compliance mapping originates; locate the
top comment near the ComplianceMappings declaration in
scrapers/openssf/compliance.go and replace the incorrect reference with the
proper source name/URL (or remove the irrelevant Vite/Vue reference) while
keeping the explanatory text about mapping OpenSSF Scorecard checks to
compliance framework controls.

Comment thread scrapers/openssf/scorecard.go Outdated
Comment on lines +274 to +276
func scorecardCheckNameToKebab(name string) string {
	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
}
Contributor

⚠️ Potential issue | 🟡 Minor

Dead code: scorecardCheckNameToKebab is a no-op.

This function replaces "-" with "-", which has no effect. It's also unused in the file. Remove it or fix the implementation if kebab-case conversion is needed.

Remove unused no-op function
-func scorecardCheckNameToKebab(name string) string {
-	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
-}

Or if actual conversion is needed (e.g., spaces/underscores to dashes):

func scorecardCheckNameToKebab(name string) string {
	return strings.ToLower(strings.ReplaceAll(name, " ", "-"))
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-func scorecardCheckNameToKebab(name string) string {
-	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
-}
🤖 Prompt for AI Agents
In `@scrapers/openssf/scorecard.go` around lines 274 - 276, The function
scorecardCheckNameToKebab is a no-op (it replaces "-" with "-" and is unused);
either remove the function completely or implement proper kebab-case conversion
and use it where needed—e.g., transform spaces/underscores to dashes and
lowercase (apply to scorecardCheckNameToKebab), or delete the
scorecardCheckNameToKebab declaration and any references to it if no conversion
is required.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 11

🤖 Fix all issues with AI agents
In `@api/context.go`:
- Around line 165-167: The IsDebug method on ScrapeContext can panic if
scrapeConfig is nil; update ScrapeContext.IsDebug() to guard against a nil
ctx.scrapeConfig and return false when unset, otherwise return
ctx.scrapeConfig.Spec.IsDebug(); reference the ScrapeContext type and IsDebug()
method and ensure compatibility with NewScrapeContext which may produce a
context without scrapeConfig.

In `@api/v1/github_security.go`:
- Around line 47-53: The ParseMaxAge method in GitHubSecurityFilters currently
calls time.ParseDuration which rejects "90d"/"30d" as documented; change
ParseMaxAge to accept a trailing 'd' by detecting a suffix of 'd'
(case-insensitive), parsing the numeric portion as an integer/float, converting
days to hours (days * 24h) and returning time.Duration (or fall back to
time.ParseDuration for other valid suffixes), and ensure errors are returned for
invalid inputs; update logic inside ParseMaxAge to handle both "Nd" inputs and
standard time.ParseDuration-compatible strings and reference the
GitHubSecurityFilters.ParseMaxAge function when locating the change.

In `@api/v1/interface.go`:
- Around line 815-831: The PrettyRow method (ScrapeResult.PrettyRow) currently
performs unconditional printing via clicky.MustPrint and accepts an unused
options parameter; remove the two clicky.MustPrint calls so the method only
constructs and returns the row map, and drop the unused options parameter from
the function signature (or if you prefer to keep it, ensure it's used) so
callers control printing/formatting. Update any call sites of
ScrapeResult.PrettyRow to handle printing/formatting themselves (e.g., call
clicky.MustPrint with desired FormatOptions where needed).

In `@api/v1/openssf.go`:
- Around line 10-11: The MinScore field currently uses *float64 which breaks CRD
generation; change MinScore to a safer type (e.g., *string or *int) and update
any callers/serialization accordingly: replace the declaration of MinScore
(*float64) with either *string (to store "7.5") or *int representing a scaled
value (e.g., 75 for 0.0–10.0 scaled to 0–100), keep the existing yaml/json tags,
and adjust validation/usage logic that reads MinScore (parsing string to float
or dividing the int by the scale) so consumers like validation, defaults, and
API docs remain correct; if you must keep a numeric type, add the kubebuilder
marker // +kubebuilder:validation:Type=number and rerun controller-gen with
crd:allowDangerousTypes=true (not recommended).

In `@api/v1/types.go`:
- Around line 71-74: The ScraperSpec additions (fields GitHubSecurity and
OpenSSFScorecard) are not wired into AllScraperConfigs and ApplyPlugin; update
the AllScraperConfigs map to include entries "githubsecurity": GitHubSecurity{}
and "openssfscorecard": OpenSSFScorecard{}, and modify the ApplyPlugin method to
iterate over s.GitHubSecurity and s.OpenSSFScorecard and call ApplyPlugin on
each item (mirroring the existing loops for GithubActions and Azure) so those
plugin configs are applied; reference the AllScraperConfigs map and the
ApplyPlugin method and the types GitHubSecurity and OpenSSFScorecard when
locating code to change.

In `@cmd/run.go`:
- Around line 108-125: The current code always exports/prints regardless of the
--export flag; update the branch to honor the boolean export flag (e.g.,
exportFlag or export) so the export path only runs when exportFlag is true and
the print path runs when exportFlag is false; specifically, wrap the export loop
that calls exportResource(results, outputDir) and the
logger.Infof("Exported...") inside if exportFlag { ... } and move the
clicky.MustPrint(all.*) calls into the corresponding else branch, and also
validate that when exportFlag is true an outputDir is provided (return a clear
error if empty) to avoid silent failures.

In `@scrapers/github/security_client.go`:
- Around line 23-48: In NewGitHubSecurityClient, after resolving token (the
block that sets token from ctx.HydrateConnectionByURL or
ctx.GetEnvValueFromCache), add a check that token is not empty and return a
descriptive error if it is; for example, if token == "" return
fmt.Errorf("missing GitHub token: no connection %q and no personalAccessToken
provided", config.ConnectionName or config.PersonalAccessToken) so the function
fails fast instead of creating a client that will 401 on API calls.
- Around line 121-133: GetSecurityAdvisories ignores pagination and always
fetches the first page; set the ListRepositorySecurityAdvisoriesOptions to
include the requested page and per-page values from AlertListOptions (map
opts.Page and opts.PerPage into the options used for the API call). Update the
reqOpts used in GetSecurityAdvisories to populate the ListOptions/Page and
PerPage fields (same approach as
GetDependabotAlerts/GetCodeScanningAlerts/GetSecretScanningAlerts) so the call
to c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories honors
pagination.

In `@scrapers/github/security.go`:
- Around line 59-62: The log call is formatting with "%d" but is concatenating
the alert-count strings, so compute an integer total first by parsing the
individual counts from result.Properties.Find("Critical Alerts").Text, "High
Alerts", "Medium Alerts", and "Low Alerts" (use strconv.Atoi, handle errors
defaulting to 0), sum them into a total int, then call
ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
total) instead of concatenating the Text values (a sketch follows this list).

In `@scrapers/processors/json.go`:
- Around line 599-607: The validation currently checks only ExternalChangeID for
standalone inputs but not ExternalID; update the validator in the same block
that inspects input.Changes (near the lo.Filter usage) to also check for any
change where ExternalID == "" when input.ID is empty (standalone change). Use
the same pattern (e.g., lo.Filter or an additional check) referencing
input.Changes, ExternalID, ExternalChangeID and input.ID, and return the same
formatted error (including e.Config and input.Debug()) when either required
field is missing (a sketch follows this list).

In `@scrapers/processors/script.go`:
- Around line 33-36: The current logger.Infof call in the json.Unmarshal error
block logs the entire script output variable s (risking secrets/PII and large
dumps); update the error handling around json.Unmarshal to avoid printing the
full s — instead log the error and either a safely truncated sample (e.g., first
N characters with "…(truncated)") and the total length, or a redacted/hashed
summary of s, and consider simple redaction for common secrets (tokens,
passwords) before including any sample; modify the logger.Infof invocation that
references err and s to use the new truncated/redacted sample and length-only
information (a sketch follows this list).
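
A small, self-contained sketch of the count summation suggested for scrapers/github/security.go above; the property names come from the prompt, while the map and helper are stand-ins for result.Properties.Find(...).Text:

package main

import (
	"fmt"
	"strconv"
)

// parseCount mirrors the suggested strconv.Atoi parsing, defaulting to 0 on error.
func parseCount(text string) int {
	n, err := strconv.Atoi(text)
	if err != nil {
		return 0
	}
	return n
}

func main() {
	// Stand-ins for result.Properties.Find("Critical Alerts").Text and friends.
	counts := map[string]string{
		"Critical Alerts": "2",
		"High Alerts":     "5",
		"Medium Alerts":   "n/a", // parse errors default to 0
		"Low Alerts":      "1",
	}

	total := 0
	for _, name := range []string{"Critical Alerts", "High Alerts", "Medium Alerts", "Low Alerts"} {
		total += parseCount(counts[name])
	}

	// The real call would be ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName, total).
	fmt.Printf("successfully scraped %s: %d total alerts\n", "example-org/example-repo", total)
}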
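
An illustrative, standalone version of the ExternalID check suggested for scrapers/processors/json.go above; the struct fields mirror the names in the prompt, but the types and error text are stand-ins for the real ones in that file:

package main

import (
	"fmt"

	"github.com/samber/lo"
)

// Stand-ins for the real change/result types referenced in the prompt.
type change struct {
	ExternalID       string
	ExternalChangeID string
}

type scrapeResult struct {
	ID      string
	Changes []change
}

// validateChanges mirrors the suggested rule: when the result itself has no ID,
// every standalone change must carry an ExternalID (alongside the existing
// ExternalChangeID requirement).
func validateChanges(r scrapeResult) error {
	missing := lo.Filter(r.Changes, func(c change, _ int) bool {
		return c.ExternalID == ""
	})
	if r.ID == "" && len(missing) > 0 {
		return fmt.Errorf("%d standalone change(s) are missing externalID", len(missing))
	}
	return nil
}

func main() {
	r := scrapeResult{Changes: []change{{ExternalChangeID: "chg-1"}}}
	fmt.Println(validateChanges(r)) // prints the validation error
}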
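
A sketch of the bounded logging suggested for scrapers/processors/script.go above; the helper name and the 24-character limit are arbitrary choices, not the PR's code:

package main

import (
	"encoding/json"
	"fmt"
)

// truncateForLog returns at most max bytes of s, annotating how much was cut.
func truncateForLog(s string, max int) string {
	if len(s) <= max {
		return s
	}
	return fmt.Sprintf("%s…(truncated, %d bytes total)", s[:max], len(s))
}

func main() {
	// Stand-in for the script output that failed to unmarshal; deliberately malformed JSON.
	s := `{"status": "ok", "items": [1, 2, 3`

	var out map[string]any
	if err := json.Unmarshal([]byte(s), &out); err != nil {
		// Log the error plus a bounded sample instead of the whole output.
		fmt.Printf("script output is not valid JSON: %v; sample: %s\n", err, truncateForLog(s, 24))
		return
	}
	fmt.Println("parsed:", out)
}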
🧹 Nitpick comments (4)
api/v1/scrapeconfig_types.go (1)

70-76: Potential ambiguity with ScraperSpec.IsDebug() and ScraperSpec.IsTrace() methods.

ScrapeConfig.IsDebug() checks annotations while ScraperSpec.IsDebug() (in types.go lines 178-184) checks the LogLevel field. Both methods exist on related types with the same name but different behavior.

Consider documenting or clarifying the distinction, or using different method names to avoid confusion when both are accessible through the context chain.

scrapers/openssf/compliance.go (1)

110-115: Return defensive copies from GetComplianceMappings.

The returned slices share backing arrays with the global map; external mutation can alter shared mappings. Consider returning copies to keep mappings immutable (and optionally keep the map unexported if it’s not required externally).

♻️ Suggested adjustment
 func GetComplianceMappings(checkName string) ComplianceMapping {
 	if mapping, ok := ComplianceMappings[checkName]; ok {
-		return mapping
+		return ComplianceMapping{
+			SOC2:     append([]string(nil), mapping.SOC2...),
+			NISTSSDF: append([]string(nil), mapping.NISTSSDF...),
+			CIS:      append([]string(nil), mapping.CIS...),
+		}
 	}
 	return ComplianceMapping{}
 }
scrapers/openssf/client.go (2)

72-104: Respect context cancellation during retry backoff.

Line 82–86 sleeps unconditionally; if ctx is canceled, callers still block through the backoff. Consider a select on ctx.Done() to exit promptly.

♻️ Proposed fix (ctx-aware backoff)
 	for attempt := 0; attempt < maxRetries; attempt++ {
 		if attempt > 0 {
 			backoff := time.Duration(attempt*attempt) * time.Second
 			c.ctx.Logger.V(3).Infof("retrying scorecard request for %s (attempt %d/%d) after %v",
 				repoFullName, attempt+1, maxRetries, backoff)
-			time.Sleep(backoff)
+			timer := time.NewTimer(backoff)
+			select {
+			case <-ctx.Done():
+				timer.Stop()
+				return nil, ctx.Err()
+			case <-timer.C:
+			}
 		}

144-168: Retry classification is brittle; prefer typed errors and strings.Contains.

Line 149–152 relies on substring matching, and contains/findSubstring reimplements stdlib logic. Consider using errors.Is/As with net.Error for timeouts and strings.Contains for server errors, and drop custom helpers.

♻️ Proposed fix (typed errors + stdlib Contains)
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
+	"net"
 	"net/http"
+	"strings"
 	"time"

 	"github.com/flanksource/config-db/api"
 )
@@
 func (c *ScorecardClient) isRetryable(err error) bool {
 	if err == nil {
 		return false
 	}
-
-	errStr := err.Error()
-	return contains(errStr, "server error") ||
-	       contains(errStr, "timeout") ||
-	       contains(errStr, "connection")
+	if errors.Is(err, context.DeadlineExceeded) {
+		return true
+	}
+	var netErr net.Error
+	if errors.As(err, &netErr) && (netErr.Timeout() || netErr.Temporary()) {
+		return true
+	}
+	return strings.Contains(err.Error(), "server error")
 }
-
-func contains(s, substr string) bool {
-	return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
-		(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
-		 findSubstring(s, substr)))
-}
-
-func findSubstring(s, substr string) bool {
-	for i := 0; i <= len(s)-len(substr); i++ {
-		if s[i:i+len(substr)] == substr {
-			return true
-		}
-	}
-	return false
-}

Comment thread api/context.go Outdated
Comment on lines +165 to +167
func (ctx ScrapeContext) IsDebug() bool {
	return ctx.scrapeConfig.Spec.IsDebug()
}
Contributor

⚠️ Potential issue | 🟡 Minor

Guard against nil ScrapeConfig.
NewScrapeContext can produce a context without scrapeConfig; calling IsDebug() would panic. Return false when unset.

🐛 Proposed fix
 func (ctx ScrapeContext) IsDebug() bool {
-	return ctx.scrapeConfig.Spec.IsDebug()
+	if ctx.scrapeConfig == nil {
+		return false
+	}
+	return ctx.scrapeConfig.Spec.IsDebug()
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (ctx ScrapeContext) IsDebug() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsDebug()
 }
🤖 Prompt for AI Agents
In `@api/context.go` around lines 165 - 167, The IsDebug method on ScrapeContext
can panic if scrapeConfig is nil; update ScrapeContext.IsDebug() to guard
against a nil ctx.scrapeConfig and return false when unset, otherwise return
ctx.scrapeConfig.Spec.IsDebug(); reference the ScrapeContext type and IsDebug()
method and ensure compatibility with NewScrapeContext which may produce a
context without scrapeConfig.

Comment thread api/v1/github_security.go Outdated
Comment on lines +47 to +53
// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
	if f.MaxAge == "" {
		return 0, nil
	}
	return time.ParseDuration(f.MaxAge)
}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the github_security.go file
find . -name "github_security.go" -type f

Repository: flanksource/config-db

Length of output: 186


🏁 Script executed:

# Check if there are any documentation files or tests related to ParseMaxAge
fd -e md -e txt | head -20
git ls-files | grep -i security

Repository: flanksource/config-db

Length of output: 567


🏁 Script executed:

# Search for references to "90d" and "30d" in the codebase
rg "90d|30d" -i

Repository: flanksource/config-db

Length of output: 15027


🏁 Script executed:

# Read the github_security.go file to see the actual implementation
cat -n api/v1/github_security.go | head -60

Repository: flanksource/config-db

Length of output: 2261


🏁 Script executed:

# Search for tests related to ParseMaxAge
fd -e go -type f | xargs grep -l "ParseMaxAge" 2>/dev/null

Repository: flanksource/config-db

Length of output: 237


🌐 Web query:

Go time.ParseDuration supported format "d" day suffix

💡 Result:

Go’s standard library time.ParseDuration does not support a d (day) suffix—the only valid unit suffixes are ns, us (or µs), ms, s, m, h. [1]

The time package also intentionally provides no Day constant (“to avoid confusion across daylight savings time zone transitions”). [1]

What to do instead

  • If you mean exact 24-hour blocks, use hours: 24h, 48h, 720h (for 30 days), etc. [1]
  • If you mean calendar days (variable-length due to DST), use date math like t.AddDate(0, 0, days) rather than time.Duration.

🏁 Script executed:

# Search for tests of ParseMaxAge or any usages of it
rg "ParseMaxAge" -i -A 3 -B 3

Repository: flanksource/config-db

Length of output: 511


🏁 Script executed:

# Check if there are any existing tests for github_security
fd -name "*github*security*test*" -o -name "*test*github*security*"

Repository: flanksource/config-db

Length of output: 295


🏁 Script executed:

# Also check fixtures that might use MaxAge
grep -r "maxAge\|MaxAge" --include="*.yaml" --include="*.yml" --include="*.json"

Repository: flanksource/config-db

Length of output: 458


ParseMaxAge rejects the documented 90d/30d inputs.

Line 43 documents examples "90d" and "30d", but line 52 uses time.ParseDuration, which doesn't accept a d suffix. The only supported unit suffixes are ns, us, µs, ms, s, m, h. Either update the documentation to reflect supported formats (e.g., "2160h" for 90 days, "720h" for 30 days) or implement custom parsing to support the d suffix.

🤖 Prompt for AI Agents
In `@api/v1/github_security.go` around lines 47 - 53, The ParseMaxAge method in
GitHubSecurityFilters currently calls time.ParseDuration which rejects
"90d"/"30d" as documented; change ParseMaxAge to accept a trailing 'd' by
detecting a suffix of 'd' (case-insensitive), parsing the numeric portion as an
integer/float, converting days to hours (days * 24h) and returning time.Duration
(or fall back to time.ParseDuration for other valid suffixes), and ensure errors
are returned for invalid inputs; update logic inside ParseMaxAge to handle both
"Nd" inputs and standard time.ParseDuration-compatible strings and reference the
GitHubSecurityFilters.ParseMaxAge function when locating the change.

Comment thread api/v1/interface.go Outdated
Comment thread api/v1/openssf.go Outdated
Comment thread api/v1/types.go Outdated
Comment on lines +71 to +74
	GithubActions    []GitHubActions    `json:"githubActions,omitempty" yaml:"githubActions,omitempty"`
	GitHubSecurity   []GitHubSecurity   `json:"githubSecurity,omitempty" yaml:"githubSecurity,omitempty"`
	OpenSSFScorecard []OpenSSFScorecard `json:"openssfScorecard,omitempty" yaml:"openssfScorecard,omitempty"`
	Azure            []Azure            `json:"azure,omitempty" yaml:"azure,omitempty"`
Contributor

⚠️ Potential issue | 🟠 Major

Missing entries in AllScraperConfigs map and ApplyPlugin method.

The new GitHubSecurity and OpenSSFScorecard fields are added to ScraperSpec, but the corresponding entries are missing from:

  1. AllScraperConfigs map (lines 15-31) - should include "githubsecurity": GitHubSecurity{} and "openssfscorecard": OpenSSFScorecard{}
  2. ApplyPlugin method (lines 97-167) - should include plugin application loops for both new types, similar to other scrapers
Proposed additions

Add to AllScraperConfigs map:

 var AllScraperConfigs = map[string]any{
 	"aws":            AWS{},
 	"azure":          Azure{},
 	"azuredevops":    AzureDevops{},
 	"exec":           Exec{},
 	"file":           File{},
 	"gcp":            GCP{},
 	"githubactions":  GitHubActions{},
+	"githubsecurity": GitHubSecurity{},
 	"http":           HTTP{},
 	"kubernetes":     Kubernetes{},
 	"kubernetesfile": KubernetesFile{},
 	"logs":           Logs{},
+	"openssfscorecard": OpenSSFScorecard{},
 	"slack":          Slack{},
 	"sql":            SQL{},
 	"terraform":      Terraform{},
 	"trivy":          Trivy{},
 }

Add to ApplyPlugin method (after line 132):

 	for i := range spec.GithubActions {
 		spec.GithubActions[i].BaseScraper = spec.GithubActions[i].BaseScraper.ApplyPlugins(plugins...)
 	}
+
+	for i := range spec.GitHubSecurity {
+		spec.GitHubSecurity[i].BaseScraper = spec.GitHubSecurity[i].BaseScraper.ApplyPlugins(plugins...)
+	}
+
+	for i := range spec.OpenSSFScorecard {
+		spec.OpenSSFScorecard[i].BaseScraper = spec.OpenSSFScorecard[i].BaseScraper.ApplyPlugins(plugins...)
+	}
🤖 Prompt for AI Agents
In `@api/v1/types.go` around lines 71 - 74, The ScraperSpec additions (fields
GitHubSecurity and OpenSSFScorecard) are not wired into AllScraperConfigs and
ApplyPlugin; update the AllScraperConfigs map to include entries
"githubsecurity": GitHubSecurity{} and "openssfscorecard": OpenSSFScorecard{},
and modify the ApplyPlugin method to iterate over s.GitHubSecurity and
s.OpenSSFScorecard and call ApplyPlugin on each item (mirroring the existing
loops for GithubActions and Azure) so those plugin configs are applied;
reference the AllScraperConfigs map and the ApplyPlugin method and the types
GitHubSecurity and OpenSSFScorecard when locating code to change.

Comment thread scrapers/github/security_client.go Outdated
Comment on lines +23 to +48
func NewGitHubSecurityClient(ctx api.ScrapeContext, config v1.GitHubSecurity, owner, repo string) (*GitHubSecurityClient, error) {
	var token string
	if connection, err := ctx.HydrateConnectionByURL(config.ConnectionName); err != nil {
		return nil, fmt.Errorf("failed to hydrate connection: %w", err)
	} else if connection != nil {
		token = connection.Password
	} else {
		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
		if err != nil {
			return nil, fmt.Errorf("failed to get token: %w", err)
		}
	}

	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: token},
	)
	tc := oauth2.NewClient(ctx, ts)
	client := github.NewClient(tc)

	return &GitHubSecurityClient{
		ScrapeContext: ctx,
		Client:        client,
		owner:         owner,
		repo:          repo,
	}, nil
}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub REST API security alerts code scanning secret scanning authentication requirements

💡 Result:

GitHub REST API: security alerts (code scanning, secret scanning, Dependabot) — auth & access requirements

1) Token types you can use

  • OAuth App token or PAT (classic): supported broadly across these alert APIs (see per-endpoint scope requirements below).
  • Fine-grained tokens / GitHub App tokens: supported for most repo/org alert endpoints, but not for some enterprise-level Dependabot alert endpoints. For example, “List Dependabot alerts for an enterprise” does not work with fine-grained PATs or GitHub App tokens. [3]

2) Scopes / permissions (classic PATs & OAuth tokens)

  • Code scanning alerts: requires security_events (private or public), or public_repo if only accessing public repos. [1]
  • Secret scanning alerts: requires repo or security_events (and public_repo can be sufficient if only accessing public repos). [2]
  • Dependabot alerts: commonly requires security_events (and public_repo can be sufficient if only accessing public repos). Some endpoints (e.g., enterprise-level) accept repo or security_events. [3]

3) Fine-grained PAT / GitHub App permission names (repo-scoped)

  • Code scanning: "Code scanning alerts" repo permission (read for GET/list, write for update). [1]
  • Secret scanning: "Secret scanning alerts" repo permission (read/write depending on endpoint). [2]
  • Dependabot: "Dependabot alerts" repo permission (read/write depending on endpoint). [3]

4) Who is allowed to call the endpoints (role / access gating)

Even with the right token scope/permissions, the caller must have sufficient access:

  • Repository alert visibility/access is generally available to people with write/maintain/admin on the repo (and org owners for org-owned repos). GitHub also lets repo admins/org owners grant additional alert access (with constraints). [4]
  • Some alert endpoints explicitly require elevated roles, e.g.:
    • Secret scanning (org-level listing): caller must be an org admin or security manager. [2]
    • Secret scanning (repo-level listing): caller must be a repo admin (or admin of the owning org). [2]
    • Dependabot (org-level listing): caller must be an org owner or security manager. [3]
    • Dependabot (enterprise-level listing): caller must be an enterprise member, and results are limited to orgs where the caller is an org owner or security manager. [3]

Sources: GitHub REST API docs for Code Scanning [1], Secret Scanning [2], Dependabot Alerts [3], and “Managing security and analysis settings… / Granting access to security alerts” [4].


🏁 Script executed:

find . -name "security_client.go" -type f | head -5

Repository: flanksource/config-db

Length of output: 101


🏁 Script executed:

# Look for the full security_client.go file to understand context
fd security_client.go --type f

Repository: flanksource/config-db

Length of output: 99


🏁 Script executed:

cat -n scrapers/github/security_client.go

Repository: flanksource/config-db

Length of output: 6316


🏁 Script executed:

# Check if there are other places in the codebase that use this client
# to understand the broader context
rg "NewGitHubSecurityClient|GitHubSecurityClient" --type go -A 2 -B 2

Repository: flanksource/config-db

Length of output: 5100


🏁 Script executed:

# Check if token validation patterns exist elsewhere in this project
rg "token == \"\"" --type go

Repository: flanksource/config-db

Length of output: 47


Add token validation to fail fast on missing credentials.

Empty token values proceed silently to client creation, causing all subsequent security alert API calls (Dependabot, code scanning, secret scanning, advisories) to fail with 401 Unauthorized. Validate that a token exists and return an explicit error if neither connection nor personalAccessToken config provides one.

🐛 Proposed fix
 	} else {
 		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
 		if err != nil {
 			return nil, fmt.Errorf("failed to get token: %w", err)
 		}
 	}
+
+	if token == "" {
+		return nil, fmt.Errorf("github personal access token is required (connection or personalAccessToken)")
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func NewGitHubSecurityClient(ctx api.ScrapeContext, config v1.GitHubSecurity, owner, repo string) (*GitHubSecurityClient, error) {
 	var token string
 	if connection, err := ctx.HydrateConnectionByURL(config.ConnectionName); err != nil {
 		return nil, fmt.Errorf("failed to hydrate connection: %w", err)
 	} else if connection != nil {
 		token = connection.Password
 	} else {
 		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
 		if err != nil {
 			return nil, fmt.Errorf("failed to get token: %w", err)
 		}
 	}
 
+	if token == "" {
+		return nil, fmt.Errorf("github personal access token is required (connection or personalAccessToken)")
+	}
+
 	ts := oauth2.StaticTokenSource(
 		&oauth2.Token{AccessToken: token},
 	)
 	tc := oauth2.NewClient(ctx, ts)
 	client := github.NewClient(tc)
 
 	return &GitHubSecurityClient{
 		ScrapeContext: ctx,
 		Client:        client,
 		owner:         owner,
 		repo:          repo,
 	}, nil
 }
🤖 Prompt for AI Agents
In `@scrapers/github/security_client.go` around lines 23 - 48, In
NewGitHubSecurityClient, after resolving token (the block that sets token from
ctx.HydrateConnectionByURL or ctx.GetEnvValueFromCache), add a check that token
is not empty and return a descriptive error if it is; for example, if token ==
"" return fmt.Errorf("missing GitHub token: no connection %q and no
personalAccessToken provided", config.ConnectionName or
config.PersonalAccessToken) so the function fails fast instead of creating a
client that will 401 on API calls.

Comment thread scrapers/github/security_client.go Outdated
Comment on lines +121 to +133
// GetSecurityAdvisories fetches repository security advisories
func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
		State: opts.State,
	}

	advisories, resp, err := c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, c.owner, c.repo, reqOpts)
	if err != nil {
		return nil, resp, fmt.Errorf("failed to fetch security advisories: %w", err)
	}

	return advisories, resp, nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "*.go" -type f | head -20

Repository: flanksource/config-db

Length of output: 465


🏁 Script executed:

# Search for AlertListOptions definition
rg "type AlertListOptions" -A 10

Repository: flanksource/config-db

Length of output: 857


🏁 Script executed:

# Search for usage of ListRepositorySecurityAdvisoriesOptions
rg "ListRepositorySecurityAdvisoriesOptions" -B 2 -A 5

Repository: flanksource/config-db

Length of output: 782


🏁 Script executed:

# Check the security_client.go file structure
cat -n scrapers/github/security_client.go | head -150

Repository: flanksource/config-db

Length of output: 5681


🌐 Web query:

go-github ListRepositorySecurityAdvisoriesOptions ListOptions pagination

💡 Result:

In go-github, pagination for “list” endpoints is done with github.ListOptions (Page, PerPage). You pass it either directly or via embedding in a more specific *Options struct, then loop using resp.NextPage. [1], [2]

So for ListRepositorySecurityAdvisoriesOptions, you paginate the same way (GitHub’s REST endpoint also supports page / per_page). [1], [3]

opt := &github.ListRepositorySecurityAdvisoriesOptions{
    ListOptions: github.ListOptions{PerPage: 100},
}

for {
    advisories, resp, err := client.SecurityAdvisories.
        ListRepositorySecurityAdvisories(ctx, owner, repo, opt)
    if err != nil { return err }

    // ... use advisories ...

    if resp.NextPage == 0 {
        break
    }
    opt.Page = resp.NextPage // (or opt.ListOptions.Page, depending on embedding)
}

Sources: [1] https://sources.debian.org/src/golang-github-google-go-github/48.1.0-2/README.md (Pagination section) • [2] https://pkg.go.dev/github.com/google/go-github/v30/github (ListOptions) • [3] https://docs.github.com/en/rest/security-advisories/repository-advisories (List repository security advisories)


Pagination options ignored for security advisories.

opts.Page and opts.PerPage are not applied, so only the first page is fetched. This creates a data loss risk for repositories with many advisories. All other alert methods in this file (GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts) correctly include ListOptions; this method should follow the same pattern.

Proposed fix
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// GetSecurityAdvisories fetches repository security advisories
func (c *GitHubSecurityClient) GetSecurityAdvisories(ctx context.Context, opts AlertListOptions) ([]*github.SecurityAdvisory, *github.Response, error) {
	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
		State: opts.State,
		ListOptions: github.ListOptions{
			Page:    opts.Page,
			PerPage: opts.PerPage,
		},
	}
	advisories, resp, err := c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories(ctx, c.owner, c.repo, reqOpts)
	if err != nil {
		return nil, resp, fmt.Errorf("failed to fetch security advisories: %w", err)
	}
	return advisories, resp, nil
}
🤖 Prompt for AI Agents
In `@scrapers/github/security_client.go` around lines 121 - 133,
GetSecurityAdvisories ignores pagination and always fetches the first page; set
the ListRepositorySecurityAdvisoriesOptions to include the requested page and
per-page values from AlertListOptions (map opts.Page and opts.PerPage into the
options used for the API call). Update the reqOpts used in GetSecurityAdvisories
to populate the ListOptions/Page and PerPage fields (same approach as
GetDependabotAlerts/GetCodeScanningAlerts/GetSecretScanningAlerts) so the call
to c.Client.SecurityAdvisories.ListRepositorySecurityAdvisories honors
pagination.

Comment thread scrapers/github/security.go Outdated
Comment on lines +59 to +62
			results = append(results, result)
			ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
				result.Properties.Find("Critical Alerts").Text+result.Properties.Find("High Alerts").Text+
				result.Properties.Find("Medium Alerts").Text+result.Properties.Find("Low Alerts").Text)

⚠️ Potential issue | 🟡 Minor

Fix total-alerts logging (string concat + %d).

The log concatenates strings and formats with %d, so the output is incorrect. Compute an integer total before logging.

🐛 Suggested fix
 import (
 	"fmt"
+	"strconv"
 	"sync"
 	"time"
-			results = append(results, result)
-			ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
-				result.Properties.Find("Critical Alerts").Text+result.Properties.Find("High Alerts").Text+
-				result.Properties.Find("Medium Alerts").Text+result.Properties.Find("Low Alerts").Text)
+			results = append(results, result)
+			totalAlerts := 0
+			for _, name := range []string{"Critical Alerts", "High Alerts", "Medium Alerts", "Low Alerts"} {
+				if p := result.Properties.Find(name); p != nil {
+					if n, err := strconv.Atoi(p.Text); err == nil {
+						totalAlerts += n
+					}
+				}
+			}
+			ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName, totalAlerts)
🤖 Prompt for AI Agents
In `@scrapers/github/security.go` around lines 59 - 62, The log call is formatting
with "%d" but is concatenating the alert-count strings, so compute an integer
total first by parsing the individual counts from
result.Properties.Find("Critical Alerts").Text, "High Alerts", "Medium Alerts",
and "Low Alerts" (use strconv.Atoi, handle errors defaulting to 0), sum them
into a total int, then call ctx.Logger.V(2).Infof("successfully scraped %s: %d
total alerts", repoFullName, total) instead of concatenating the Text values.

Comment thread scrapers/processors/json.go
Comment on lines 33 to +36
	if err := json.Unmarshal([]byte(s), &results); err != nil {
		if logger.V(5).Enabled() {
			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
		}

⚠️ Potential issue | 🟠 Major

Avoid logging full script output (potential secrets/PII leak).

Even at verbose level, logging raw s can expose credentials/configs and dump large payloads. Consider truncating/redacting or logging only a sample/length.

💡 Suggested fix (truncate output)
-		if logger.V(5).Enabled() {
-			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
-		}
+		if logger.V(5).Enabled() {
+			const maxLogBytes = 4096
+			out := s
+			if len(out) > maxLogBytes {
+				out = out[:maxLogBytes] + "...(truncated)"
+			}
+			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, out)
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
	if err := json.Unmarshal([]byte(s), &results); err != nil {
		if logger.V(5).Enabled() {
			const maxLogBytes = 4096
			out := s
			if len(out) > maxLogBytes {
				out = out[:maxLogBytes] + "...(truncated)"
			}
			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, out)
		}
🤖 Prompt for AI Agents
In `@scrapers/processors/script.go` around lines 33 - 36, The current logger.Infof
call in the json.Unmarshal error block logs the entire script output variable s
(risking secrets/PII and large dumps); update the error handling around
json.Unmarshal to avoid printing the full s — instead log the error and either a
safely truncated sample (e.g., first N characters with "…(truncated)") and the
total length, or a redacted/hashed summary of s, and consider simple redaction
for common secrets (tokens, passwords) before including any sample; modify the
logger.Infof invocation that references err and s to use the new
truncated/redacted sample and length-only information.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 18

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scrapers/kubernetes/ketall.go (1)

53-56: ⚠️ Potential issue | 🔴 Critical

Do not swallow errors returned by retry.

Line 55 always returns nil error even when retry.Do fails, which masks scrape failures.

Suggested fix
 	err = retry.Do(ctx, backoff, func(goctx gocontext.Context) error {
 		...
 	})
-
-	return objs, nil
+	if err != nil {
+		return nil, err
+	}
+	return objs, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/kubernetes/ketall.go` around lines 53 - 56, The code currently
ignores errors from retry.Do and always returns nil; modify the function to
capture the error returned by retry.Do (e.g., err := retry.Do(...)), check if
err != nil and return the collected objs (if applicable) along with that err (or
just return nil, err depending on intended semantics), instead of always
returning objs, nil; reference the retry.Do call and the final return of objs to
locate and fix the issue in ketall.go.
♻️ Duplicate comments (9)
pkg/api/context.go (1)

165-167: ⚠️ Potential issue | 🟠 Major

Guard scrapeConfig in IsDebug() to prevent panic.

Line 166 dereferences ctx.scrapeConfig without a nil check. NewScrapeContext() can return with scrapeConfig == nil, so this can panic; the same guard should be applied to IsTrace() (Line 162) for consistency.

🐛 Proposed fix
 func (ctx ScrapeContext) IsTrace() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsTrace()
 }

 func (ctx ScrapeContext) IsDebug() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsDebug()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/context.go` around lines 165 - 167, The IsDebug() method dereferences
ctx.scrapeConfig and can panic when scrapeConfig is nil (NewScrapeContext may
return nil scrapeConfig); update IsDebug() to guard against nil by checking if
ctx.scrapeConfig == nil and returning false if so, mirroring the same nil-check
you should add to IsTrace() to keep behavior consistent; locate and modify the
IsDebug and IsTrace methods in ScrapeContext to perform the nil guard before
calling scrapeConfig.Spec.IsDebug()/IsTrace().
scrapers/processors/script.go (1)

33-36: ⚠️ Potential issue | 🟠 Major

Avoid logging full script output on unmarshal errors.

Line 35 logs raw s, which can expose secrets/PII and flood logs. Log only a bounded, sanitized preview plus length.

Suggested fix
 	if err := json.Unmarshal([]byte(s), &results); err != nil {
 		if logger.V(5).Enabled() {
-			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
+			const maxLogBytes = 4096
+			out := s
+			if len(out) > maxLogBytes {
+				out = out[:maxLogBytes] + "...(truncated)"
+			}
+			logger.Infof("Failed to unmarshal script output: %v (len=%d)\n%s\n", err, len(s), out)
 		}
 		return nil, err
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/processors/script.go` around lines 33 - 36, The unmarshal error
handler currently logs the full script output variable s which may expose
secrets; in the json.Unmarshal error branch update the logger.Infof call (inside
the block that checks logger.V(5).Enabled()) to log only a sanitized, bounded
preview of s (e.g., first N characters) plus the total length and the error from
json.Unmarshal rather than the full payload; use the existing variables
json.Unmarshal, s, results and logger.Infof so the change is localized to that
error-handling block and ensure any non-printable or sensitive patterns (like
emails/URLs) are redacted in the preview.
scrapers/openssf/compliance.go (1)

10-11: Incorrect source reference in documentation comment.

The comment references "plugins/vite-plugin-security.js" which appears to be from a different project. This should reference the OpenSSF Scorecard documentation or the actual data source for the compliance mappings.

Suggested fix
 // ComplianceMappings maps each OpenSSF Scorecard check to compliance framework controls
-// Based on DataSheet reference: plugins/vite-plugin-security.js
+// Based on OpenSSF Scorecard checks documentation
 var ComplianceMappings = map[string]ComplianceMapping{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/compliance.go` around lines 10 - 11, The documentation
comment above the ComplianceMappings variable incorrectly references
"plugins/vite-plugin-security.js"; update the comment to reference the correct
source (e.g., OpenSSF Scorecard documentation or the actual mapping data source)
and include a brief link or citation to that source so the comment accurately
reflects where the compliance mappings come from (edit the comment that precedes
ComplianceMappings in compliance.go to replace the incorrect reference with the
OpenSSF Scorecard docs or the internal data source identifier).
api/v1/types.go (2)

128-134: ApplyPlugin still missing loops for GitHubSecurity and OpenSSFScorecard.

Plugin specs (transforms, retention overrides, etc.) won't be applied to the two new scraper types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/types.go` around lines 128 - 134, The loops that call ApplyPlugins on
BaseScraper are only present for spec.GithubActions and spec.Azure; add
equivalent loops for spec.GitHubSecurity and spec.OpenSSFScorecard so their
BaseScraper fields also call ApplyPlugins(plugins...), e.g., iterate over
spec.GitHubSecurity and spec.OpenSSFScorecard with the same pattern used for
spec.GithubActions/ spec.Azure to assign BaseScraper =
BaseScraper.ApplyPlugins(plugins...).
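
For reference, a minimal sketch of the missing loops, mirroring the pattern the review describes for spec.GithubActions/spec.Azure (field and method names are taken from the comment above, not verified against the file):

for i := range spec.GitHubSecurity {
	spec.GitHubSecurity[i].BaseScraper = spec.GitHubSecurity[i].BaseScraper.ApplyPlugins(plugins...)
}
for i := range spec.OpenSSFScorecard {
	spec.OpenSSFScorecard[i].BaseScraper = spec.OpenSSFScorecard[i].BaseScraper.ApplyPlugins(plugins...)
}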

13-29: AllScraperConfigs still missing githubsecurity and openssfscorecard entries.

Both new scraper types added to ScraperSpec are absent from this registry map.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/types.go` around lines 13 - 29, AllScraperConfigs is missing entries
for the two new scraper types declared in ScraperSpec; add the corresponding map
entries so the registry includes "githubsecurity" and "openssfscorecard". Update
the map by adding "githubsecurity": GitHubSecurity{} and "openssfscorecard":
OpenSSFScorecard{} to the AllScraperConfigs map (matching the existing lowercase
key -> PascalCase type pattern) so the new scraper types are discoverable.
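
A minimal sketch of the two registry entries, using the key spelling suggested above (assuming it matches the existing lowercase-key convention in AllScraperConfigs):

	"githubsecurity":   GitHubSecurity{},
	"openssfscorecard": OpenSSFScorecard{},
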
scrapers/github/security.go (1)

89-101: ⚠️ Potential issue | 🔴 Critical

Fix CreatedAt pointer assignment in ScrapeResult.

CreatedAt is being assigned a time.Time value directly; this path expects a pointer and will fail type-checking.

🐛 Proposed fix
-	result := v1.ScrapeResult{
+	createdAt := repo.CreatedAt.GetTime()
+	result := v1.ScrapeResult{
 		Type:        ConfigTypeGitHubSecurityRepo,
 		ID:          fmt.Sprintf("github-security/%s", repoFullName),
 		Name:        repoFullName,
 		ConfigClass: "Security",
 		Config:      repo,
@@
-		CreatedAt:  repo.CreatedAt.GetTime(),
+		CreatedAt:  &createdAt,
 		Properties: properties,
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 89 - 101, The ScrapeResult struct
field CreatedAt is a pointer but the code assigns a time.Time directly; fix in
the ScrapeResult construction (symbol: result := v1.ScrapeResult{ ... CreatedAt:
... }) by creating a local variable like t := repo.CreatedAt.GetTime() and then
assigning the pointer (CreatedAt: &t) so the field receives *time.Time instead
of time.Time.
scrapers/github/security_client.go (2)

23-34: ⚠️ Potential issue | 🟠 Major

Fail fast when token resolution returns empty.

Client creation currently proceeds with an empty token, which guarantees downstream 401s.

🔐 Proposed fix
 	} else {
 		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
 		if err != nil {
 			return nil, fmt.Errorf("failed to get token: %w", err)
 		}
 	}
+	if token == "" {
+		return nil, fmt.Errorf("github personal access token is required (connection or personalAccessToken)")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security_client.go` around lines 23 - 34, The
NewGitHubSecurityClient constructor allows creating a client with an empty token
which causes downstream 401s; after resolving token in NewGitHubSecurityClient
(the token variable set from connection.Password or ctx.GetEnvValueFromCache)
validate that token is non-empty and return a descriptive error (e.g., "empty
GitHub token" or "failed to resolve GitHub token") instead of proceeding when
token == ""; ensure you check both branches (connection != nil and the env
lookup) and fail fast from the function if the token is empty.

121-125: ⚠️ Potential issue | 🟠 Major

Apply pagination options in GetSecurityAdvisories.

opts.Page and opts.PerPage are ignored here, so advisory retrieval is limited to the default first page.

📄 Proposed fix
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security_client.go` around lines 121 - 125,
GetSecurityAdvisories currently ignores pagination options; update the creation
of reqOpts (ListRepositorySecurityAdvisoriesOptions) inside
GetSecurityAdvisories to include the ListOptions with Page and PerPage from opts
(e.g., set reqOpts.ListOptions = github.ListOptions{Page: opts.Page, PerPage:
opts.PerPage} or equivalent) so the GitHub API call respects opts.Page and
opts.PerPage when listing repository security advisories.
api/v1/github_security.go (1)

43-53: ⚠️ Potential issue | 🟠 Major

ParseMaxAge rejects the documented "90d" / "30d" inputs.

time.ParseDuration does not support day (d) units, so the current implementation conflicts with the comment examples.

🛠️ Proposed fix (support `d` suffix)
 import (
+	"strconv"
+	"strings"
 	"time"

 	"github.com/flanksource/duty/types"
 )
@@
 func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
 	if f.MaxAge == "" {
 		return 0, nil
 	}
+	raw := strings.TrimSpace(strings.ToLower(f.MaxAge))
+	if strings.HasSuffix(raw, "d") {
+		days, err := strconv.Atoi(strings.TrimSuffix(raw, "d"))
+		if err != nil {
+			return 0, err
+		}
+		return time.Duration(days) * 24 * time.Hour, nil
+	}
 	return time.ParseDuration(f.MaxAge)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/github_security.go` around lines 43 - 53, ParseMaxAge currently calls
time.ParseDuration which rejects values like "90d"; update ParseMaxAge on the
GitHubSecurityFilters type to accept a trailing "d" (days) by detecting if
f.MaxAge ends with "d" (or "D"), converting the numeric part to hours (days *
24h) and returning the corresponding time.Duration, while still delegating to
time.ParseDuration for values without a day suffix and returning errors from
parsing as before; reference the MaxAge field and the ParseMaxAge function when
making this change.
🟡 Minor comments (5)
api/v1/utils.go-15-23 (1)

15-23: ⚠️ Potential issue | 🟡 Minor

Avoid exporting MD5 behind a generic Hash API.

While current usages are for non-security ID generation (naming), the exported generic name could encourage future security-sensitive misuse. MD5 has known collision weaknesses and should not be used for authentication, signatures, or integrity checks.

Proposed safer implementation
 import (
-	"crypto/md5"
+	"crypto/sha256"
 	"encoding/hex"
 	"encoding/json"
 	"strings"
 )
@@
-// Hash returns the MD5 hash of the JSON representation of the given value.
+// Hash returns the SHA-256 hash of the JSON representation of the given value.
 func Hash(v any) (string, error) {
 	data, err := json.Marshal(v)
 	if err != nil {
 		return "", err
 	}
 
-	hash := md5.Sum(data)
+	hash := sha256.Sum256(data)
 	return hex.EncodeToString(hash[:]), nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/utils.go` around lines 15 - 23, The exported function Hash currently
hides MD5 usage and risks future security misuse; change it to a non-exported,
purpose-specific name (e.g., hashName or hashID) or replace the algorithm with a
cryptographically safe one (e.g., SHA-256) and keep the name if you intend a
secure hash; update all callers of Hash accordingly. Specifically, modify the
function symbol Hash in api/v1/utils.go to either (a) rename to a lowercase,
purpose-scoped name like hashName/hashID if it is only used for non-security
IDs, or (b) switch the internal md5.Sum call to a SHA-256 implementation and
keep the exported API if you need a secure hash, then adjust return encoding
(hex) as needed; ensure all references to Hash are updated to the new name or
behavior.
scrapers/aws/aws.go-493-493 (1)

493-493: ⚠️ Potential issue | 🟡 Minor

Log level bump to V(3) is reasonable.

Reduces noise from per-chunk ECS task description logging. Note that the log message still references len(allTaskArns) (total count) rather than len(taskArns) (current chunk size), which could be confusing when reading logs.

Suggested fix
-		ctx.Logger.V(3).Infof("describing %d ECS tasks for cluster %s", len(allTaskArns), clusterArn)
+		ctx.Logger.V(3).Infof("describing %d ECS tasks for cluster %s", len(taskArns), clusterArn)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/aws/aws.go` at line 493, The log call ctx.Logger.V(3).Infof
currently prints len(allTaskArns) which is the total but it's inside a per-chunk
loop — update the message at the ctx.Logger.V(3).Infof call to reference
len(taskArns) for the current chunk (or include both counts, e.g. "describing
%d/%d ECS tasks" with len(taskArns) and len(allTaskArns)) so logs reflect the
chunk size being processed.
api/plugin/go.mod-405-407 (1)

405-407: ⚠️ Potential issue | 🟡 Minor

Add glebarez/sqlite replace directive to match root go.mod.

The root go.mod replaces github.com/glebarez/sqlite with github.com/clarkmcc/gorm-sqlite, but this submodule's go.mod lacks the matching replace directive. This module lists glebarez/sqlite v1.11.0 as an indirect dependency but will pull the original package instead of the fork when built standalone.

Proposed fix
 replace github.com/flanksource/config-db/api => ../v1
+
+replace github.com/glebarez/sqlite => github.com/clarkmcc/gorm-sqlite v0.0.0-20240426202654-00ed082c0311
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/go.mod` around lines 405 - 407, The submodule go.mod is missing
the replace that maps github.com/glebarez/sqlite to the fork used at the repo
root; add a replace directive in api/plugin/go.mod (alongside the existing
replace block that contains "github.com/flanksource/config-db/api => ../v1") to
point github.com/glebarez/sqlite to github.com/clarkmcc/gorm-sqlite (use the
same version used in root go.mod), so builds of this submodule use the forked
sqlite implementation.
api/plugin/proto/scraper.proto-116-133 (1)

116-133: ⚠️ Potential issue | 🟡 Minor

Inconsistent type for JSON-encoded fields: patches is string, details is bytes.

In ChangeResultProto, patches (line 123) is declared as string while details (line 129) is bytes with a comment indicating it's JSON-encoded. Usage throughout the codebase confirms patches contains JSON (e.g., json.Unmarshal([]byte(change.Patches), ...) in scrapers/run.go:143 and test fixtures with JSON objects), yet it remains string. Other JSON-encoded fields like config, properties, base_scraper, and analysis in ScrapeResultProto consistently use bytes. This inconsistency should be resolved: either change patches to bytes for consistency with other JSON fields, or remove the "JSON-encoded map" designation from details.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/proto/scraper.proto` around lines 116 - 133, ChangeResultProto
declares patches as string while details is bytes (commented as JSON); make
patches a bytes field to match other JSON-encoded fields and keep a consistent
pattern (update the ChangeResultProto field "patches" to type bytes and add the
same "JSON-encoded map" comment as on "details"). Then update all call sites
that treat ChangeResultProto.patches as a string (places that convert it to
[]byte before json unmarshalling) to accept the bytes field directly, regenerate
the protobuf artifacts, and adjust test fixtures to provide patches as raw
bytes/JSON bytes.
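
If patches is switched to bytes, the Go call sites simplify accordingly; a sketch only, assuming the generated field is named Patches and the target is a generic map:

// before: json.Unmarshal([]byte(change.Patches), &patches)
var patches map[string]any
if len(change.Patches) > 0 {
	if err := json.Unmarshal(change.Patches, &patches); err != nil {
		return nil, fmt.Errorf("invalid patches payload: %w", err)
	}
}
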
api/plugin/convert_test.go-164-178 (1)

164-178: ⚠️ Potential issue | 🟡 Minor

Test name is misleading: the test uses non-empty details.

TestProtoToChangeResultWithEmptyDetails actually exercises a non-empty JSON payload ({"key": "value"}). Rename to TestProtoToChangeResultWithDetails or similar.

🔧 Proposed rename
-func TestProtoToChangeResultWithEmptyDetails(t *testing.T) {
+func TestProtoToChangeResultWithDetails(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/convert_test.go` around lines 164 - 178, The test name
TestProtoToChangeResultWithEmptyDetails is misleading because the test
constructs a proto with non-empty JSON Details; rename the test function to
something like TestProtoToChangeResultWithDetails to reflect behavior, updating
the function declaration and any references to
TestProtoToChangeResultWithEmptyDetails so the test runner and readers see a
consistent, accurate name; ensure the function name in the file
(TestProtoToChangeResultWithEmptyDetails) is changed and any test-suite or CI
references (if present) are updated accordingly.
🧹 Nitpick comments (10)
scrapers/aws/cloudtrail.go (1)

15-15: Nit: the v1 alias is misleading for a non-versioned path.

Aliasing a package name to something unrelated to the actual package can confuse readers. Now that the import points to the root api package rather than a v1 sub-path, the alias no longer conveys accurate information. Consider renaming it to api (or a similarly descriptive name) in a follow-up pass.

♻️ Suggested rename (illustrative — apply across the file)
-	v1 "github.com/flanksource/config-db/api"
+	configapi "github.com/flanksource/config-db/api"

Then replace all v1. usages with configapi. throughout this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/aws/cloudtrail.go` at line 15, The import alias `v1` is misleading
for the root `api` package; change the import alias from `v1` to a clearer name
like `api` (or `configapi`) in scrapers/aws/cloudtrail.go and update all
references (e.g., any occurrences of `v1.` used in functions or types in this
file) to the new alias so the package name reflects the actual package; locate
references such as `v1.` in methods and structs in this file and replace them
with `api.` (or `configapi.`) consistently.
Makefile (1)

87-88: gotest target bypasses envtest asset validation.

Unlike gotest-prod and gotest-load which use $(validate-envtest-assets) to verify that etcd and other required binaries exist before running tests, this target uses a bare $(shell ...) expansion. If setup-envtest fails or returns an empty path, the test run will fail with a cryptic error instead of the clear diagnostics provided by the validation macro.

Suggested fix to use the same validation pattern
 .PHONY: gotest
 gotest: ginkgo
-	KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" \
-		ginkgo -r -v --skip-package=tests/e2e,api/plugin -coverprofile cover.out ./...
+	$(validate-envtest-assets) \
+	KUBEBUILDER_ASSETS="$$ASSETS" \
+		ginkgo -r -v --skip-package=tests/e2e,api/plugin -coverprofile cover.out ./...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 87 - 88, The gotest Makefile target uses a direct
shell expansion to set KUBEBUILDER_ASSETS which bypasses the
validate-envtest-assets check; update the gotest target to invoke the existing
validate-envtest-assets validation (the same macro used by gotest-prod and
gotest-load) before using ENVTEST/ENVTEST_K8S_VERSION/LOCALBIN so that etcd and
required binaries are verified and a clear error is produced if validation
fails—locate the gotest target and replace the bare $(shell $(ENVTEST) use ...)
usage with the validated pattern (or add an explicit dependency on
validate-envtest-assets) so KUBEBUILDER_ASSETS is only determined after
validation succeeds.
go.mod (1)

406-406: Pre-release pin for structured-merge-diff/v6.

Pinning to a specific commit (v6.3.2-0.20260122202528-d9cc6641c482) rather than a stable release is fragile. Consider upgrading to a stable tag once one is available, to avoid potential breakage from upstream force-pushes or tag changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 406, The go.mod entry currently pins
sigs.k8s.io/structured-merge-diff/v6 to a pseudo-version commit
(v6.3.2-0.20260122202528-d9cc6641c482); update this to a stable released tag
(e.g., v6.x.y) or otherwise remove the commit-specific pseudo-version so the
module uses a proper semantic version; locate the line containing the module
name "sigs.k8s.io/structured-merge-diff/v6" and replace the pseudo-version with
the intended stable tag (or run `go get
sigs.k8s.io/structured-merge-diff/v6@latest`/appropriate version and `go mod
tidy`) to ensure we don't rely on a fragile pre-release commit.
api/plugin/proto/scraper.proto (1)

1-19: Buf lint warning about package directory mismatch is a non-issue for the current build.

The Buf static analysis flags that configdb.plugin should be in a configdb/plugin directory. Since the Makefile uses raw protoc (not Buf), this doesn't affect the build. If Buf is adopted later, either restructure the directory or add a buf.yaml with a custom root.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/proto/scraper.proto` around lines 1 - 19, The proto package
declaration "package configdb.plugin" in scraper.proto conflicts with Buf's
expected directory layout (configdb/plugin); to fix, either move scraper.proto
into a directory matching the package (create configdb/plugin and place
scraper.proto there) or add a buf.yaml at the repo root configuring a custom
root to avoid the lint error; look for the package line and the services
ScraperPlugin and HostServices in scraper.proto when applying the change so
imports and go_package option remain correct.
.github/workflows/api-lint-test.yml (1)

19-21: Consider using go-version-file to avoid version drift.

Hardcoding go-version: 1.25.x could drift from the go 1.25.1 directive in the submodule's go.mod. Using go-version-file keeps them in sync automatically.

Example
       - uses: buildjet/setup-go@v5
         with:
-          go-version: 1.25.x
+          go-version-file: api/v1/go.mod
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/api-lint-test.yml around lines 19 - 21, The workflow
currently hardcodes the Go version using buildjet/setup-go@v5 with go-version:
1.25.x which can drift from the submodule's go.mod; change the action inputs to
use go-version-file instead and point it at the file that contains the canonical
Go version (e.g., the submodule's go.mod), so replace the go-version parameter
with go-version-file and ensure the path to the go.mod (or a dedicated
.go-version file) is correct to keep versions in sync.
api/v1/Makefile (1)

39-42: golangci-lint install doesn't verify version, unlike controller-gen.

The controller-gen target (lines 47-48) checks that the installed version matches CONTROLLER_TOOLS_VERSION before deciding to reinstall. The golangci-lint target only checks for existence (test -s), so a stale or mismatched version will silently be used.

Proposed fix to add version checking
 .PHONY: golangci-lint
 golangci-lint: $(GOLANGCI_LINT)
 $(GOLANGCI_LINT): $(LOCALBIN)
-	test -s $(LOCALBIN)/golangci-lint || GOBIN=$(LOCALBIN) go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@$(GOLANGCI_LINT_VERSION)
+	test -s $(LOCALBIN)/golangci-lint && $(LOCALBIN)/golangci-lint version --short 2>&1 | grep -q $(GOLANGCI_LINT_VERSION) || \
+	GOBIN=$(LOCALBIN) go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@$(GOLANGCI_LINT_VERSION)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/Makefile` around lines 39 - 42, The golangci-lint target currently
only checks for the binary's existence and can leave a mismatched version
installed; update the golangci-lint target to verify the installed version
against GOLANGCI_LINT_VERSION (similar to the controller-gen logic): invoke
$(LOCALBIN)/golangci-lint --version (or equivalent) and compare the output to
GOLANGCI_LINT_VERSION, and only run GOBIN=$(LOCALBIN) go install
github.com/golangci/golangci-lint/v2/cmd/golangci-lint@$(GOLANGCI_LINT_VERSION)
when the version differs or the binary is missing; use the existing variables
golangci-lint, GOLANGCI_LINT, GOLANGCI_LINT_VERSION, and LOCALBIN to implement
this check.
api/plugin/Makefile (1)

33-37: proto target assumes protoc and plugins are pre-installed.

Unlike the golangci-lint target which auto-installs the tool, the proto target relies on protoc, protoc-gen-go, and protoc-gen-go-grpc being already available. Consider documenting the required tool versions or adding install logic to avoid "command not found" failures for new contributors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/Makefile` around lines 33 - 37, The proto Makefile target named
"proto" currently assumes protoc and its Go plugins are preinstalled; update the
Makefile to either (a) add install steps similar to the existing golangci-lint
flow that will check for and install protoc, protoc-gen-go and
protoc-gen-go-grpc (with pinned versions) before running the protoc command, or
(b) add a clear comment block above the "proto" target listing required tools
and their versions (protoc, protoc-gen-go, protoc-gen-go-grpc) and a one-line
hint on how to install them; reference the "proto" target and mirror the
auto-install pattern used for the golangci-lint target to avoid "command not
found" errors.
scrapers/openssf/scorecard.go (2)

139-143: Hardcoded "failing" / "passing" strings for analysis.Status.

These bare string literals should use constants (e.g., from the models package or a local const block) to prevent silent drift if the expected status values change.

♻️ Proposed refactor
+const (
+	analysisStatusPassing = "passing"
+	analysisStatusFailing = "failing"
+)
 ...
 		if check.Score < 10 {
-			analysis.Status = "failing"
+			analysis.Status = analysisStatusFailing
 		} else {
-			analysis.Status = "passing"
+			analysis.Status = analysisStatusPassing
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 139 - 143, Replace the hardcoded
"failing"/"passing" literals assigned to analysis.Status with well-named
constants (e.g., models.StatusFailing and models.StatusPassing or local consts
StatusFailing/StatusPassing) and update the import if using the models package;
change the branch in the check.Score handling inside the same function (where
analysis.Status and check.Score are used) to assign the constants instead of raw
strings so the status values are centralized and won't drift.

265-272: containsString duplicates slices.Contains from the standard library.

Go 1.21+ includes slices.Contains. Replace the helper to reduce code surface.

♻️ Proposed refactor
-func containsString(slice []string, item string) bool {
-	for _, s := range slice {
-		if s == item {
-			return true
-		}
-	}
-	return false
-}

Then update the call site:

+import "slices"
 ...
-		if check.Score == 0 && containsString(criticalChecks, check.Name) {
+		if check.Score == 0 && slices.Contains(criticalChecks, check.Name) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 265 - 272, The local helper
containsString duplicates the standard library; remove the containsString
function and replace its call sites with slices.Contains(slice, item). Add the
standard library import "slices" to the file (or group it in the existing import
block) and update any references to use slices.Contains where containsString was
used.
api/plugin/convert_test.go (1)

58-90: Round-trip test doesn't verify pointer fields CreatedAt and CreatedBy.

Both fields are set in original and go through a protobuf timestamp conversion. Adding assertions ensures the timestamp serialization round-trip is correct and won't silently regress.

➕ Suggested additional assertions
 	if roundTrip.ChangeType != original.ChangeType {
 		t.Errorf("RoundTrip ChangeType mismatch: got %s, want %s", roundTrip.ChangeType, original.ChangeType)
 	}
+	if roundTrip.CreatedBy == nil || *roundTrip.CreatedBy != *original.CreatedBy {
+		t.Errorf("RoundTrip CreatedBy mismatch: got %v, want %v", roundTrip.CreatedBy, original.CreatedBy)
+	}
+	if roundTrip.CreatedAt == nil || !roundTrip.CreatedAt.Equal(*original.CreatedAt) {
+		t.Errorf("RoundTrip CreatedAt mismatch: got %v, want %v", roundTrip.CreatedAt, original.CreatedAt)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/convert_test.go` around lines 58 - 90, Add assertions to
TestChangeResultRoundTrip to verify the pointer fields CreatedAt and CreatedBy
survive the protobuf round-trip: after converting original -> proto -> roundTrip
(using changeResultToProto and protoToChangeResult), compare that
roundTrip.CreatedBy equals original.CreatedBy (dereference pointers as needed)
and that roundTrip.CreatedAt is non-nil and its time value equals
original.CreatedAt (use time.Equal or equivalent to compare times after proto
timestamp conversion). Ensure you handle nil pointers safely and include clear
failure messages referencing CreatedAt/CreatedBy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/plugin/convert.go`:
- Around line 396-409: The conversion functions (e.g., protoToExternalRole,
protoToExternalUser, protoToExternalGroup, protoToExternalUserGroup,
protoToExternalConfigAccess, protoToExternalConfigAccessLog) currently call
uuid.MustParse which will panic on malformed plugin input; replace each
uuid.MustParse call with uuid.Parse, check the returned error, and only set the
model UUID fields when parsing succeeds (for pointer fields set to &id on
success, for value fields set id on success); on parse error either leave the
field unset/nil or log the invalid UUID (using existing logger) before
continuing so malformed plugin-provided IDs do not crash the process.
- Around line 386-393: externalRoleToProto (and similarly proto conversions for
configs) currently calls lo.FromPtr on r.ScraperID which turns nil into the
zero-UUID string; change the conversion so ScraperId is only set when the
original pointer is non-nil (e.g. if r.ScraperID != nil then
r.ScraperID.String() else ""), so the proto field remains empty for a nil
pointer and the existing protoToExternalRole/protoToExternalConfigAccess logic
(which checks p.ScraperId != "") will correctly preserve nil on round-trip;
update externalRoleToProto to avoid lo.FromPtr for ScraperID and apply the same
pattern where other ScraperID conversions occur.
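
A minimal sketch covering both convert.go items above; the pointer field ScraperID, the proto field ScraperId, and the logger are assumptions based on the prompt text:

// parse defensively instead of uuid.MustParse, skipping malformed IDs
if id, err := uuid.Parse(p.ScraperId); err == nil {
	role.ScraperID = &id
} else if p.ScraperId != "" {
	logger.Warnf("ignoring invalid scraper id %q: %v", p.ScraperId, err)
}

// when converting back, only emit a value for a non-nil pointer so nil round-trips as ""
scraperID := ""
if r.ScraperID != nil {
	scraperID = r.ScraperID.String()
}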

In `@api/plugin/grpc_plugin.go`:
- Around line 29-37: The GRPCClient implementation dials a host connection and
sets grpcClient.hostClient (via broker.Dial(1) and hostConn) but never uses or
closes it, leaking resources; remove the unused hostClient and the broker.Dial
call: update GRPCPlugin.GRPCClient to stop calling broker.Dial(1) and construct
&grpcClient{client: pb.NewScraperPluginClient(c)} only, and also remove the
hostClient field from the grpcClient struct (and any related imports/fields) so
there is no unused client or open connection; alternatively, if host services
are required, ensure the hostConn is stored, hostClient methods are invoked from
grpcClient methods, and hostConn is closed when no longer needed.

In `@api/plugin/types.go`:
- Around line 35-70: The conversion functions EnvVarFromDuty and (EnvVar) ToDuty
are dropping the EnvVar.Name field; update EnvVarFromDuty to copy e.Name into
the returned EnvVar.Name and update (EnvVar) ToDuty to set types.EnvVar.Name
from e.Name so the name round-trips correctly; locate the conversions in
EnvVarFromDuty and ToDuty and assign the Name field on both sides (EnvVar and
types.EnvVar) while leaving the existing ValueStatic and ValueFrom handling
intact.
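
A sketch of the Name fix for both directions; only the added field is new, and the literals below stand in for the existing conversion code:

// in EnvVarFromDuty (duty -> plugin)
out := EnvVar{
	Name:        e.Name, // previously dropped
	ValueStatic: e.ValueStatic,
	// ...existing ValueFrom handling unchanged...
}

// in (EnvVar) ToDuty (plugin -> duty)
return types.EnvVar{
	Name:        e.Name, // previously dropped
	ValueStatic: e.ValueStatic,
	// ...existing ValueFrom handling unchanged...
}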

In `@scrapers/aws/cloudtrail.go`:
- Line 15: The import alias v1 is referencing a non-existent package root;
update the import to point to the actual package that defines the types (change
the import from v1 "github.com/flanksource/config-db/api" to v1
"github.com/flanksource/config-db/api/v1") so symbols like AWS, ScrapeResults,
ChangeResult and NewJSON resolve; alternatively add re-exports in the api
package root, but the quick fix is to correct the import path where v1 is used.

In `@scrapers/event.go`:
- Around line 10-12: The import alias v1 is pointing to the wrong package;
update the import so v1 refers to the package that actually defines the symbols
used (e.g., v1.ScrapeConfigFromModel, v1.InvolvedObject,
v1.InvolvedObjectFromObj). Replace the current import target for the v1 alias
with the v1 package path (change the import to reference
github.com/flanksource/config-db/api/v1) so all v1.* references resolve
correctly.

In `@scrapers/gcp/cloudsql_backup.go`:
- Line 13: The import change broke compilation because the file uses symbols
declared in api/v1; revert the import to reference the v1 package (e.g., use v1
"github.com/flanksource/config-db/api/v1") so GCP, ScrapeResults, ChangeResult,
CloudSQLInstance, GCPBackupRun and NewScrapeResult resolve correctly, or
alternately move/re-export those symbols into the root api package and update
references accordingly; the simplest fix is to restore the v1 import in
cloudsql_backup.go.

In `@scrapers/gcp/dns.go`:
- Line 12: The import path was changed incorrectly; restore the import to v1
"github.com/flanksource/config-db/api/v1" so the symbols ScrapeResults,
ScrapeResult, GCPManagedZone, and GCPResourceRecordSet referenced in
scrapers/gcp/dns.go resolve correctly; update the import statement in that file
to the api/v1 subpackage (v1) to avoid the compilation error.

In `@scrapers/gcp/security_center.go`:
- Line 9: The import statement using v1 "github.com/flanksource/config-db/api"
is incorrect because the types AnalysisResult, ExternalID, GCP, ScrapeResults
and ScrapeResult are declared in package v1; change the import to v1
"github.com/flanksource/config-db/api/v1" so the code referencing those symbols
(e.g., AnalysisResult, ScrapeResult, GCP) compiles, or alternatively re-export
those types from the root package if you prefer that design.

In `@scrapers/github/security_client.go`:
- Around line 51-57: AlertListOptions.CreatedAt is declared but never used when
building the alert list request, so time-constrained/incremental scrapes are
ineffective; update the code that constructs the GitHub security alerts query
(the logic that reads AlertListOptions in the alert listing path—e.g., functions
like ListAlerts, buildAlertQuery or the request parameter assembly) to include
CreatedAt as a filter/query parameter (convert the string to the appropriate
GitHub API time format if needed) and ensure it is added to the request URL or
body alongside State, Severity, Page, and PerPage so the API call actually
restricts results by creation time.

In `@scrapers/github/security.go`:
- Around line 127-137: The AlertListOptions construction currently only uses
stateFilter and ignores configured filters for severity and maxAge; update the
opts creation in scrapers/github/security.go (where filters := config.Filters
and stateFilter are defined) to propagate filters.Severity and filters.MaxAge
(or map to the expected field names on AlertListOptions, e.g., Severity and
MaxAge or AgeDays) so user-configured severity and maxAge are passed through to
AlertListOptions before passing opts to the list call; ensure you handle
empty/zero values the same way stateFilter is handled.
- Around line 148-170: The fetch loops for Dependabot, code scanning and secret
scanning alerts (calls to GetDependabotAlerts, GetCodeScanningAlerts,
GetSecretScanningAlerts) only request a single page (PerPage:100) and will
truncate results; change each fetch to iterate pagination: start with
opts.Page=1, call the client method, append results into alerts.dependabot /
alerts.codeScanning / alerts.secretScanning, then advance to the next page (use
the returned response.NextPage or increment opts.Page) and repeat until no more
pages or an error; preserve existing error handling and logging (ctx.Debugf) and
ensure maxAlertTime logic remains correct while aggregating across pages.
- Around line 45-51: The code blocks on time.Sleep(duration) after
client.ShouldPauseForRateLimit which prevents prompt shutdown; replace the
blocking sleep with a cancellation-aware wait using select: wait on
time.After(duration) and on ctx.Done(), so if ctx is canceled you stop waiting
and exit/return (or continue) promptly. Update the branch that currently calls
time.Sleep(duration) to use select { case <-time.After(duration): /* resume */
case <-ctx.Done(): ctx.Warnf("rate-limit wait canceled"); return/continue } and
keep existing logging (ctx.Warnf) and error handling around
client.ShouldPauseForRateLimit.
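
A sketch of the pagination loop and the cancellation-aware rate-limit wait for the security.go items above; the GetDependabotAlerts signature and the ShouldPauseForRateLimit return shape are assumed from the surrounding prompts:

// fetch every page instead of only the first 100 alerts
opts := AlertListOptions{State: stateFilter, PerPage: 100, Page: 1}
for {
	page, resp, err := client.GetDependabotAlerts(ctx, opts)
	if err != nil {
		return nil, err
	}
	alerts.dependabot = append(alerts.dependabot, page...)
	if resp == nil || resp.NextPage == 0 {
		break
	}
	opts.Page = resp.NextPage
}

// wait out a rate limit without blocking shutdown
if pause, duration := client.ShouldPauseForRateLimit(); pause {
	ctx.Warnf("rate limited, pausing for %s", duration)
	select {
	case <-time.After(duration):
		// resume scraping
	case <-ctx.Done():
		ctx.Warnf("rate-limit wait canceled")
		return nil, ctx.Err()
	}
}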

In `@scrapers/openssf/client.go`:
- Around line 122-129: The fetchScorecard HTTP response handling treats 429 as a
generic non-OK error so isRetryable won't catch rate-limits; update the response
checks in fetchScorecard (and the similar block used around the other status
handling) to explicitly treat http.StatusTooManyRequests (429) as a retryable
error — e.g., return an error value or type that is recognized by isRetryable
when resp.StatusCode == http.StatusTooManyRequests, and preserve the existing
body/error text for logging; ensure the same change is applied to the similar
status-check block referenced near the second occurrence so throttling triggers
retries.
- Around line 81-86: The retry backoff currently calls time.Sleep(backoff) which
ignores cancellation; replace that sleep with a context-aware wait (select on
time.After(backoff) and the request/context Done channel) so retries abort
promptly on cancel/timeout. Locate the retry block where repoFullName, attempt,
maxRetries and backoff are computed (the c.ctx.Tracef log and time.Sleep call)
and change the sleep to a select that returns/propagates the context error if
the context is cancelled instead of blocking the goroutine.
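
A sketch for the client.go items above, treating 429 as retryable and waiting in a context-aware way; the retry-loop variables (attempt, maxRetries, backoff) and the return shape are assumed from the prompt:

if resp.StatusCode == http.StatusTooManyRequests {
	c.ctx.Tracef("scorecard API throttled for %s (attempt %d/%d), backing off %s",
		repoFullName, attempt, maxRetries, backoff)
	select {
	case <-time.After(backoff):
		continue // retry the request
	case <-c.ctx.Done():
		return nil, c.ctx.Err()
	}
}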

In `@scrapers/openssf/scorecard.go`:
- Line 22: The package-level sync.Map LastScorecardScrapeTime currently holds
global state; change it to be scoped per-scraper instance or injectable so cache
doesn't leak across tests or config changes: add a field (e.g., lastScrape
sync.Map or map[string]time.Time with mutex) to the scraper struct used to run
scorecard scrapes (or accept a Cache interface via the scraper's constructor)
and update all uses of LastScorecardScrapeTime to reference that instance field;
ensure the cache honors ScorecardCacheTTL from the scraper's config and write
tests that inject a fresh cache to avoid cross-test contamination.
- Around line 53-57: The cache isn't updated for repos filtered out by MinScore
leading to repeated API calls; in the loop handling scorecard results
(referencing scorecard.Score and config.MinScore) call the cache update
(LastScorecardScrapeTime.Store or the existing scrape-time storage used before)
immediately after receiving the scorecard for repoFullName and before the
MinScore check so that shouldSkipScorecardFetch will hit on subsequent runs;
move the Store(...) invocation to precede the if that checks config.MinScore and
retains the existing timestamp/value you store now.
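
A sketch for the scorecard.go items above: per-instance cache state instead of the package-level sync.Map, with the timestamp stored before the MinScore filter; the Scraper type name is illustrative:

type Scraper struct {
	lastScrape sync.Map // repoFullName -> time.Time, scoped to this scraper instance
}

func (s *Scraper) shouldSkip(repoFullName string, ttl time.Duration) bool {
	if v, ok := s.lastScrape.Load(repoFullName); ok {
		return time.Since(v.(time.Time)) < ttl
	}
	return false
}

// record the scrape time before applying MinScore so filtered-out repos
// are not re-fetched on every run
s.lastScrape.Store(repoFullName, time.Now())
if scorecard.Score < config.MinScore {
	continue
}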

In `@scrapers/stale.go`:
- Line 8: The import was changed to the root package which is empty; revert the
import to the versioned package so the symbols like DeletedReasonStale are
available: change the import alias back to v1
"github.com/flanksource/config-db/api/v1" in stale.go so uses of
v1.DeletedReasonStale (and any other v1 types) compile.

---

Outside diff comments:
In `@scrapers/kubernetes/ketall.go`:
- Around line 53-56: The code currently ignores errors from retry.Do and always
returns nil; modify the function to capture the error returned by retry.Do
(e.g., err := retry.Do(...)), check if err != nil and return the collected objs
(if applicable) along with that err (or just return nil, err depending on
intended semantics), instead of always returning objs, nil; reference the
retry.Do call and the final return of objs to locate and fix the issue in
ketall.go.

---

Minor comments:
In `@api/plugin/convert_test.go`:
- Around line 164-178: The test name TestProtoToChangeResultWithEmptyDetails is
misleading because the test constructs a proto with non-empty JSON Details;
rename the test function to something like TestProtoToChangeResultWithDetails to
reflect behavior, updating the function declaration and any references to
TestProtoToChangeResultWithEmptyDetails so the test runner and readers see a
consistent, accurate name; ensure the function name in the file
(TestProtoToChangeResultWithEmptyDetails) is changed and any test-suite or CI
references (if present) are updated accordingly.

In `@api/plugin/go.mod`:
- Around line 405-407: The submodule go.mod is missing the replace that maps
github.com/glebarez/sqlite to the fork used at the repo root; add a replace
directive in api/plugin/go.mod (alongside the existing replace block that
contains "github.com/flanksource/config-db/api => ../v1") to point
github.com/glebarez/sqlite to github.com/clarkmcc/gorm-sqlite (use the same
version used in root go.mod), so builds of this submodule use the forked sqlite
implementation.

In `@api/plugin/proto/scraper.proto`:
- Around line 116-133: ChangeResultProto declares patches as string while
details is bytes (commented as JSON); make patches a bytes field to match other
JSON-encoded fields and keep a consistent pattern (update the ChangeResultProto
field "patches" to type bytes and add the same "JSON-encoded map" comment as on
"details"). Then update all call sites that treat ChangeResultProto.patches as a
string (places that convert it to []byte before json unmarshalling) to accept
the bytes field directly, regenerate the protobuf artifacts, and adjust test
fixtures to provide patches as raw bytes/JSON bytes.

In `@api/v1/utils.go`:
- Around line 15-23: The exported function Hash currently hides MD5 usage and
risks future security misuse; change it to a non-exported, purpose-specific name
(e.g., hashName or hashID) or replace the algorithm with a cryptographically
safe one (e.g., SHA-256) and keep the name if you intend a secure hash; update
all callers of Hash accordingly. Specifically, modify the function symbol Hash
in api/v1/utils.go to either (a) rename to a lowercase, purpose-scoped name like
hashName/hashID if it is only used for non-security IDs, or (b) switch the
internal md5.Sum call to a SHA-256 implementation and keep the exported API if
you need a secure hash, then adjust return encoding (hex) as needed; ensure all
references to Hash are updated to the new name or behavior.

In `@scrapers/aws/aws.go`:
- Line 493: The log call ctx.Logger.V(3).Infof currently prints len(allTaskArns)
which is the total but it's inside a per-chunk loop — update the message at the
ctx.Logger.V(3).Infof call to reference len(taskArns) for the current chunk (or
include both counts, e.g. "describing %d/%d ECS tasks" with len(taskArns) and
len(allTaskArns)) so logs reflect the chunk size being processed.

---

Duplicate comments:
In `@api/v1/github_security.go`:
- Around line 43-53: ParseMaxAge currently calls time.ParseDuration which
rejects values like "90d"; update ParseMaxAge on the GitHubSecurityFilters type
to accept a trailing "d" (days) by detecting if f.MaxAge ends with "d" (or "D"),
converting the numeric part to hours (days * 24h) and returning the
corresponding time.Duration, while still delegating to time.ParseDuration for
values without a day suffix and returning errors from parsing as before;
reference the MaxAge field and the ParseMaxAge function when making this change.

In `@api/v1/types.go`:
- Around line 128-134: The loops that call ApplyPlugins on BaseScraper are only
present for spec.GithubActions and spec.Azure; add equivalent loops for
spec.GitHubSecurity and spec.OpenSSFScorecard so their BaseScraper fields also
call ApplyPlugins(plugins...), e.g., iterate over spec.GitHubSecurity and
spec.OpenSSFScorecard with the same pattern used for spec.GithubActions/
spec.Azure to assign BaseScraper = BaseScraper.ApplyPlugins(plugins...).
- Around line 13-29: AllScraperConfigs is missing entries for the two new
scraper types declared in ScraperSpec; add the corresponding map entries so the
registry includes "githubsecurity" and "openssfscorecard". Update the map by
adding "githubsecurity": GitHubSecurity{} and "openssfscorecard":
OpenSSFScorecard{} to the AllScraperConfigs map (matching the existing lowercase
key -> PascalCase type pattern) so the new scraper types are discoverable.

In `@pkg/api/context.go`:
- Around line 165-167: The IsDebug() method dereferences ctx.scrapeConfig and
can panic when scrapeConfig is nil (NewScrapeContext may return nil
scrapeConfig); update IsDebug() to guard against nil by checking if
ctx.scrapeConfig == nil and returning false if so, mirroring the same nil-check
you should add to IsTrace() to keep behavior consistent; locate and modify the
IsDebug and IsTrace methods in ScrapeContext to perform the nil guard before
calling scrapeConfig.Spec.IsDebug()/IsTrace().
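A sketch of the nil guard, assuming value receivers as implied by the comment (the actual receiver form on ScrapeContext should be preserved):

func (ctx ScrapeContext) IsDebug() bool {
	if ctx.scrapeConfig == nil {
		return false
	}
	return ctx.scrapeConfig.Spec.IsDebug()
}

func (ctx ScrapeContext) IsTrace() bool {
	if ctx.scrapeConfig == nil {
		return false
	}
	return ctx.scrapeConfig.Spec.IsTrace()
}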

In `@scrapers/github/security_client.go`:
- Around line 23-34: The NewGitHubSecurityClient constructor allows creating a
client with an empty token which causes downstream 401s; after resolving token
in NewGitHubSecurityClient (the token variable set from connection.Password or
ctx.GetEnvValueFromCache) validate that token is non-empty and return a
descriptive error (e.g., "empty GitHub token" or "failed to resolve GitHub
token") instead of proceeding when token == ""; ensure you check both branches
(connection != nil and the env lookup) and fail fast from the function if the
token is empty.
- Around line 121-125: GetSecurityAdvisories currently ignores pagination
options; update the creation of reqOpts
(ListRepositorySecurityAdvisoriesOptions) inside GetSecurityAdvisories to
include the ListOptions with Page and PerPage from opts (e.g., set
reqOpts.ListOptions = github.ListOptions{Page: opts.Page, PerPage: opts.PerPage}
or equivalent) so the GitHub API call respects opts.Page and opts.PerPage when
listing repository security advisories.
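Sketches of both fixes; the variable and option names are taken from the comments above and from go-github conventions, and should be checked against the version actually vendored:

// in NewGitHubSecurityClient, after resolving the token from the
// connection or the env cache:
if token == "" {
	return nil, fmt.Errorf("failed to resolve GitHub token: token is empty")
}

// in GetSecurityAdvisories, forward the caller's pagination:
reqOpts.ListOptions = github.ListOptions{Page: opts.Page, PerPage: opts.PerPage}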

In `@scrapers/github/security.go`:
- Around line 89-101: The ScrapeResult struct field CreatedAt is a pointer but
the code assigns a time.Time directly; fix in the ScrapeResult construction
(symbol: result := v1.ScrapeResult{ ... CreatedAt: ... }) by creating a local
variable like t := repo.CreatedAt.GetTime() and then assigning the pointer
(CreatedAt: &t) so the field receives *time.Time instead of time.Time.
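A sketch of the pointer assignment, assuming the repository value comes from go-github, where GetCreatedAt() yields a Timestamp wrapping a time.Time (the surrounding ScrapeResult fields are abbreviated):

createdAt := repo.GetCreatedAt().Time // plain time.Time value
result := v1.ScrapeResult{
	// ... other fields ...
	CreatedAt: &createdAt, // *time.Time, as the field expects
}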

In `@scrapers/openssf/compliance.go`:
- Around line 10-11: The documentation comment above the ComplianceMappings
variable incorrectly references "plugins/vite-plugin-security.js"; update the
comment to reference the correct source (e.g., OpenSSF Scorecard documentation
or the actual mapping data source) and include a brief link or citation to that
source so the comment accurately reflects where the compliance mappings come
from (edit the comment that precedes ComplianceMappings in compliance.go to
replace the incorrect reference with the OpenSSF Scorecard docs or the internal
data source identifier).

In `@scrapers/processors/script.go`:
- Around line 33-36: The unmarshal error handler currently logs the full script
output variable s which may expose secrets; in the json.Unmarshal error branch
update the logger.Infof call (inside the block that checks
logger.V(5).Enabled()) to log only a sanitized, bounded preview of s (e.g.,
first N characters) plus the total length and the error from json.Unmarshal
rather than the full payload; use the existing variables json.Unmarshal, s,
results and logger.Infof so the change is localized to that error-handling block
and ensure any non-printable or sensitive patterns (like emails/URLs) are
redacted in the preview.
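A sketch of the bounded preview in the unmarshal error branch (constant and variable names follow the description above, not the actual file; redaction of emails/URLs is left as a comment):

if err := json.Unmarshal([]byte(s), &results); err != nil {
	if logger.V(5).Enabled() {
		const maxPreview = 256
		preview := s
		if len(preview) > maxPreview {
			preview = preview[:maxPreview]
		}
		// redact emails, URLs, or other sensitive patterns from preview here if needed
		logger.Infof("failed to unmarshal script output (len=%d): %v; preview: %q", len(s), err, preview)
	}
	// existing error handling continues unchanged
}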

---

Nitpick comments:
In @.github/workflows/api-lint-test.yml:
- Around line 19-21: The workflow currently hardcodes the Go version using
buildjet/setup-go@v5 with go-version: 1.25.x which can drift from the
submodule's go.mod; change the action inputs to use go-version-file instead and
point it at the file that contains the canonical Go version (e.g., the
submodule's go.mod), so replace the go-version parameter with go-version-file
and ensure the path to the go.mod (or a dedicated .go-version file) is correct
to keep versions in sync.

In `@api/plugin/convert_test.go`:
- Around line 58-90: Add assertions to TestChangeResultRoundTrip to verify the
pointer fields CreatedAt and CreatedBy survive the protobuf round-trip: after
converting original -> proto -> roundTrip (using changeResultToProto and
protoToChangeResult), compare that roundTrip.CreatedBy equals original.CreatedBy
(dereference pointers as needed) and that roundTrip.CreatedAt is non-nil and its
time value equals original.CreatedAt (use time.Equal or equivalent to compare
times after proto timestamp conversion). Ensure you handle nil pointers safely
and include clear failure messages referencing CreatedAt/CreatedBy.

In `@api/plugin/Makefile`:
- Around line 33-37: The proto Makefile target named "proto" currently assumes
protoc and its Go plugins are preinstalled; update the Makefile to either (a)
add install steps similar to the existing golangci-lint flow that will check for
and install protoc, protoc-gen-go and protoc-gen-go-grpc (with pinned versions)
before running the protoc command, or (b) add a clear comment block above the
"proto" target listing required tools and their versions (protoc, protoc-gen-go,
protoc-gen-go-grpc) and a one-line hint on how to install them; reference the
"proto" target and mirror the auto-install pattern used for the golangci-lint
target to avoid "command not found" errors.

In `@api/plugin/proto/scraper.proto`:
- Around line 1-19: The proto package declaration "package configdb.plugin" in
scraper.proto conflicts with Buf's expected directory layout (configdb/plugin);
to fix, either move scraper.proto into a directory matching the package (create
configdb/plugin and place scraper.proto there) or add a buf.yaml at the repo
root configuring a custom root to avoid the lint error; look for the package
line and the services ScraperPlugin and HostServices in scraper.proto when
applying the change so imports and go_package option remain correct.

In `@api/v1/Makefile`:
- Around line 39-42: The golangci-lint target currently only checks for the
binary's existence and can leave a mismatched version installed; update the
golangci-lint target to verify the installed version against
GOLANGCI_LINT_VERSION (similar to the controller-gen logic): invoke
$(LOCALBIN)/golangci-lint --version (or equivalent) and compare the output to
GOLANGCI_LINT_VERSION, and only run GOBIN=$(LOCALBIN) go install
github.com/golangci/golangci-lint/v2/cmd/golangci-lint@$(GOLANGCI_LINT_VERSION)
when the version differs or the binary is missing; use the existing variables
golangci-lint, GOLANGCI_LINT, GOLANGCI_LINT_VERSION, and LOCALBIN to implement
this check.

In `@go.mod`:
- Line 406: The go.mod entry currently pins sigs.k8s.io/structured-merge-diff/v6
to a pseudo-version commit (v6.3.2-0.20260122202528-d9cc6641c482); update this
to a stable released tag (e.g., v6.x.y) or otherwise remove the commit-specific
pseudo-version so the module uses a proper semantic version; locate the line
containing the module name "sigs.k8s.io/structured-merge-diff/v6" and replace
the pseudo-version with the intended stable tag (or run `go get
sigs.k8s.io/structured-merge-diff/v6@latest`/appropriate version and `go mod
tidy`) to ensure we don't rely on a fragile pre-release commit.

In `@Makefile`:
- Around line 87-88: The gotest Makefile target uses a direct shell expansion to
set KUBEBUILDER_ASSETS which bypasses the validate-envtest-assets check; update
the gotest target to invoke the existing validate-envtest-assets validation (the
same macro used by gotest-prod and gotest-load) before using
ENVTEST/ENVTEST_K8S_VERSION/LOCALBIN so that etcd and required binaries are
verified and a clear error is produced if validation fails—locate the gotest
target and replace the bare $(shell $(ENVTEST) use ...) usage with the validated
pattern (or add an explicit dependency on validate-envtest-assets) so
KUBEBUILDER_ASSETS is only determined after validation succeeds.

In `@scrapers/aws/cloudtrail.go`:
- Line 15: The import alias `v1` is misleading for the root `api` package;
change the import alias from `v1` to a clearer name like `api` (or `configapi`)
in scrapers/aws/cloudtrail.go and update all references (e.g., any occurrences
of `v1.` used in functions or types in this file) to the new alias so the
package name reflects the actual package; locate references such as `v1.` in
methods and structs in this file and replace them with `api.` (or `configapi.`)
consistently.

In `@scrapers/openssf/scorecard.go`:
- Around line 139-143: Replace the hardcoded "failing"/"passing" literals
assigned to analysis.Status with well-named constants (e.g.,
models.StatusFailing and models.StatusPassing or local consts
StatusFailing/StatusPassing) and update the import if using the models package;
change the branch in the check.Score handling inside the same function (where
analysis.Status and check.Score are used) to assign the constants instead of raw
strings so the status values are centralized and won't drift.
- Around line 265-272: The local helper containsString duplicates the standard
library; remove the containsString function and replace its call sites with
slices.Contains(slice, item). Add the standard library import "slices" to the
file (or group it in the existing import block) and update any references to use
slices.Contains where containsString was used.
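Sketches of both nitpicks (constant names are illustrative; the repo may prefer constants from the models package instead):

const (
	statusPassing = "passing"
	statusFailing = "failing"
)

// in the check.Score branch:
analysis.Status = statusFailing // or statusPassing on the other branch

// replacing the local helper with the standard library (import "slices"):
if slices.Contains(items, item) {
	// ...
}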

Comment thread api/plugin/convert.go Outdated
Comment on lines +386 to +393
func externalRoleToProto(r models.ExternalRole) *pb.ExternalRoleProto {
	return &pb.ExternalRoleProto{
		Id:          r.ID.String(),
		AccountId:   r.AccountID,
		ScraperId:   lo.FromPtr(r.ScraperID).String(),
		Name:        r.Name,
		Description: r.Description,
	}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, explore the codebase structure
find . -type f -name "convert.go" | head -20

Repository: flanksource/config-db

Length of output: 88


🏁 Script executed:

# Examine the specific file
head -30 api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 820


🏁 Script executed:

# Check the lines mentioned (386-393 and 506-511)
sed -n '380,415p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 912


🏁 Script executed:

# Search for all lo.FromPtr patterns in the file
rg -n 'lo\.FromPtr' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 180


🏁 Script executed:

# Check for the models.ExternalRole definition to understand ScraperID type
rg -n 'type ExternalRole struct' --type go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Look for UUID handling and imports in convert.go
head -50 api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 1296


🏁 Script executed:

# Let me check what lo.FromPtr does - search for documentation or usage patterns
# First, let's understand uuid package behavior with nil and zero UUID
cat > /tmp/test_uuid.go << 'EOF'
package main
import (
	"fmt"
	"github.com/google/uuid"
	"github.com/samber/lo"
)

func main() {
	// Test 1: nil pointer
	var nilPtr *uuid.UUID = nil
	fmt.Println("nil pointer:", lo.FromPtr(nilPtr).String())
	
	// Test 2: zero UUID
	zeroUUID := uuid.UUID{}
	fmt.Println("zero UUID:", zeroUUID.String())
	
	// Test 3: non-nil pointer to zero UUID
	ptrToZero := &zeroUUID
	fmt.Println("ptr to zero UUID:", lo.FromPtr(ptrToZero).String())
}
EOF

cd /tmp && go run test_uuid.go 2>&1 || echo "Build failed, checking go.mod"

Repository: flanksource/config-db

Length of output: 428


🏁 Script executed:

# Let's trace through the actual proto conversion flow
# Check the lines around 506-511 to see the second occurrence
sed -n '500,520p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 847


🏁 Script executed:

# Let's understand the types more - check ExternalRole and ExternalConfigAccess structs
rg -A 10 'type ExternalRole struct' --type go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Check how lo.FromPtr behaves with UUID - search samber/lo documentation approach
# Let's look for test cases that might use this pattern
rg -n 'ScraperID' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 1012


🏁 Script executed:

# Check if there are existing tests for these conversion functions
find . -type f -name "*convert*test*" -o -name "*test*convert*" | grep -E '\.go$'

Repository: flanksource/config-db

Length of output: 93


🏁 Script executed:

# Search for how the code handles round-trip conversions
rg -B 5 -A 5 'protoToExternalRole' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 659


🏁 Script executed:

# Let's check the proto definition to understand the field type
find . -type f -name "*.proto" | xargs grep -l "ExternalRoleProto" 2>/dev/null

Repository: flanksource/config-db

Length of output: 97


🏁 Script executed:

# Examine the proto file for ExternalRoleProto and ExternalConfigAccessProto
find . -name "*.proto" -exec grep -l "ExternalRoleProto\|ExternalConfigAccessProto" {} \;

Repository: flanksource/config-db

Length of output: 97


🏁 Script executed:

# Check the convert_test.go file to see if there are tests for nil handling
sed -n '1,100p' api/plugin/convert_test.go

Repository: flanksource/config-db

Length of output: 3134


🏁 Script executed:

# Search for test patterns related to nil ScraperID
rg -n 'ScraperID.*nil\|nil.*ScraperID' api/plugin/convert_test.go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Let me check what go.mod says about samber/lo version
cat go.mod | grep -E 'samber|uuid'

Repository: flanksource/config-db

Length of output: 294


🏁 Script executed:

# Let's look for lo.FromPtr documentation or other similar usages in codebase
rg -n 'lo\.FromPtr' --type go

Repository: flanksource/config-db

Length of output: 23910


🏁 Script executed:

# Check the proto definition for ExternalRoleProto and ExternalConfigAccessProto
cat api/plugin/proto/scraper.proto | grep -A 20 "message ExternalRoleProto"

Repository: flanksource/config-db

Length of output: 462


🏁 Script executed:

# Also check for ExternalConfigAccessProto
cat api/plugin/proto/scraper.proto | grep -A 30 "message ExternalConfigAccessProto"

Repository: flanksource/config-db

Length of output: 820


🏁 Script executed:

# Let's verify the exact behavior by checking how the reverse conversion handles the zero UUID
sed -n '395,415p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 561


🏁 Script executed:

# Check the reverse conversion for ExternalConfigAccess
sed -n '535,560p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 798


🏁 Script executed:

# Let's count how many places have this pattern - search for similar nil ScraperID handling
rg -B 2 -A 2 'if.*ScraperID.*nil' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Check if there are other similar UUID pointer serializations
rg -n '\.FromPtr.*UUID.*\.String\(\)' --type go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Check if there are any tests for the round-trip conversion of ExternalRole with nil ScraperID
rg -A 30 'ExternalRole' api/plugin/convert_test.go | grep -i scraper

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Let's examine if there are other pointer-to-string conversions that might have similar issues
sed -n '415,430p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 431


🏁 Script executed:

# Check externalUserToProto for similar pattern
sed -n '415,445p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 732


🏁 Script executed:

# And externalGroupToProto
sed -n '450,485p' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 890


🏁 Script executed:

# Let's check the full context around all lo.FromPtr usages in the convert.go file to see if there are other problem areas
rg -B 3 -A 3 'lo\.FromPtr.*\.String\(\)' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 584


🏁 Script executed:

# Let's verify the samber/lo library behavior with FromPtr on nil
# by checking the library documentation or source
python3 << 'EOF'
# Simulate what lo.FromPtr does with a nil UUID pointer
# FromPtr returns the dereferenced value or zero value if pointer is nil

# For UUID type (google/uuid package):
# zero UUID is 00000000-0000-0000-0000-000000000000

# When lo.FromPtr(nil_uuid_pointer) is called, it returns UUID{}
# When .String() is called on UUID{}, it returns "00000000-0000-0000-0000-000000000000"

print("nil UUID pointer -> UUID{} -> String() = '00000000-0000-0000-0000-000000000000'")
print("This is the root cause of the issue described in the review")
EOF

Repository: flanksource/config-db

Length of output: 204


🏁 Script executed:

# Verify that ExternalRole.ScraperID is indeed a pointer
rg -B 5 -A 10 'type ExternalRole struct' --type go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Check ExternalConfigAccess definition
rg -B 5 -A 15 'type ExternalConfigAccess struct' --type go

Repository: flanksource/config-db

Length of output: 1111


Preserve nil ScraperID pointers during proto conversion.

lo.FromPtr(...) on a nil UUID pointer returns the zero UUID 00000000-0000-0000-0000-000000000000, which round-trips as a non-nil pointer. The reverse conversion (protoToExternalRole and protoToExternalConfigAccess) only checks if p.ScraperId != "" before parsing, so the zero UUID gets deserialized back as a concrete pointer, losing the original nil state.

🔧 Proposed fix
 func externalRoleToProto(r models.ExternalRole) *pb.ExternalRoleProto {
-	return &pb.ExternalRoleProto{
+	result := &pb.ExternalRoleProto{
 		Id:          r.ID.String(),
 		AccountId:   r.AccountID,
-		ScraperId:   lo.FromPtr(r.ScraperID).String(),
 		Name:        r.Name,
 		Description: r.Description,
 	}
+	if r.ScraperID != nil {
+		result.ScraperId = r.ScraperID.String()
+	}
+	return result
 }

 func externalConfigAccessToProto(ca v1.ExternalConfigAccess) *pb.ExternalConfigAccessProto {
 	result := &pb.ExternalConfigAccessProto{
 		Id:                   ca.ID,
 		ConfigId:             ca.ConfigID.String(),
-		ScraperId:            lo.FromPtr(ca.ScraperID).String(),
 		ExternalUserAliases:  ca.ExternalUserAliases,
 		ExternalRoleAliases:  ca.ExternalRoleAliases,
 		ExternalGroupAliases: ca.ExternalGroupAliases,
@@ -517,6 +520,9 @@ func externalConfigAccessToProto(ca v1.ExternalConfigAccess) *pb.ExternalConfi
 		},
 	}
+	if ca.ScraperID != nil {
+		result.ScraperId = ca.ScraperID.String()
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/convert.go` around lines 386 - 393, externalRoleToProto (and
similarly proto conversions for configs) currently calls lo.FromPtr on
r.ScraperID which turns nil into the zero-UUID string; change the conversion so
ScraperId is only set when the original pointer is non-nil (e.g. if r.ScraperID
!= nil then r.ScraperID.String() else ""), so the proto field remains empty for
a nil pointer and the existing protoToExternalRole/protoToExternalConfigAccess
logic (which checks p.ScraperId != "") will correctly preserve nil on
round-trip; update externalRoleToProto to avoid lo.FromPtr for ScraperID and
apply the same pattern where other ScraperID conversions occur.

Comment thread api/plugin/convert.go Outdated
Comment on lines +396 to +409
func protoToExternalRole(p *pb.ExternalRoleProto) models.ExternalRole {
	role := models.ExternalRole{
		Name:        p.Name,
		AccountID:   p.AccountId,
		Description: p.Description,
	}
	if p.Id != "" {
		role.ID = uuid.MustParse(p.Id)
	}
	if p.ScraperId != "" {
		id := uuid.MustParse(p.ScraperId)
		role.ScraperID = &id
	}
	return role

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and check the code at the mentioned lines
cd api/plugin && wc -l convert.go

Repository: flanksource/config-db

Length of output: 79


🏁 Script executed:

# Check if the file exists and read specific line ranges
cat -n api/plugin/convert.go | sed -n '396,409p'

Repository: flanksource/config-db

Length of output: 500


🏁 Script executed:

# Find all uuid.MustParse occurrences in the conversion file
rg -n 'uuid\.MustParse' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 765


🏁 Script executed:

# Find all protoTo conversion functions to understand the pattern
rg -n '^func protoTo' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 915


🏁 Script executed:

# Check the context of the proto parameters - are they plugin-provided untrusted values?
cat -n api/plugin/convert.go | sed -n '1,50p'

Repository: flanksource/config-db

Length of output: 1646


🏁 Script executed:

# Check import statements to understand the uuid library being used
cat -n api/plugin/convert.go | head -30

Repository: flanksource/config-db

Length of output: 1030


🏁 Script executed:

# View the full protoToExternalUser function to verify the pattern
cat -n api/plugin/convert.go | sed -n '430,453p'

Repository: flanksource/config-db

Length of output: 752


🏁 Script executed:

# View protoToExternalConfigAccess to check all mentioned locations
cat -n api/plugin/convert.go | sed -n '536,577p'

Repository: flanksource/config-db

Length of output: 1472


🏁 Script executed:

# View protoToExternalGroup function
cat -n api/plugin/convert.go | sed -n '470,490p'

Repository: flanksource/config-db

Length of output: 695


🏁 Script executed:

# View protoToExternalConfigAccessLog function
cat -n api/plugin/convert.go | sed -n '593,600p'

Repository: flanksource/config-db

Length of output: 416


🏁 Script executed:

# Verify uuid.Parse behavior - search for any existing error handling patterns
rg -B2 -A2 'uuid\.Parse\(' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# View protoToExternalUserGroup function (lines 499-503)
cat -n api/plugin/convert.go | sed -n '499,510p'

Repository: flanksource/config-db

Length of output: 653


🏁 Script executed:

# Search for any existing uuid.Parse usage with error handling in the repo
rg -r 'uuid\.Parse' --type go | head -20

Repository: flanksource/config-db

Length of output: 461


🏁 Script executed:

# Check if there's a logger.Warnf pattern used elsewhere for validation errors
rg 'logger\.Warnf' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 177


🏁 Script executed:

# Verify that uuid.MustParse actually panics - check google/uuid documentation or source
# Also check if there's a handler for panics in the plugin layer
rg -r 'recover\|defer' api/plugin/ --type go | head -10

Repository: flanksource/config-db

Length of output: 593


🏁 Script executed:

# Check if these functions are called from plugin handlers and what error handling exists
rg 'protoToExternal' api/plugin/convert.go

Repository: flanksource/config-db

Length of output: 1086


🏁 Script executed:

# Let me verify the context where these functions are called and check for error handling
cat -n api/plugin/convert.go | sed -n '155,175p'

Repository: flanksource/config-db

Length of output: 926


🏁 Script executed:

# Check the calling context to understand the flow
rg -B10 'protoToExternalRole\|protoToExternalUser\|protoToExternalGroup' api/plugin/convert.go | head -50

Repository: flanksource/config-db

Length of output: 47


Replace all uuid.MustParse calls with error-safe parsing in plugin conversion functions.

Plugin-provided UUIDs in ExternalRole, ExternalUser, ExternalGroup, ExternalUserGroup, ExternalConfigAccess, and ExternalConfigAccessLog conversion functions can crash the process if malformed. Use uuid.Parse with error handling instead.

Affected lines: 403, 406, 437, 440, 477, 480, 501, 502, 544, 547, 551, 555, 559, 596, 599

Example fix for line 403:

-	if p.Id != "" {
-		role.ID = uuid.MustParse(p.Id)
-	}
+	if p.Id != "" {
+		if id, err := uuid.Parse(p.Id); err == nil {
+			role.ID = id
+		} else {
+			logger.Warnf("invalid external role id %q: %v", p.Id, err)
+		}
+	}

Apply the same pattern to all 15 uuid.MustParse occurrences in the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/convert.go` around lines 396 - 409, The conversion functions
(e.g., protoToExternalRole, protoToExternalUser, protoToExternalGroup,
protoToExternalUserGroup, protoToExternalConfigAccess,
protoToExternalConfigAccessLog) currently call uuid.MustParse which will panic
on malformed plugin input; replace each uuid.MustParse call with uuid.Parse,
check the returned error, and only set the model UUID fields when parsing
succeeds (for pointer fields set to &id on success, for value fields set id on
success); on parse error either leave the field unset/nil or log the invalid
UUID (using existing logger) before continuing so malformed plugin-provided IDs
do not crash the process.

Comment thread api/plugin/grpc_plugin.go Outdated
Comment on lines +29 to +37
func (p *GRPCPlugin) GRPCClient(ctx context.Context, broker *plugin.GRPCBroker, c *grpc.ClientConn) (interface{}, error) {
	hostConn, err := broker.Dial(1)
	if err != nil {
		return nil, err
	}
	return &grpcClient{
		client:     pb.NewScraperPluginClient(c),
		hostClient: pb.NewHostServicesClient(hostConn),
	}, nil

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether host-service client is actually used and where it's registered.
rg -nP '\bhostClient\b' --type go api/plugin
rg -nP 'RegisterHostServices|HostServicesServer|NewHostServicesClient|broker\.Dial\(' --type go api/plugin

Repository: flanksource/config-db

Length of output: 3149


🏁 Script executed:

cat -n api/plugin/grpc_plugin.go

Repository: flanksource/config-db

Length of output: 3268


🏁 Script executed:

rg -nP '\bhostClient\b' --type go --context 3 api/plugin/grpc_plugin.go

Repository: flanksource/config-db

Length of output: 438


🏁 Script executed:

rg -nP 'grpcClient|ScraperPlugin' --type go api/plugin | head -40

Repository: flanksource/config-db

Length of output: 4141


Remove unused hostClient field or implement missing host service calls.

The hostClient is initialized via broker.Dial(1) at line 30 but never consumed by any grpcClient method. This creates an unused gRPC client and an unclosed connection that leaks resources. Either remove the field and the dial call, or implement the intended host-service integration if this was left incomplete.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/grpc_plugin.go` around lines 29 - 37, The GRPCClient
implementation dials a host connection and sets grpcClient.hostClient (via
broker.Dial(1) and hostConn) but never uses or closes it, leaking resources;
remove the unused hostClient and the broker.Dial call: update
GRPCPlugin.GRPCClient to stop calling broker.Dial(1) and construct
&grpcClient{client: pb.NewScraperPluginClient(c)} only, and also remove the
hostClient field from the grpcClient struct (and any related imports/fields) so
there is no unused client or open connection; alternatively, if host services
are required, ensure the hostConn is stored, hostClient methods are invoked from
grpcClient methods, and hostConn is closed when no longer needed.

Comment thread api/plugin/types.go Outdated
Comment on lines +35 to +70
func EnvVarFromDuty(e types.EnvVar) EnvVar {
	ev := EnvVar{
		ValueStatic: e.ValueStatic,
	}
	if e.ValueFrom != nil && e.ValueFrom.ConfigMapKeyRef != nil {
		ev.ValueFromConfigMapKeyRefName = e.ValueFrom.ConfigMapKeyRef.Name
		ev.ValueFromConfigMapKeyRefKey = e.ValueFrom.ConfigMapKeyRef.Key
	}
	if e.ValueFrom != nil && e.ValueFrom.SecretKeyRef != nil {
		ev.ValueFromSecretKeyRefName = e.ValueFrom.SecretKeyRef.Name
		ev.ValueFromSecretKeyRefKey = e.ValueFrom.SecretKeyRef.Key
	}
	return ev
}

func (e EnvVar) ToDuty() types.EnvVar {
	ev := types.EnvVar{
		ValueStatic: e.ValueStatic,
	}
	if e.ValueFromConfigMapKeyRefName != "" || e.ValueFromSecretKeyRefName != "" {
		ev.ValueFrom = &types.EnvVarSource{}
	}
	if e.ValueFromConfigMapKeyRefName != "" {
		ev.ValueFrom.ConfigMapKeyRef = &types.ConfigMapKeySelector{
			LocalObjectReference: types.LocalObjectReference{Name: e.ValueFromConfigMapKeyRefName},
			Key:                  e.ValueFromConfigMapKeyRefKey,
		}
	}
	if e.ValueFromSecretKeyRefName != "" {
		ev.ValueFrom.SecretKeyRef = &types.SecretKeySelector{
			LocalObjectReference: types.LocalObjectReference{Name: e.ValueFromSecretKeyRefName},
			Key:                  e.ValueFromSecretKeyRefKey,
		}
	}
	return ev
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find where types.EnvVar is defined
echo "=== Searching for types.EnvVar definition ==="
rg -n "type EnvVar struct" --type go -A 10

# Also search for EnvVar imports to understand the source
echo -e "\n=== Searching for EnvVar imports ==="
rg -n "import.*EnvVar|types\.EnvVar" --type go api/plugin/types.go

Repository: flanksource/config-db

Length of output: 864


🏁 Script executed:

#!/bin/bash
# Let's check the current api/plugin/types.go file to see the actual code
echo "=== Checking api/plugin/types.go ==="
cat -n api/plugin/types.go | head -80

Repository: flanksource/config-db

Length of output: 2814


🏁 Script executed:

#!/bin/bash
# Search for the duty types package location
echo "=== Locating duty types package ==="
fd -e go -x rg -l "package.*types|type EnvVar" {} \; | head -20
rg -n "type EnvVar" --type go -B 2 -A 8

Repository: flanksource/config-db

Length of output: 3177


EnvVarFromDuty and ToDuty silently drop the Name field.

The local EnvVar struct (line 26) includes a Name field, but EnvVarFromDuty (line 35) never copies e.Name to the returned struct, and ToDuty (line 50) never restores it. This causes data loss on any round-trip conversion through the plugin boundary, breaking any caller that depends on preserving environment variable names.

Fix
 func EnvVarFromDuty(e types.EnvVar) EnvVar {
 	ev := EnvVar{
+		Name:        e.Name,
 		ValueStatic: e.ValueStatic,
 	}
 func (e EnvVar) ToDuty() types.EnvVar {
 	ev := types.EnvVar{
+		Name:        e.Name,
 		ValueStatic: e.ValueStatic,
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func EnvVarFromDuty(e types.EnvVar) EnvVar {
	ev := EnvVar{
		ValueStatic: e.ValueStatic,
	}
	if e.ValueFrom != nil && e.ValueFrom.ConfigMapKeyRef != nil {
		ev.ValueFromConfigMapKeyRefName = e.ValueFrom.ConfigMapKeyRef.Name
		ev.ValueFromConfigMapKeyRefKey = e.ValueFrom.ConfigMapKeyRef.Key
	}
	if e.ValueFrom != nil && e.ValueFrom.SecretKeyRef != nil {
		ev.ValueFromSecretKeyRefName = e.ValueFrom.SecretKeyRef.Name
		ev.ValueFromSecretKeyRefKey = e.ValueFrom.SecretKeyRef.Key
	}
	return ev
}

func (e EnvVar) ToDuty() types.EnvVar {
	ev := types.EnvVar{
		ValueStatic: e.ValueStatic,
	}
	if e.ValueFromConfigMapKeyRefName != "" || e.ValueFromSecretKeyRefName != "" {
		ev.ValueFrom = &types.EnvVarSource{}
	}
	if e.ValueFromConfigMapKeyRefName != "" {
		ev.ValueFrom.ConfigMapKeyRef = &types.ConfigMapKeySelector{
			LocalObjectReference: types.LocalObjectReference{Name: e.ValueFromConfigMapKeyRefName},
			Key:                  e.ValueFromConfigMapKeyRefKey,
		}
	}
	if e.ValueFromSecretKeyRefName != "" {
		ev.ValueFrom.SecretKeyRef = &types.SecretKeySelector{
			LocalObjectReference: types.LocalObjectReference{Name: e.ValueFromSecretKeyRefName},
			Key:                  e.ValueFromSecretKeyRefKey,
		}
	}
	return ev
}

func EnvVarFromDuty(e types.EnvVar) EnvVar {
	ev := EnvVar{
		Name:        e.Name,
		ValueStatic: e.ValueStatic,
	}
	if e.ValueFrom != nil && e.ValueFrom.ConfigMapKeyRef != nil {
		ev.ValueFromConfigMapKeyRefName = e.ValueFrom.ConfigMapKeyRef.Name
		ev.ValueFromConfigMapKeyRefKey = e.ValueFrom.ConfigMapKeyRef.Key
	}
	if e.ValueFrom != nil && e.ValueFrom.SecretKeyRef != nil {
		ev.ValueFromSecretKeyRefName = e.ValueFrom.SecretKeyRef.Name
		ev.ValueFromSecretKeyRefKey = e.ValueFrom.SecretKeyRef.Key
	}
	return ev
}

func (e EnvVar) ToDuty() types.EnvVar {
	ev := types.EnvVar{
		Name:        e.Name,
		ValueStatic: e.ValueStatic,
	}
	if e.ValueFromConfigMapKeyRefName != "" || e.ValueFromSecretKeyRefName != "" {
		ev.ValueFrom = &types.EnvVarSource{}
	}
	if e.ValueFromConfigMapKeyRefName != "" {
		ev.ValueFrom.ConfigMapKeyRef = &types.ConfigMapKeySelector{
			LocalObjectReference: types.LocalObjectReference{Name: e.ValueFromConfigMapKeyRefName},
			Key:                  e.ValueFromConfigMapKeyRefKey,
		}
	}
	if e.ValueFromSecretKeyRefName != "" {
		ev.ValueFrom.SecretKeyRef = &types.SecretKeySelector{
			LocalObjectReference: types.LocalObjectReference{Name: e.ValueFromSecretKeyRefName},
			Key:                  e.ValueFromSecretKeyRefKey,
		}
	}
	return ev
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/plugin/types.go` around lines 35 - 70, The conversion functions
EnvVarFromDuty and (EnvVar) ToDuty are dropping the EnvVar.Name field; update
EnvVarFromDuty to copy e.Name into the returned EnvVar.Name and update (EnvVar)
ToDuty to set types.EnvVar.Name from e.Name so the name round-trips correctly;
locate the conversions in EnvVarFromDuty and ToDuty and assign the Name field on
both sides (EnvVar and types.EnvVar) while leaving the existing ValueStatic and
ValueFrom handling intact.

Comment thread scrapers/aws/cloudtrail.go Outdated
"github.com/samber/lo"

v1 "github.com/flanksource/config-db/api/v1"
v1 "github.com/flanksource/config-db/api"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the types that must be exported from the new api package

echo "=== Checking api package structure ==="
find api -name "*.go" -type f | head -20

echo -e "\n=== Searching for type/func definitions in api package (excluding api/v1) ==="
rg -n "^(type|func)\s+(AWS|ScrapeResults|ChangeResult|NewJSON)" api --type-list | head -50 || echo "Using alternative search..."

echo -e "\n=== Alternative search for AWS type ==="
rg -n "type\s+AWS\s" api --type go

echo -e "\n=== Alternative search for ScrapeResults type ==="
rg -n "type\s+ScrapeResults\s" api --type go

echo -e "\n=== Alternative search for ChangeResult type ==="
rg -n "type\s+ChangeResult\s" api --type go

echo -e "\n=== Alternative search for NewJSON function ==="
rg -n "func\s+NewJSON" api --type go

echo -e "\n=== Check if api/v1 still exists ==="
ls -la api/v1 2>/dev/null | head -5 || echo "api/v1 directory not found (or no permissions)"

echo -e "\n=== List files in api package root ==="
ls -la api/*.go 2>/dev/null | head -20 || echo "No .go files in api root"

Repository: flanksource/config-db

Length of output: 1698


🏁 Script executed:

#!/bin/bash
# Check if api package root has any re-export files

echo "=== Files directly in api/ root ==="
find api -maxdepth 1 -name "*.go" -type f

echo -e "\n=== Check api/go.mod or package declaration ==="
head -20 api/*.go 2>/dev/null || echo "No .go files in api/"

echo -e "\n=== Search for re-exports of AWS, ScrapeResults, ChangeResult, NewJSON ==="
rg -n "^\s*(AWS|ScrapeResults|ChangeResult|NewJSON)" api --type go -B 2 | grep -v "^api/v1" | head -30 || echo "No re-exports found outside api/v1"

echo -e "\n=== Check if scrapers/aws/cloudtrail.go is the only file with new import ==="
rg -l "github.com/flanksource/config-db/api\"$" . --type go | grep -v "api/v1" | head -20

Repository: flanksource/config-db

Length of output: 949


🏁 Script executed:

#!/bin/bash
# Look for initialization files or re-export patterns

echo "=== Search for api.go, init.go, types.go in api root ==="
find api -maxdepth 1 \( -name "*.go" -o -name "go.mod" \) -type f

echo -e "\n=== Check git status to see what changed ==="
cd scrapers/aws && git diff cloudtrail.go 2>/dev/null | head -40

echo -e "\n=== Look at the import block of cloudtrail.go ==="
head -30 scrapers/aws/cloudtrail.go

echo -e "\n=== Check if api/v1 package is re-exported via api package somehow ==="
cd api && cat go.mod 2>/dev/null | head -20 || echo "No go.mod in api"

echo -e "\n=== Verify that types in api/v1 are still there ==="
head -20 api/v1/aws.go

Repository: flanksource/config-db

Length of output: 591


🏁 Script executed:

#!/bin/bash
# Find the actual file paths

echo "=== Search for cloudtrail.go ==="
find . -name "cloudtrail.go" -type f

echo -e "\n=== Search for api directory ==="
find . -maxdepth 2 -name "api" -type d

echo -e "\n=== List repository structure ==="
ls -la

Repository: flanksource/config-db

Length of output: 2704


🏁 Script executed:

#!/bin/bash
# Now examine the actual file and api structure

echo "=== View cloudtrail.go imports (first 30 lines) ==="
head -30 ./scrapers/aws/cloudtrail.go

echo -e "\n=== List api/ directory contents ==="
find ./api -type f -name "*.go" | sort

echo -e "\n=== List pkg/api/ directory contents ==="
find ./pkg/api -type f -name "*.go" 2>/dev/null | sort || echo "pkg/api has no .go files"

echo -e "\n=== Search for AWS type in api/ ==="
grep -rn "^type AWS" ./api --include="*.go"

echo -e "\n=== Search for ScrapeResults type in api/ ==="
grep -rn "^type ScrapeResults" ./api --include="*.go"

echo -e "\n=== Search for ChangeResult type in api/ ==="
grep -rn "^type ChangeResult" ./api --include="*.go"

echo -e "\n=== Search for NewJSON function in api/ ==="
grep -rn "^func NewJSON" ./api --include="*.go"

Repository: flanksource/config-db

Length of output: 2414


🏁 Script executed:

#!/bin/bash
# Check if the api/v1 package is actually part of the api package namespace

echo "=== Check package declaration in api/v1/aws.go ==="
head -15 ./api/v1/aws.go

echo -e "\n=== Check if there's any api.go or init.go in api/ root ==="
ls -la ./api/ | grep -E "\.(go|mod)$"

echo -e "\n=== Check api/v1 groupversion_info.go (likely contains package meta) ==="
cat ./api/v1/groupversion_info.go

echo -e "\n=== Verify: does api package exist as a module/package? ==="
grep -r "^package api[^v]" ./api --include="*.go" | head -5

echo -e "\n=== Check for any import remapping in api/v1 ==="
grep -n "import" ./api/v1/aws.go | head -10

Repository: flanksource/config-db

Length of output: 1823


🏁 Script executed:

#!/bin/bash
# Critical check: verify if api package even exists or if this is a module structure issue

echo "=== Check if api is a Go module itself ==="
cat ./go.mod | grep -A 5 "^module"

echo -e "\n=== Search for 'package api' (not v1) anywhere ==="
grep -rn "^package api$" . --include="*.go" | grep -v "/v1/" | head -20

echo -e "\n=== Check if api/ has a go.mod file ==="
ls -la ./api/go.mod 2>/dev/null || echo "No go.mod in api/"

echo -e "\n=== Try to understand the api directory structure ==="
tree -L 2 ./api/ 2>/dev/null || find ./api -maxdepth 2 -type d

echo -e "\n=== Final verification: what package does cloudtrail.go actually get with 'v1'? ==="
echo "The import 'v1 \"github.com/flanksource/config-db/api\"' would resolve to:"
echo "- Files in api/ root with 'package api'"
echo "- OR api/api.go with 'package api'"
echo "Let's check what exists:"
ls -la ./api/*.go 2>/dev/null || echo "NO .go FILES IN api/ ROOT"

Repository: flanksource/config-db

Length of output: 1688


🏁 Script executed:

#!/bin/bash
# Final verification: this import should fail at compile time

echo "=== Confirm: there is NO 'package api' in api/ directory ==="
find ./api -name "*.go" -type f -exec grep -l "^package api$" {} \;

echo -e "\n=== Verify package declarations in api/ ==="
find ./api -maxdepth 1 -name "*.go" -type f -exec head -1 {} \;
echo "(None found above = no .go files in api/ root)"

echo -e "\n=== The types ARE in api/v1 with 'package v1' ==="
head -1 ./api/v1/aws.go
head -1 ./api/v1/interface.go
head -1 ./api/v1/json_types.go

echo -e "\n=== Conclusion: Import path mismatch ==="
echo "Current import in cloudtrail.go: v1 \"github.com/flanksource/config-db/api\""
echo "Actual package location: ./api/v1/ (package v1)"
echo "This import will FAIL because api/ has no Go package (no .go files with 'package api')"
echo ""
echo "Fix should be either:"
echo "1. Import should be: v1 \"github.com/flanksource/config-db/api/v1\""
echo "2. OR api/ root needs a package with re-exports from api/v1"

Repository: flanksource/config-db

Length of output: 702


The import path is incorrect and will cause a compilation error.

The import was changed to v1 "github.com/flanksource/config-db/api", but the api package doesn't exist—there are no .go files in the api/ root directory. The types (AWS, ScrapeResults, ChangeResult, NewJSON) are all defined in api/v1/ with package v1 declaration. Go will fail to resolve the import.

The import must either be reverted to v1 "github.com/flanksource/config-db/api/v1" or the api/ package root must be populated with re-exports from api/v1/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/aws/cloudtrail.go` at line 15, The import alias v1 is referencing a
non-existent package root; update the import to point to the actual package that
defines the types (change the import from v1
"github.com/flanksource/config-db/api" to v1
"github.com/flanksource/config-db/api/v1") so symbols like AWS, ScrapeResults,
ChangeResult and NewJSON resolve; alternatively add re-exports in the api
package root, but the quick fix is to correct the import path where v1 is used.

Comment thread scrapers/openssf/client.go Outdated
Comment on lines +81 to +86
if attempt > 0 {
	backoff := time.Duration(attempt*attempt) * time.Second
	c.ctx.Tracef("retrying scorecard request for %s (attempt %d/%d) after %v",
		repoFullName, attempt+1, maxRetries, backoff)
	time.Sleep(backoff)
}

⚠️ Potential issue | 🟠 Major

Backoff sleep ignores context cancellation.

Use a context-aware wait to avoid blocking shutdown/timeout paths during retries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/client.go` around lines 81 - 86, The retry backoff currently
calls time.Sleep(backoff) which ignores cancellation; replace that sleep with a
context-aware wait (select on time.After(backoff) and the request/context Done
channel) so retries abort promptly on cancel/timeout. Locate the retry block
where repoFullName, attempt, maxRetries and backoff are computed (the
c.ctx.Tracef log and time.Sleep call) and change the sleep to a select that
returns/propagates the context error if the context is cancelled instead of
blocking the goroutine.
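A sketch of the context-aware wait, assuming c.ctx satisfies context.Context (Done/Err) and the enclosing function returns an error alongside its result:

if attempt > 0 {
	backoff := time.Duration(attempt*attempt) * time.Second
	c.ctx.Tracef("retrying scorecard request for %s (attempt %d/%d) after %v",
		repoFullName, attempt+1, maxRetries, backoff)
	select {
	case <-time.After(backoff):
		// backoff elapsed, continue with the next attempt
	case <-c.ctx.Done():
		return nil, c.ctx.Err()
	}
}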

Comment thread scrapers/openssf/client.go Outdated
Comment on lines +122 to +129
if resp.StatusCode >= 500 {
	return nil, fmt.Errorf("server error: %d", resp.StatusCode)
}

if resp.StatusCode != http.StatusOK {
	body, _ := io.ReadAll(resp.Body)
	return nil, fmt.Errorf("unexpected status %d: %s", resp.StatusCode, string(body))
}

⚠️ Potential issue | 🟠 Major

HTTP 429 is treated as non-retryable.

fetchScorecard returns a generic non-OK error for rate limiting, and isRetryable won’t match it, so transient throttling fails fast instead of retrying.

🔁 Proposed fix
 	if resp.StatusCode >= 500 {
 		return nil, fmt.Errorf("server error: %d", resp.StatusCode)
 	}
+	if resp.StatusCode == http.StatusTooManyRequests {
+		return nil, fmt.Errorf("rate limited: retry-after=%s", resp.Header.Get("Retry-After"))
+	}
@@
-	return contains(errStr, "server error") ||
-	       contains(errStr, "timeout") ||
-	       contains(errStr, "connection")
+	return contains(errStr, "server error") ||
+		contains(errStr, "rate limited") ||
+		contains(errStr, "timeout") ||
+		contains(errStr, "connection")

Also applies to: 149-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/client.go` around lines 122 - 129, The fetchScorecard HTTP
response handling treats 429 as a generic non-OK error so isRetryable won't
catch rate-limits; update the response checks in fetchScorecard (and the similar
block used around the other status handling) to explicitly treat
http.StatusTooManyRequests (429) as a retryable error — e.g., return an error
value or type that is recognized by isRetryable when resp.StatusCode ==
http.StatusTooManyRequests, and preserve the existing body/error text for
logging; ensure the same change is applied to the similar status-check block
referenced near the second occurrence so throttling triggers retries.

Comment thread scrapers/openssf/scorecard.go Outdated
)

// LastScorecardScrapeTime tracks the last assessment date for each repository to avoid redundant API calls
var LastScorecardScrapeTime = sync.Map{}

⚠️ Potential issue | 🟠 Major

Package-level LastScorecardScrapeTime bleeds state across scraper instances and test runs.

A process-global sync.Map means:

  1. Tests cannot run in parallel without cross-contaminating cache state.
  2. Changing ScorecardCacheTTL in config has no effect at runtime since the map persists.

Consider scoping the cache per-scraper config ID or making it injectable for testing.

Would you like me to open a new issue to track the per-scraper-config scoping of this cache?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` at line 22, The package-level sync.Map
LastScorecardScrapeTime currently holds global state; change it to be scoped
per-scraper instance or injectable so cache doesn't leak across tests or config
changes: add a field (e.g., lastScrape sync.Map or map[string]time.Time with
mutex) to the scraper struct used to run scorecard scrapes (or accept a Cache
interface via the scraper's constructor) and update all uses of
LastScorecardScrapeTime to reference that instance field; ensure the cache
honors ScorecardCacheTTL from the scraper's config and write tests that inject a
fresh cache to avoid cross-test contamination.
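A sketch of instance-scoped caching; the struct and method names below are illustrative, not the scraper's actual symbols:

type scorecardScraper struct {
	lastScrape sync.Map // repo full name -> time.Time of last fetch
}

func (s *scorecardScraper) shouldSkipFetch(repo string, ttl time.Duration) bool {
	v, ok := s.lastScrape.Load(repo)
	if !ok {
		return false
	}
	last, _ := v.(time.Time)
	return time.Since(last) < ttl
}

func (s *scorecardScraper) markFetched(repo string) {
	s.lastScrape.Store(repo, time.Now())
}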

Comment thread scrapers/openssf/scorecard.go Outdated
Comment on lines +53 to +57
if config.MinScore != nil && scorecard.Score < float64(*config.MinScore) {
	ctx.Debugf("skipping %s: score %.1f below minimum %d",
		repoFullName, scorecard.Score, *config.MinScore)
	continue
}

⚠️ Potential issue | 🟠 Major

Repos filtered by MinScore don't update the cache, causing repeated unnecessary API calls.

When scorecard.Score < float64(*config.MinScore), the function skips the LastScorecardScrapeTime.Store(...) call on line 59. On every subsequent scrape run, shouldSkipScorecardFetch will return false (cache miss), triggering a fresh API call that will again be thrown away. Move the store before the filter check.

🐛 Proposed fix
 		scorecard, err := client.GetRepositoryScorecard(ctx, repoConfig.Owner, repoConfig.Repo)
 		if err != nil {
 			results.Errorf(err, "failed to scrape repository %s", repoFullName)
 			continue
 		}

+		LastScorecardScrapeTime.Store(repoFullName, time.Now())
+
 		if config.MinScore != nil && scorecard.Score < float64(*config.MinScore) {
 			ctx.Debugf("skipping %s: score %.1f below minimum %d",
 				repoFullName, scorecard.Score, *config.MinScore)
 			continue
 		}

-		LastScorecardScrapeTime.Store(repoFullName, time.Now())
 		ctx.Tracef("stored last scorecard scrape time for %s: %v", repoFullName, scorecard.Date)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 53 - 57, The cache isn't updated
for repos filtered out by MinScore leading to repeated API calls; in the loop
handling scorecard results (referencing scorecard.Score and config.MinScore)
call the cache update (LastScorecardScrapeTime.Store or the existing scrape-time
storage used before) immediately after receiving the scorecard for repoFullName
and before the MinScore check so that shouldSkipScorecardFetch will hit on
subsequent runs; move the Store(...) invocation to precede the if that checks
config.MinScore and retains the existing timestamp/value you store now.

Comment thread scrapers/stale.go Outdated

"github.com/flanksource/commons/duration"
v1 "github.com/flanksource/config-db/api/v1"
v1 "github.com/flanksource/config-db/api"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find where DeletedReasonStale is defined and used
echo "=== Searching for DeletedReasonStale definitions and usages ==="
rg -n "DeletedReasonStale" --type go -B 2 -A 2

echo ""
echo "=== Checking directory structure of api package ==="
find . -type d -name api | head -20

Repository: flanksource/config-db

Length of output: 1228


🏁 Script executed:

#!/bin/bash
# Check if root api package re-exports v1 symbols
echo "=== Contents of root api directory ==="
ls -la ./api/

echo ""
echo "=== Checking for imports/exports in api/*.go ==="
rg -n "v1" ./api --type go -m 5

Repository: flanksource/config-db

Length of output: 4149


🏁 Script executed:

#!/bin/bash
# Check for any root-level api package files
echo "=== Files directly in api directory (not subdirs) ==="
find ./api -maxdepth 1 -type f -name "*.go"

echo ""
echo "=== Check api package initialization or re-exports ==="
rg -n "package api" --type go -B 2 -A 5 | head -40

Repository: flanksource/config-db

Length of output: 772


🏁 Script executed:

#!/bin/bash
# Look at what api/plugin/convert.go imports and uses
echo "=== Content of api/plugin/convert.go (first 50 lines) ==="
head -50 ./api/plugin/convert.go

echo ""
echo "=== Check if there's any Go file in api/ root with hidden files ==="
find ./api -maxdepth 1 -name "*.go" -o -name "doc.go"

echo ""
echo "=== Search for where ScrapeResult is defined ==="
rg -n "type ScrapeResult" --type go

Repository: flanksource/config-db

Length of output: 1649


🏁 Script executed:

#!/bin/bash
# Check if api package has any init or re-export mechanism
echo "=== Check all files in api directory recursively ==="
find ./api -type f -name "*.go" | sort

echo ""
echo "=== Verify: is api/ an empty package? ==="
ls -la ./api/*.go 2>/dev/null || echo "No .go files in root api/"

echo ""
echo "=== Confirm: verify stale.go import path in actual file ==="
head -15 ./scrapers/stale.go

Repository: flanksource/config-db

Length of output: 1428


This import change will cause a build failure.

The root api directory contains no Go files—all code resides in api/v1/. Changing the import from v1 "github.com/flanksource/config-db/api/v1" to v1 "github.com/flanksource/config-db/api" means the alias v1 now points to an empty package. The usages of v1.DeletedReasonStale at lines 58 and 68 will fail to compile because DeletedReasonStale is defined only in api/v1/types.go and is not accessible from the root api package.

The import must remain v1 "github.com/flanksource/config-db/api/v1".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/stale.go` at line 8, The import was changed to the root package
which is empty; revert the import to the versioned package so the symbols like
DeletedReasonStale are available: change the import alias back to v1
"github.com/flanksource/config-db/api/v1" in stale.go so uses of
v1.DeletedReasonStale (and any other v1 types) compile.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scrapers/config_access_test.go`:
- Around line 7-8: Remove the duplicate import of
"github.com/flanksource/config-db/api" aliased as v1 (keep the single import as
api) and update all usages of v1.ScrapeConfig to api.ScrapeConfig (replace the
four occurrences of v1.ScrapeConfig in this file). This eliminates the
redeclared import error by using the existing api import and consistent type
references.

In `@scrapers/external_entities_test.go`:
- Around line 4-5: Remove the duplicate import of
"github.com/flanksource/config-db/api" that uses the alias v1 and keep the
single import as api; then update all usages of v1.ScrapeConfig to
api.ScrapeConfig (occurrences include the references at the top-level imports
and the tests that currently use v1.ScrapeConfig) so there are no duplicate
imports or redeclared aliases and the code compiles cleanly.

In `@scrapers/helpers_test.go`:
- Around line 10-11: Remove the duplicate unaliased import of
"github.com/flanksource/config-db/api" from the import block and keep the
aliased import v1; then update the single occurrence of the bare api identifier
to use the v1 alias (replace api.XXX with v1.XXX where used, e.g. the test
reference that currently uses api) so the import is only declared once and
references the v1 alias consistently.

In `@scrapers/kubernetes_scraper_test.go`:
- Line 11: The file currently imports the same package twice as both the default
name (api) and alias (v1); remove the duplicate import (either the `v1
"github.com/flanksource/config-db/api"` line or the default
`github.com/flanksource/config-db/api` import) and then update all references
consistently (replace all `v1.` occurrences with `api.` if you remove the alias,
or replace all `api.` occurrences with `v1.` if you keep the alias) so there are
no duplicate import paths and all symbols compile.

In `@scrapers/plugins_test.go`:
- Around line 8-10: Remove the duplicated import of
github.com/flanksource/config-db/api (the unaliased `api`) and consolidate to
the existing `v1` alias; then replace all uses of the `api.` identifier in this
file with `v1.` (the occurrences currently using `api.` around the test
helper/initialization) so the file only imports the package once as `v1`.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7eed272 and 4037510.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • db/update.go
  • go.mod
  • scrapers/aws/cloudtrail.go
  • scrapers/config_access_test.go
  • scrapers/external_entities_test.go
  • scrapers/helpers_test.go
  • scrapers/kubernetes/kubernetes.go
  • scrapers/kubernetes_scraper_test.go
  • scrapers/plugins_test.go
  • scrapers/processors/json.go
  • scrapers/system/system.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • scrapers/system/system.go
  • scrapers/aws/cloudtrail.go
  • go.mod
  • scrapers/processors/json.go
  • db/update.go

Comment thread scrapers/config_access_test.go Outdated
Comment on lines +7 to +8
"github.com/flanksource/config-db/api"
v1 "github.com/flanksource/config-db/api/v1"
v1 "github.com/flanksource/config-db/api"

⚠️ Potential issue | 🔴 Critical

Duplicate import of the same package path breaks compilation.

Line 7 already imports github.com/flanksource/config-db/api. Adding the same path again under the alias v1 on line 8 clashes with that declaration and is the direct cause of the v1 redeclared in this block build failure.

The fix is to drop the redundant v1 alias import and replace the four v1.ScrapeConfig references in this file with api.ScrapeConfig.

🐛 Proposed fix
 	"github.com/flanksource/config-db/api"
-	v1 "github.com/flanksource/config-db/api"

Then update every usage of v1.ScrapeConfig → api.ScrapeConfig (lines 21, 140, 295, 550):

-	var scrapeConfig v1.ScrapeConfig
+	var scrapeConfig api.ScrapeConfig
🧰 Tools
🪛 GitHub Actions: Lint

[error] 8-8: v1 redeclared in this block

🪛 GitHub Actions: Test

[error] 8-8: vet: scrapers/config_access_test.go:8:2: v1 redeclared in this block

🪛 GitHub Check: lint

[failure] 7-7:
other declaration of v1


[failure] 8-8:
v1 redeclared in this block

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/config_access_test.go` around lines 7 - 8, Remove the duplicate
import of "github.com/flanksource/config-db/api" aliased as v1 (keep the single
import as api) and update all usages of v1.ScrapeConfig to api.ScrapeConfig
(replace the four occurrences of v1.ScrapeConfig in this file). This eliminates
the redeclared import error by using the existing api import and consistent type
references.

Comment thread scrapers/external_entities_test.go Outdated
Comment on lines +4 to +5
 	"github.com/flanksource/config-db/api"
-	v1 "github.com/flanksource/config-db/api/v1"
+	v1 "github.com/flanksource/config-db/api"

⚠️ Potential issue | 🔴 Critical

Duplicate import of the same package — compile error.

Lines 4 and 5 both import github.com/flanksource/config-db/api; Go rejects duplicate import paths in the same block (confirmed by the linter: "v1 redeclared in this block"). The v1 alias is only needed for v1.ScrapeConfig; since api resolves to the same package, simply drop the duplicate and use api.ScrapeConfig everywhere.

🐛 Proposed fix
 import (
 	"github.com/flanksource/config-db/api"
-	v1 "github.com/flanksource/config-db/api"
 	"github.com/flanksource/config-db/db"
 	"github.com/flanksource/config-db/db/models"
 	dutymodels "github.com/flanksource/duty/models"
 	"github.com/google/uuid"
 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"
 	"github.com/samber/lo"
 	k8sTypes "k8s.io/apimachinery/pkg/types"
 )

Then replace every v1.ScrapeConfig occurrence (lines 17, 142, 229, 334) with api.ScrapeConfig:

-	var scrapeConfig v1.ScrapeConfig
+	var scrapeConfig api.ScrapeConfig
🧰 Tools
🪛 GitHub Check: lint

[failure] 4-4:
other declaration of v1


[failure] 5-5:
v1 redeclared in this block

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/external_entities_test.go` around lines 4 - 5, Remove the duplicate
import of "github.com/flanksource/config-db/api" that uses the alias v1 and keep
the single import as api; then update all usages of v1.ScrapeConfig to
api.ScrapeConfig (occurrences include the references at the top-level imports
and the tests that currently use v1.ScrapeConfig) so there are no duplicate
imports or redeclared aliases and the code compiles cleanly.

Comment thread scrapers/helpers_test.go Outdated
Comment on lines +10 to +11
 	"github.com/flanksource/config-db/api"
-	v1 "github.com/flanksource/config-db/api/v1"
+	v1 "github.com/flanksource/config-db/api"

⚠️ Potential issue | 🔴 Critical

Duplicate import of the same package path causes a compile error.

Lines 10 and 11 both import "github.com/flanksource/config-db/api" — Go forbids importing the same path twice in one block. The lint check confirms: v1 redeclared in this block. Remove the unaliased import and update the one call site that used the bare api identifier.

🐛 Proposed fix
 import (
 	gocontext "context"
 	"encoding/json"
 	"fmt"
 	"math/rand/v2"
 	"os"

-	"github.com/flanksource/config-db/api"
 	v1 "github.com/flanksource/config-db/api"
 	"github.com/flanksource/config-db/db/models"
 	. "github.com/onsi/ginkgo/v2"
 	apiv1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )

Update the only usage of the bare api identifier (line 58) to use the v1 alias:

-func GetConfigItemFromID(ctx api.ScrapeContext, id string) (*models.ConfigItem, error) {
+func GetConfigItemFromID(ctx v1.ScrapeContext, id string) (*models.ConfigItem, error) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import (
	gocontext "context"
	"encoding/json"
	"fmt"
	"math/rand/v2"
	"os"

	v1 "github.com/flanksource/config-db/api"
	"github.com/flanksource/config-db/db/models"
	. "github.com/onsi/ginkgo/v2"
	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

Suggested change
func GetConfigItemFromID(ctx v1.ScrapeContext, id string) (*models.ConfigItem, error) {
🧰 Tools
🪛 GitHub Check: lint

[failure] 10-10:
other declaration of v1


[failure] 11-11:
v1 redeclared in this block

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/helpers_test.go` around lines 10 - 11, Remove the duplicate
unaliased import of "github.com/flanksource/config-db/api" from the import block
and keep the aliased import v1; then update the single occurrence of the bare
api identifier to use the v1 alias (replace api.XXX with v1.XXX where used, e.g.
the test reference that currently uses api) so the import is only declared once
and references the v1 alias consistently.

Comment thread scrapers/kubernetes_scraper_test.go Outdated
"github.com/flanksource/commons/logger"
"github.com/flanksource/config-db/api"
v1 "github.com/flanksource/config-db/api/v1"
v1 "github.com/flanksource/config-db/api"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n scrapers/kubernetes_scraper_test.go | head -20

Repository: flanksource/config-db

Length of output: 733


🏁 Script executed:

rg "api\.|v1\." scrapers/kubernetes_scraper_test.go

Repository: flanksource/config-db

Length of output: 1711


🏁 Script executed:

rg "apiv1\." scrapers/kubernetes_scraper_test.go -A 2 -B 2 | head -20

Repository: flanksource/config-db

Length of output: 592


🏁 Script executed:

rg "apiv1" scrapers/kubernetes_scraper_test.go | head -5

Repository: flanksource/config-db

Length of output: 220


🏁 Script executed:

# Check if there are other files in scrapers/ that might import apiv1
rg "apiv1\s*=" scrapers/

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

sed -n '3,30p' scrapers/kubernetes_scraper_test.go

Repository: flanksource/config-db

Length of output: 865


Duplicate import path will fail compilation.

Line 11 re-imports github.com/flanksource/config-db/api, which is already imported on Line 10. Go does not allow duplicate import paths in a single file. Keep one import alias and update all references (v1.* or api.*) consistently throughout the file.

Fix guidance

Either:

  1. Remove line 11 and replace all v1. usages with api.
  2. Remove the default import on line 10 and update all api. usages to v1.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/kubernetes_scraper_test.go` at line 11, The file currently imports
the same package twice as both the default name (api) and alias (v1); remove the
duplicate import (either the `v1 "github.com/flanksource/config-db/api"` line or
the default `github.com/flanksource/config-db/api` import) and then update all
references consistently (replace all `v1.` occurrences with `api.` if you remove
the alias, or replace all `api.` occurrences with `v1.` if you keep the alias)
so there are no duplicate import paths and all symbols compile.

Comment thread scrapers/plugins_test.go
Comment on lines 8 to 10
 	"github.com/flanksource/config-db/api"
-	v1 "github.com/flanksource/config-db/api/v1"
+	v1 "github.com/flanksource/config-db/api"
 	"github.com/flanksource/config-db/db"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, read the import block and surrounding context
head -50 scrapers/plugins_test.go | cat -n

Repository: flanksource/config-db

Length of output: 1637


🏁 Script executed:

# Search for all occurrences of api. and v1. prefixes in the file
echo "=== api. usage ===" && rg 'api\.' scrapers/plugins_test.go | head -20
echo ""
echo "=== v1. usage ===" && rg 'v1\.' scrapers/plugins_test.go | head -20

Repository: flanksource/config-db

Length of output: 382


🏁 Script executed:

# Run the verification script to check for duplicate imports across all Go files
python - <<'PY'
import re, pathlib, sys
dups = []

for f in pathlib.Path(".").rglob("*.go"):
    txt = f.read_text(encoding="utf-8", errors="ignore")
    # import (...) blocks
    for m in re.finditer(r'import\s*\((.*?)\)', txt, re.S):
        paths = re.findall(r'"([^"]+)"', m.group(1))
        counts = {}
        for p in paths:
            counts[p] = counts.get(p, 0) + 1
        for p, c in counts.items():
            if c > 1:
                dups.append((str(f), p, c))

if dups:
    for f, p, c in dups:
        print(f"{f}: {p} imported {c} times in same import block")
    sys.exit(1)

print("No duplicate import paths in import blocks.")
PY

Repository: flanksource/config-db

Length of output: 596


Consolidate duplicated package imports to a single local name.

Lines 8-9 import the same package path github.com/flanksource/config-db/api under two different local names. The file uses api. on lines 24-25 and v1. elsewhere, creating unnecessary confusion and maintenance overhead.

Recommendation: Keep the v1 alias, remove line 8, and update the two api. references to use v1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/plugins_test.go` around lines 8 - 10, Remove the duplicated import
of github.com/flanksource/config-db/api (the unaliased `api`) and consolidate to
the existing `v1` alias; then replace all uses of the `api.` identifier in this
file with `v1.` (the occurrences currently using `api.` around the test
helper/initialization) so the file only imports the package once as `v1`.

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (16)
api/v1/types.go (2)

97-167: ApplyPlugin is missing loops for GitHubSecurity and OpenSSFScorecard.

All other scraper types have a corresponding BaseScraper.ApplyPlugins loop in ApplyPlugin, but the two new types are unhandled, so plugins (e.g., retention, relationships) will silently have no effect on them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/types.go` around lines 97 - 167, The ApplyPlugin method on ScraperSpec
fails to apply plugins to GitHubSecurity and OpenSSFScorecard scrapers; update
ScraperSpec.ApplyPlugin to include loops for spec.GitHubSecurity and
spec.OpenSSFScorecard (same pattern as other types) and call each item’s
BaseScraper = spec.<Type>[i].BaseScraper.ApplyPlugins(plugins...) so Retention
and other plugin effects are applied to those scrapers as well. Ensure you
follow the existing loop/assignment pattern used for types like
spec.GithubActions and spec.Azure.
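A minimal sketch of the missing loops; the field names spec.GitHubSecurity and spec.OpenSSFScorecard are assumptions based on the types this PR adds, and the pattern mirrors the existing ApplyPlugin loops:

	// Assumed additions inside ScraperSpec.ApplyPlugin, following the existing per-type loops.
	for i := range spec.GitHubSecurity {
		spec.GitHubSecurity[i].BaseScraper = spec.GitHubSecurity[i].BaseScraper.ApplyPlugins(plugins...)
	}
	for i := range spec.OpenSSFScorecard {
		spec.OpenSSFScorecard[i].BaseScraper = spec.OpenSSFScorecard[i].BaseScraper.ApplyPlugins(plugins...)
	}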

15-31: AllScraperConfigs missing "githubsecurity" and "openssfscorecard" entries.

New scraper types are absent from the discovery map, which may break config-type-based lookups that rely on AllScraperConfigs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/types.go` around lines 15 - 31, AllScraperConfigs is missing entries
for the new scraper types; update the map named AllScraperConfigs to include the
"githubsecurity" and "openssfscorecard" keys mapped to their corresponding
config struct zero-values (e.g., GitHubSecurity{} and OpenSSFScorecard{} or
whatever the actual type names are) so discovery and type-based lookups can find
those scrapers; locate the AllScraperConfigs map and add those two key→struct
entries consistent with the existing naming style.
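As a rough sketch, assuming AllScraperConfigs maps lowercase keys to zero-value config structs like the existing entries (the exact type names are taken from this PR):

	// Assumed additions to the existing AllScraperConfigs map in api/v1/types.go.
	"githubsecurity":   GitHubSecurity{},
	"openssfscorecard": OpenSSFScorecard{},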
scrapers/openssf/client.go (2)

118-153: HTTP 429 is not treated as retryable.

fetchScorecard returns a generic error for 429 (falls through to the unexpected status branch), and isRetryable only matches "server error", "timeout", and "connection", so rate-limit responses are never retried.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/client.go` around lines 118 - 153, The fetchScorecard path
currently treats 429 as an unexpected status; update the HTTP handling in the
fetchScorecard function to explicitly detect http.StatusTooManyRequests and
return a distinct error (e.g., containing "rate limited" or "too many requests")
and optionally include the Retry-After header value if present, and then update
ScorecardClient.isRetryable to consider that error by checking for "rate
limit"/"too many requests"/"429" in err.Error() in addition to the existing
"server error", "timeout", and "connection" checks so 429 responses will be
retried.
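A hedged sketch of both halves of that change; the resp variable, the return shape of fetchScorecard, and switching isRetryable to strings.Contains are assumptions:

	// In fetchScorecard: surface 429 as a distinct, retryable error (Retry-After is optional).
	if resp.StatusCode == http.StatusTooManyRequests {
		return nil, fmt.Errorf("rate limited (429), retry-after=%q", resp.Header.Get("Retry-After"))
	}

	// In isRetryable: also treat rate-limit errors as retryable.
	errStr := err.Error()
	return strings.Contains(errStr, "rate limited") ||
		strings.Contains(errStr, "server error") ||
		strings.Contains(errStr, "timeout") ||
		strings.Contains(errStr, "connection")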

80-86: Backoff sleep ignores context cancellation.

time.Sleep(backoff) blocks unconditionally during retries, preventing timely shutdown or timeout propagation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/client.go` around lines 80 - 86, The retry backoff currently
calls time.Sleep(backoff) which blocks and ignores cancellation; update the
retry loop in scrapers/openssf/client.go so the sleep uses a cancellable wait
(select on time.After(backoff) and the request/context cancellation) instead of
time.Sleep. Specifically, inside the for-attempt loop where c.ctx.Logger is
used, replace the unconditional time.Sleep(backoff) with a select that waits on
time.After(backoff) and on c.ctx.Done() (or the request context field you use),
and when the context is done return or propagate the context error so retries
stop promptly.
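A cancellation-aware variant of the wait, sketched under the assumption that the client holds its scrape context in c.ctx and that the surrounding function returns (value, error):

	// Wait out the backoff, but abort promptly if the scrape context is cancelled.
	select {
	case <-time.After(backoff):
		// fall through and retry
	case <-c.ctx.Done():
		return nil, c.ctx.Err()
	}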
api/context.go (1)

161-167: Nil-dereference on scrapeConfig in IsTrace() and IsDebug().

Both methods dereference ctx.scrapeConfig.Spec without guarding against a nil scrapeConfig, unlike ScraperID() which already performs this check. Calling either before WithScrapeConfig() will panic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/context.go` around lines 161 - 167, The IsTrace and IsDebug methods
dereference ctx.scrapeConfig.Spec without a nil check; update both IsTrace() and
IsDebug() to guard for ctx.scrapeConfig == nil (like ScraperID() does) and
return a safe default (false) when nil, otherwise call
ctx.scrapeConfig.Spec.IsTrace()/IsDebug(); ensure behavior aligns with
WithScrapeConfig usage.
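A minimal sketch of the guard for IsTrace(), assuming a ScrapeContext receiver with a scrapeConfig field as described above; IsDebug() would get the same treatment:

	func (ctx ScrapeContext) IsTrace() bool {
		// Guard against a context that has not been through WithScrapeConfig yet.
		if ctx.scrapeConfig == nil {
			return false
		}
		return ctx.scrapeConfig.Spec.IsTrace()
	}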
api/v1/openssf.go (1)

10-11: MinScore *float64 is still breaking CRD generation.

The pipeline (lint, test, test-prod) continues to fail with "found float, the usage of which is highly discouraged". Despite being marked as addressed in earlier commits, the *float64 type remains and zz_generated.deepcopy.go reflects it unchanged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/openssf.go` around lines 10 - 11, The MinScore field is still declared
as *float64 which breaks CRD generation; change the MinScore field declaration
from *float64 to an integer type (e.g. *int32 or *int) in the struct that
declares MinScore, update the yaml/json tags if needed, then regenerate the
deepcopy/CRD artifacts (re-run your generation script or make generate /
controller-gen) so zz_generated.deepcopy.go and CRDs no longer contain floats;
ensure any callers or tests using MinScore are updated to use the integer type
as well.
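The later commits in this branch appear to parse MinScore with strconv.ParseFloat (see the scorecard.go snippet further down), so one alternative to an integer field is keeping the threshold as a string in the CRD and parsing it at scrape time. A sketch, with the json tag assumed:

	// MinScore filters out repositories scoring below this value, e.g. "7.5".
	// A string avoids the float restriction in generated CRDs; parse it with strconv.ParseFloat where used.
	MinScore string `json:"minScore,omitempty"`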
scrapers/github/security_client.go (3)

23-48: ⚠️ Potential issue | 🟠 Major

Missing empty-token validation — will silently produce 401s on all API calls.

If ConnectionName does not resolve a connection and PersonalAccessToken does not yield a value, token remains "". The client is then created with an empty OAuth token, and every subsequent API call fails with 401, which surfaces as confusing "failed to fetch" errors downstream.

🐛 Proposed fix
 	} else {
 		token, err = ctx.GetEnvValueFromCache(config.PersonalAccessToken, ctx.Namespace())
 		if err != nil {
 			return nil, fmt.Errorf("failed to get token: %w", err)
 		}
 	}
+
+	if token == "" {
+		return nil, fmt.Errorf("github personal access token is required: configure either 'connection' or 'personalAccessToken'")
+	}
 
 	ts := oauth2.StaticTokenSource(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security_client.go` around lines 23 - 48, In
NewGitHubSecurityClient ensure the resolved token is validated and not empty
before creating the oauth2 client: after trying ctx.HydrateConnectionByURL and
ctx.GetEnvValueFromCache, check that token != "" (and return a clear error
referencing ConnectionName/PersonalAccessToken) so you don't construct an
oauth2.StaticTokenSource / github.NewClient with an empty token; update the
function to return an error when token resolution yields an empty string.

50-57: ⚠️ Potential issue | 🟠 Major

CreatedAt field is declared but never consumed by any Get* method.

fetchAllAlerts in security.go (Line 147) sets opts.CreatedAt for incremental scrapes, but none of the four client methods (GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts, GetSecurityAdvisories) pass it through to the underlying GitHub API request options. Incremental scraping is therefore non-functional.


121-133: ⚠️ Potential issue | 🟠 Major

GetSecurityAdvisories is missing pagination — Page/PerPage from AlertListOptions are not applied.

All other Get* methods in this file include ListOptions but this one does not, so only the API's default page size is returned.

🐛 Proposed fix
 	reqOpts := &github.ListRepositorySecurityAdvisoriesOptions{
 		State: opts.State,
+		ListOptions: github.ListOptions{
+			Page:    opts.Page,
+			PerPage: opts.PerPage,
+		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security_client.go` around lines 121 - 133,
GetSecurityAdvisories is not applying pagination from AlertListOptions; update
the reqOpts passed to github.SecurityAdvisories.ListRepositorySecurityAdvisories
to include Page and PerPage from opts (e.g., set reqOpts.Page = opts.Page and
reqOpts.PerPage = opts.PerPage) so it honors paging like the other Get* methods;
ensure you reference the ListRepositorySecurityAdvisoriesOptions struct and the
AlertListOptions fields when making this change in GetSecurityAdvisories.
scrapers/github/security.go (5)

59-62: ⚠️ Potential issue | 🔴 Critical

Logging produces garbage: string concatenation passed to %d format verb.

result.Properties.Find(...).Text returns strings. Concatenating them with + yields a single string, which is then passed to %d. This will print %!d(string=...) instead of a count.

🐛 Proposed fix
-		ctx.Logger.V(2).Infof("successfully scraped %s: %d total alerts", repoFullName,
-			result.Properties.Find("Critical Alerts").Text+result.Properties.Find("High Alerts").Text+
-			result.Properties.Find("Medium Alerts").Text+result.Properties.Find("Low Alerts").Text)
+		ctx.Logger.V(2).Infof("successfully scraped %s: critical=%s, high=%s, medium=%s, low=%s",
+			repoFullName,
+			result.Properties.Find("Critical Alerts").Text,
+			result.Properties.Find("High Alerts").Text,
+			result.Properties.Find("Medium Alerts").Text,
+			result.Properties.Find("Low Alerts").Text)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 59 - 62, The log call in
ctx.Logger.V(2).Infof is passing a concatenated string of
result.Properties.Find(...).Text values to a %d format, producing garbage;
change this to compute numeric counts (parse each of
result.Properties.Find("Critical Alerts").Text, "High Alerts", "Medium Alerts",
"Low Alerts" to ints using strconv.Atoi with safe fallbacks to 0), sum them into
a totalCount int, and then pass totalCount to %d (and/or include the individual
counts as %d/%d/%d/%d) so the format verbs match numeric values; update the call
site where repoFullName and result are used (ctx.Logger.V(2).Infof) accordingly
and handle parse errors by treating them as zero.

94-107: ⚠️ Potential issue | 🔴 Critical

CreatedAt type mismatch: time.Time assigned to *time.Time.

repo.CreatedAt.GetTime() returns time.Time, but ScrapeResult.CreatedAt expects *time.Time.

🐛 Proposed fix
+	createdAt := repo.CreatedAt.GetTime()
 	result := v1.ScrapeResult{
 		...
-		CreatedAt:  repo.CreatedAt.GetTime(),
+		CreatedAt:  &createdAt,
 		...
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 94 - 107, The CreatedAt field on
v1.ScrapeResult expects a *time.Time but the code assigns
repo.CreatedAt.GetTime() (a time.Time); fix by creating a local variable (e.g.,
t := repo.CreatedAt.GetTime()) and assign its address to CreatedAt (CreatedAt:
&t) in the ScrapeResult construction so the types match; update the ScrapeResult
construction in scrapers/github/security.go where result is built.

153-175: ⚠️ Potential issue | 🟠 Major

All alert fetches are single-page (PerPage: 100) with no pagination loop.

Repositories with more than 100 alerts of any type will have truncated results, leading to undercounting and incorrect health assessments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 153 - 175, The current fetches
using client.GetDependabotAlerts, client.GetCodeScanningAlerts, and
client.GetSecretScanningAlerts only request a single page (opts.PerPage=100) and
will truncate results for repos with >100 alerts; change each call to paginate:
loop over opts.Page starting at 1, call the respective method
(GetDependabotAlerts/GetCodeScanningAlerts/GetSecretScanningAlerts) per page,
append results to alerts.dependabot/alerts.codeScanning/alerts.secretScanning,
and break when the returned page is empty or less than opts.PerPage (or when the
API response indicates no next page). Ensure opts.Page is incremented each
iteration and preserve error handling and logging inside the loop.
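A hedged sketch of the pagination loop for one alert type, assuming the client methods return the page of alerts plus a *github.Response whose NextPage is zero on the last page, and that fetchAllAlerts returns (alerts, time.Time, error); the exact argument list is likewise an assumption:

	// Page through Dependabot alerts until the API reports no next page.
	opts.Page = 1
	for {
		pageAlerts, resp, err := client.GetDependabotAlerts(ctx, owner, repo, opts)
		if err != nil {
			return nil, time.Time{}, fmt.Errorf("failed to list dependabot alerts: %w", err)
		}
		alerts.dependabot = append(alerts.dependabot, pageAlerts...)
		if resp == nil || resp.NextPage == 0 {
			break
		}
		opts.Page = resp.NextPage
	}

The code-scanning and secret-scanning fetches would follow the same loop.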

128-142: ⚠️ Potential issue | 🟠 Major

filters.Severity and filters.MaxAge are declared but never propagated into AlertListOptions.

Only State is applied. The Severity field on AlertListOptions and the MaxAge/ParseMaxAge() method are unused here, meaning user-configured severity and age filters are silently ignored.

🔧 Proposed fix
 	opts := AlertListOptions{
 		State:   stateFilter,
 		Page:    1,
 		PerPage: 100,
 	}
+
+	if len(filters.Severity) > 0 {
+		opts.Severity = filters.Severity[0]
+	}
+
+	if maxAge, err := filters.ParseMaxAge(); err != nil {
+		return nil, time.Time{}, fmt.Errorf("invalid maxAge filter: %w", err)
+	} else if maxAge > 0 {
+		since := time.Now().Add(-maxAge)
+		opts.CreatedAt = since.Format(time.RFC3339)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 128 - 142, In fetchAllAlerts,
propagate the configured severity and max-age into AlertListOptions: read
filters.Severity (use filters.Severity[0] or default when present) and set the
AlertListOptions.Severity field, and call filters.ParseMaxAge() (or equivalent)
to compute the time/duration and assign it to the appropriate
AlertListOptions.MaxAge/MaxAgeTime field so the client request honors
user-configured filters; update the opts variable (used to list alerts) before
making the API calls (references: fetchAllAlerts, AlertListOptions,
filters.Severity, filters.MaxAge, ParseMaxAge).

45-51: ⚠️ Potential issue | 🟠 Major

time.Sleep blocks indefinitely and is not cancellation-aware.

If the scrape context is cancelled while waiting for a rate-limit reset, the goroutine will remain blocked. Use a select with ctx.Done().

⏱️ Proposed fix
 		} else if shouldPause {
 			ctx.Warnf("pausing for %v due to rate limit", duration)
-			time.Sleep(duration)
+			timer := time.NewTimer(duration)
+			select {
+			case <-timer.C:
+			case <-ctx.Done():
+				timer.Stop()
+				results.Errorf(ctx.Err(), "scrape cancelled while waiting for rate-limit reset")
+				return results
+			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 45 - 51, The current code in
security.go calls time.Sleep(duration) after client.ShouldPauseForRateLimit
which blocks even if ctx is cancelled; replace the blocking sleep with a
cancellation-aware wait: create a timer (e.g., timer := time.NewTimer(duration))
and use select { case <-timer.C: /* continue */; case <-ctx.Done():
timer.Stop(); return (or break to stop processing) } so ctx.Warnf and the pause
logic remain but the goroutine will stop waiting if ctx is cancelled; update the
branch handling in the block that calls ShouldPauseForRateLimit to use this
select instead of time.Sleep(duration).
scrapers/openssf/scorecard.go (1)

53-60: ⚠️ Potential issue | 🟠 Major

Repos filtered by MinScore don't update the cache, causing repeated unnecessary API calls.

When scorecard.Score < *config.MinScore, the continue on Line 56 skips the Store on Line 59. Every subsequent scrape run will re-fetch the same repo from the API, only to discard it again. Move the store before the filter.

🐛 Proposed fix
+		LastScorecardScrapeTime.Store(repoFullName, time.Now())
+
 		if config.MinScore != nil && scorecard.Score < *config.MinScore {
 			ctx.Debugf("skipping %s: score %.1f below minimum %.1f",
 				repoFullName, scorecard.Score, *config.MinScore)
 			continue
 		}
 
-		LastScorecardScrapeTime.Store(repoFullName, time.Now())
 		ctx.Logger.V(3).Infof("stored last scorecard scrape time for %s: %v", repoFullName, scorecard.Date)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 53 - 60, The cache update is
skipped when a repo fails the MinScore filter, so move the
LastScorecardScrapeTime.Store call to happen before the MinScore check: call
LastScorecardScrapeTime.Store(repoFullName, time.Now()) (and the related log
ctx.Logger.V(3).Infof...) immediately after obtaining scorecard (before checking
if config.MinScore != nil && scorecard.Score < *config.MinScore) so filtered
repos still update the scrape timestamp; keep existing variable names
(repoFullName, scorecard.Score, config.MinScore) and preserve the log message
content.
api/v1/github_security.go (1)

43-53: ⚠️ Potential issue | 🟠 Major

ParseMaxAge cannot handle "90d" / "30d" as documented on Line 43.

time.ParseDuration only supports ns, us, ms, s, m, h. The documented examples will fail at runtime.

🐛 Proposed fix (support "d" suffix)
+import (
+	"fmt"
+	"strconv"
+	"strings"
+	"time"
+
+	"github.com/flanksource/duty/types"
+)

 func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
 	if f.MaxAge == "" {
 		return 0, nil
 	}
+	if strings.HasSuffix(f.MaxAge, "d") {
+		days, err := strconv.Atoi(strings.TrimSuffix(f.MaxAge, "d"))
+		if err != nil {
+			return 0, fmt.Errorf("invalid day duration %q: %w", f.MaxAge, err)
+		}
+		return time.Duration(days) * 24 * time.Hour, nil
+	}
 	return time.ParseDuration(f.MaxAge)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/github_security.go` around lines 43 - 53, The ParseMaxAge method on
GitHubSecurityFilters currently calls time.ParseDuration which doesn't accept
"d" (days); update ParseMaxAge to first detect a trailing 'd' or 'D' suffix
(e.g., "90d"/"30d"), trim it, parse the remaining numeric portion (using
strconv.ParseInt or strconv.ParseFloat to allow fractional days), multiply by
24*time.Hour to produce the time.Duration, and return that; if there is no 'd'
suffix, fall back to time.ParseDuration for supported units and return any
parsing error—make sure to update ParseMaxAge to handle invalid formats with a
clear error.
🧹 Nitpick comments (3)
scrapers/openssf/client.go (1)

155-168: Replace custom contains/findSubstring with strings.Contains.

Both helpers are a manual reimplementation of strings.Contains from the standard library. The custom version adds unnecessary complexity and a non-obvious short-circuit path, with no performance or correctness advantage.

♻️ Proposed refactor
+import "strings"

 func (c *ScorecardClient) isRetryable(err error) bool {
 	if err == nil {
 		return false
 	}

 	errStr := err.Error()
-	return contains(errStr, "server error") ||
-	       contains(errStr, "timeout") ||
-	       contains(errStr, "connection")
+	return strings.Contains(errStr, "server error") ||
+		strings.Contains(errStr, "timeout") ||
+		strings.Contains(errStr, "connection")
 }
-
-func contains(s, substr string) bool {
-	return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
-		(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
-		 findSubstring(s, substr)))
-}
-
-func findSubstring(s, substr string) bool {
-	for i := 0; i <= len(s)-len(substr); i++ {
-		if s[i:i+len(substr)] == substr {
-			return true
-		}
-	}
-	return false
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/client.go` around lines 155 - 168, The custom contains and
findSubstring functions are duplicating standard library behavior; remove
contains and findSubstring and replace their callers with strings.Contains,
ensuring the package imports "strings"; update any call sites that reference
contains(...) to strings.Contains(...), delete the two helper functions
(contains and findSubstring), and run `go vet`/`go test` to confirm no missing
references.
scrapers/openssf/scorecard.go (1)

265-272: Replace containsString with slices.Contains from the standard library.

Go 1.21+ provides slices.Contains, and this project targets Go 1.25.1, so the helper function can be removed.

Changes required

Remove the function definition at lines 265-272:

-func containsString(slice []string, item string) bool {
-	for _, s := range slice {
-		if s == item {
-			return true
-		}
-	}
-	return false
-}

Add import and update line 182:

+import "slices"
 ...
-if check.Score == 0 && containsString(criticalChecks, check.Name) {
+if check.Score == 0 && slices.Contains(criticalChecks, check.Name) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 265 - 272, Remove the custom
containsString helper and use the Go stdlib slices.Contains instead: delete the
containsString(...) function, add the standard library import "slices", and
replace any calls to containsString(slice, item) (e.g., the usage near where it
was referenced) with slices.Contains(slice, item).
scrapers/github/security.go (1)

193-198: Add a comment explaining why secret scanning alerts don't count severity.

Secret scanning alerts skip countAlertSeverity because the GitHub REST API does not return severity information for these alerts. Unlike Dependabot and code scanning alerts, SecretScanningAlert in go-github v73 has no severity field. Adding a comment to clarify this API limitation would help future maintainers understand why the handling differs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 193 - 198, Add a short inline
comment above the loop that processes secretAlerts (the for _, alert := range
secretAlerts loop) explaining that secret scanning alerts do not include
severity in the GitHub REST API (SecretScanningAlert in go-github v73 has no
severity field), and therefore we intentionally skip calling countAlertSeverity
for these alerts; reference the go-github v73 limitation and the
countAlertSeverity function name in the comment so future maintainers understand
this special-case handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@api/context.go`:
- Around line 161-167: The IsTrace and IsDebug methods dereference
ctx.scrapeConfig.Spec without a nil check; update both IsTrace() and IsDebug()
to guard for ctx.scrapeConfig == nil (like ScraperID() does) and return a safe
default (false) when nil, otherwise call
ctx.scrapeConfig.Spec.IsTrace()/IsDebug(); ensure behavior aligns with
WithScrapeConfig usage.

In `@api/v1/github_security.go`:
- Around line 43-53: The ParseMaxAge method on GitHubSecurityFilters currently
calls time.ParseDuration which doesn't accept "d" (days); update ParseMaxAge to
first detect a trailing 'd' or 'D' suffix (e.g., "90d"/"30d"), trim it, parse
the remaining numeric portion (using strconv.ParseInt or strconv.ParseFloat to
allow fractional days), multiply by 24*time.Hour to produce the time.Duration,
and return that; if there is no 'd' suffix, fall back to time.ParseDuration for
supported units and return any parsing error—make sure to update ParseMaxAge to
handle invalid formats with a clear error.

In `@api/v1/openssf.go`:
- Around line 10-11: The MinScore field is still declared as *float64 which
breaks CRD generation; change the MinScore field declaration from *float64 to an
integer type (e.g. *int32 or *int) in the struct that declares MinScore, update
the yaml/json tags if needed, then regenerate the deepcopy/CRD artifacts (re-run
your generation script or make generate / controller-gen) so
zz_generated.deepcopy.go and CRDs no longer contain floats; ensure any callers
or tests using MinScore are updated to use the integer type as well.

In `@api/v1/types.go`:
- Around line 97-167: The ApplyPlugin method on ScraperSpec fails to apply
plugins to GitHubSecurity and OpenSSFScorecard scrapers; update
ScraperSpec.ApplyPlugin to include loops for spec.GitHubSecurity and
spec.OpenSSFScorecard (same pattern as other types) and call each item’s
BaseScraper = spec.<Type>[i].BaseScraper.ApplyPlugins(plugins...) so Retention
and other plugin effects are applied to those scrapers as well. Ensure you
follow the existing loop/assignment pattern used for types like
spec.GithubActions and spec.Azure.
- Around line 15-31: AllScraperConfigs is missing entries for the new scraper
types; update the map named AllScraperConfigs to include the "githubsecurity"
and "openssfscorecard" keys mapped to their corresponding config struct
zero-values (e.g., GitHubSecurity{} and OpenSSFScorecard{} or whatever the
actual type names are) so discovery and type-based lookups can find those
scrapers; locate the AllScraperConfigs map and add those two key→struct entries
consistent with the existing naming style.

In `@scrapers/github/security_client.go`:
- Around line 23-48: In NewGitHubSecurityClient ensure the resolved token is
validated and not empty before creating the oauth2 client: after trying
ctx.HydrateConnectionByURL and ctx.GetEnvValueFromCache, check that token != ""
(and return a clear error referencing ConnectionName/PersonalAccessToken) so you
don't construct an oauth2.StaticTokenSource / github.NewClient with an empty
token; update the function to return an error when token resolution yields an
empty string.
- Around line 121-133: GetSecurityAdvisories is not applying pagination from
AlertListOptions; update the reqOpts passed to
github.SecurityAdvisories.ListRepositorySecurityAdvisories to include Page and
PerPage from opts (e.g., set reqOpts.Page = opts.Page and reqOpts.PerPage =
opts.PerPage) so it honors paging like the other Get* methods; ensure you
reference the ListRepositorySecurityAdvisoriesOptions struct and the
AlertListOptions fields when making this change in GetSecurityAdvisories.

In `@scrapers/github/security.go`:
- Around line 59-62: The log call in ctx.Logger.V(2).Infof is passing a
concatenated string of result.Properties.Find(...).Text values to a %d format,
producing garbage; change this to compute numeric counts (parse each of
result.Properties.Find("Critical Alerts").Text, "High Alerts", "Medium Alerts",
"Low Alerts" to ints using strconv.Atoi with safe fallbacks to 0), sum them into
a totalCount int, and then pass totalCount to %d (and/or include the individual
counts as %d/%d/%d/%d) so the format verbs match numeric values; update the call
site where repoFullName and result are used (ctx.Logger.V(2).Infof) accordingly
and handle parse errors by treating them as zero.
- Around line 94-107: The CreatedAt field on v1.ScrapeResult expects a
*time.Time but the code assigns repo.CreatedAt.GetTime() (a time.Time); fix by
creating a local variable (e.g., t := repo.CreatedAt.GetTime()) and assign its
address to CreatedAt (CreatedAt: &t) in the ScrapeResult construction so the
types match; update the ScrapeResult construction in scrapers/github/security.go
where result is built.
- Around line 153-175: The current fetches using client.GetDependabotAlerts,
client.GetCodeScanningAlerts, and client.GetSecretScanningAlerts only request a
single page (opts.PerPage=100) and will truncate results for repos with >100
alerts; change each call to paginate: loop over opts.Page starting at 1, call
the respective method
(GetDependabotAlerts/GetCodeScanningAlerts/GetSecretScanningAlerts) per page,
append results to alerts.dependabot/alerts.codeScanning/alerts.secretScanning,
and break when the returned page is empty or less than opts.PerPage (or when the
API response indicates no next page). Ensure opts.Page is incremented each
iteration and preserve error handling and logging inside the loop.
- Around line 128-142: In fetchAllAlerts, propagate the configured severity and
max-age into AlertListOptions: read filters.Severity (use filters.Severity[0] or
default when present) and set the AlertListOptions.Severity field, and call
filters.ParseMaxAge() (or equivalent) to compute the time/duration and assign it
to the appropriate AlertListOptions.MaxAge/MaxAgeTime field so the client
request honors user-configured filters; update the opts variable (used to list
alerts) before making the API calls (references: fetchAllAlerts,
AlertListOptions, filters.Severity, filters.MaxAge, ParseMaxAge).
- Around line 45-51: The current code in security.go calls time.Sleep(duration)
after client.ShouldPauseForRateLimit which blocks even if ctx is cancelled;
replace the blocking sleep with a cancellation-aware wait: create a timer (e.g.,
timer := time.NewTimer(duration)) and use select { case <-timer.C: /* continue
*/; case <-ctx.Done(): timer.Stop(); return (or break to stop processing) } so
ctx.Warnf and the pause logic remain but the goroutine will stop waiting if ctx
is cancelled; update the branch handling in the block that calls
ShouldPauseForRateLimit to use this select instead of time.Sleep(duration).

In `@scrapers/openssf/client.go`:
- Around line 118-153: The fetchScorecard path currently treats 429 as an
unexpected status; update the HTTP handling in the fetchScorecard function to
explicitly detect http.StatusTooManyRequests and return a distinct error (e.g.,
containing "rate limited" or "too many requests") and optionally include the
Retry-After header value if present, and then update ScorecardClient.isRetryable
to consider that error by checking for "rate limit"/"too many requests"/"429" in
err.Error() in addition to the existing "server error", "timeout", and
"connection" checks so 429 responses will be retried.
- Around line 80-86: The retry backoff currently calls time.Sleep(backoff) which
blocks and ignores cancellation; update the retry loop in
scrapers/openssf/client.go so the sleep uses a cancellable wait (select on
time.After(backoff) and the request/context cancellation) instead of time.Sleep.
Specifically, inside the for-attempt loop where c.ctx.Logger is used, replace
the unconditional time.Sleep(backoff) with a select that waits on
time.After(backoff) and on c.ctx.Done() (or the request context field you use),
and when the context is done return or propagate the context error so retries
stop promptly.

In `@scrapers/openssf/scorecard.go`:
- Around line 53-60: The cache update is skipped when a repo fails the MinScore
filter, so move the LastScorecardScrapeTime.Store call to happen before the
MinScore check: call LastScorecardScrapeTime.Store(repoFullName, time.Now())
(and the related log ctx.Logger.V(3).Infof...) immediately after obtaining
scorecard (before checking if config.MinScore != nil && scorecard.Score <
*config.MinScore) so filtered repos still update the scrape timestamp; keep
existing variable names (repoFullName, scorecard.Score, config.MinScore) and
preserve the log message content.

---

Nitpick comments:
In `@scrapers/github/security.go`:
- Around line 193-198: Add a short inline comment above the loop that processes
secretAlerts (the for _, alert := range secretAlerts loop) explaining that
secret scanning alerts do not include severity in the GitHub REST API
(SecretScanningAlert in go-github v73 has no severity field), and therefore we
intentionally skip calling countAlertSeverity for these alerts; reference the
go-github v73 limitation and the countAlertSeverity function name in the comment
so future maintainers understand this special-case handling.

In `@scrapers/openssf/client.go`:
- Around line 155-168: The custom contains and findSubstring functions are
duplicating standard library behavior; remove contains and findSubstring and
replace their callers with strings.Contains, ensuring the package imports
"strings"; update any call sites that reference contains(...) to
strings.Contains(...), delete the two helper functions (contains and
findSubstring), and run `go vet`/`go test` to confirm no missing references.

In `@scrapers/openssf/scorecard.go`:
- Around line 265-272: Remove the custom containsString helper and use the Go
stdlib slices.Contains instead: delete the containsString(...) function, add the
standard library import "slices", and replace any calls to containsString(slice,
item) (e.g., the usage near where it was referenced) with slices.Contains(slice,
item).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4037510 and 24de88c.

📒 Files selected for processing (22)
  • api/context.go
  • api/v1/github_security.go
  • api/v1/openssf.go
  • api/v1/scrapeconfig_types.go
  • api/v1/types.go
  • api/v1/zz_generated.deepcopy.go
  • fixtures/github-security.yaml
  • fixtures/openssf.yaml
  • scrapers/aws/aws.go
  • scrapers/common.go
  • scrapers/github/security.go
  • scrapers/github/security_client.go
  • scrapers/github/testdata/security/code-scanning-alerts.json
  • scrapers/github/testdata/security/dependabot-alerts.json
  • scrapers/github/testdata/security/secret-scanning-alerts.json
  • scrapers/github/testdata/security/security-advisories.json
  • scrapers/openssf/client.go
  • scrapers/openssf/compliance.go
  • scrapers/openssf/scorecard.go
  • scrapers/openssf/testdata/scorecard-high-score.json
  • scrapers/openssf/testdata/scorecard-low-score.json
  • scrapers/openssf/testdata/scorecard-medium-score.json
✅ Files skipped from review due to trivial changes (1)
  • scrapers/openssf/testdata/scorecard-medium-score.json
🚧 Files skipped from review as they are similar to previous changes (9)
  • api/v1/scrapeconfig_types.go
  • scrapers/aws/aws.go
  • scrapers/openssf/testdata/scorecard-high-score.json
  • scrapers/github/testdata/security/security-advisories.json
  • scrapers/github/testdata/security/code-scanning-alerts.json
  • scrapers/openssf/compliance.go
  • fixtures/github-security.yaml
  • fixtures/openssf.yaml
  • scrapers/github/testdata/security/dependabot-alerts.json

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
scrapers/openssf/scorecard.go (2)

54-61: ⚠️ Potential issue | 🟠 Major

Min-score handling is brittle and cache update happens too late.

Two issues here:

  1. malformed MinScore is silently ignored (filter effectively disabled), and
  2. repos filtered by min-score never update the cache, causing repeated unnecessary API calls on every run.
🐛 Proposed fix (parse once per config, fail fast on invalid value, store cache before filter)
 func (o OpenSSFScorecardScraper) Scrape(ctx api.ScrapeContext) v1.ScrapeResults {
 	results := v1.ScrapeResults{}

 	for _, config := range ctx.ScrapeConfig().Spec.OpenSSFScorecard {
 		ctx.Logger.V(2).Infof("scraping OpenSSF Scorecard for %d repositories", len(config.Repositories))
 		client := NewScorecardClient(ctx)
+		var minScore *float64
+		if config.MinScore != "" {
+			parsed, err := strconv.ParseFloat(config.MinScore, 64)
+			if err != nil {
+				results.Errorf(err, "invalid minScore %q", config.MinScore)
+				continue
+			}
+			minScore = &parsed
+		}

 		for _, repoConfig := range config.Repositories {
 			repoFullName := fmt.Sprintf("%s/%s", repoConfig.Owner, repoConfig.Repo)
@@
 			scorecard, err := client.GetRepositoryScorecard(ctx, repoConfig.Owner, repoConfig.Repo)
 			if err != nil {
 				results.Errorf(err, "failed to scrape repository %s", repoFullName)
 				continue
 			}
+
+			LastScorecardScrapeTime.Store(repoFullName, time.Now())

-			if minScore, err := strconv.ParseFloat(config.MinScore, 64); err == nil && scorecard.Score < minScore {
+			if minScore != nil && scorecard.Score < *minScore {
 				ctx.Logger.V(3).Infof("skipping %s: score %.1f below minimum %.1f",
-					repoFullName, scorecard.Score, minScore)
+					repoFullName, scorecard.Score, *minScore)
 				continue
 			}
-
-			LastScorecardScrapeTime.Store(repoFullName, time.Now())
 			ctx.Logger.V(3).Infof("stored last scorecard scrape time for %s: %v", repoFullName, scorecard.Date)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 54 - 61, Parse config.MinScore
once (not per-repo) at startup and fail fast if it is malformed; use that parsed
value (e.g., minScoreParsed) when evaluating scorecard.Score so malformed values
don't silently disable the filter, and move the
LastScorecardScrapeTime.Store(repoFullName, time.Now()) call to occur before the
min-score check so repositories skipped by the min-score filter still update the
cache; update references to config.MinScore, repoFullName, scorecard.Score and
LastScorecardScrapeTime accordingly.

22-23: ⚠️ Potential issue | 🟠 Major

Package-level scrape cache leaks state across scraper instances/tests.

LastScorecardScrapeTime as process-global mutable state can cause cross-run contamination and nondeterministic behavior when multiple configs/tests run in the same process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/openssf/scorecard.go` around lines 22 - 23, LastScorecardScrapeTime
is a process-global sync.Map causing cross-test/instance leakage; instead, make
the scrape cache an instance field (e.g., add a field like scorecardCache
sync.Map on the ScorecardScraper/OpenssfScraper struct) and replace all uses of
the package-level LastScorecardScrapeTime with that instance field so each
scraper has its own cache; update constructors to initialize the field and
update tests to create new scraper instances (or provide a Reset method) to
avoid shared mutable state.
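One way to scope the cache, sketched assuming the scraper type is OpenSSFScorecardScraper (as in the Scrape method quoted above); the constructor name is hypothetical:

	// Per-instance cache replacing the package-level LastScorecardScrapeTime.
	type OpenSSFScorecardScraper struct {
		lastScrape *sync.Map // repo full name -> time.Time of the last successful scrape
	}

	func NewOpenSSFScorecardScraper() OpenSSFScorecardScraper {
		return OpenSSFScorecardScraper{lastScrape: &sync.Map{}}
	}

Inside Scrape, the LastScorecardScrapeTime.Store(...) calls would become o.lastScrape.Store(repoFullName, time.Now()), and tests can construct a fresh scraper per case instead of sharing process-global state.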
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@scrapers/openssf/scorecard.go`:
- Around line 54-61: Parse config.MinScore once (not per-repo) at startup and
fail fast if it is malformed; use that parsed value (e.g., minScoreParsed) when
evaluating scorecard.Score so malformed values don't silently disable the
filter, and move the LastScorecardScrapeTime.Store(repoFullName, time.Now())
call to occur before the min-score check so repositories skipped by the
min-score filter still update the cache; update references to config.MinScore,
repoFullName, scorecard.Score and LastScorecardScrapeTime accordingly.
- Around line 22-23: LastScorecardScrapeTime is a process-global sync.Map
causing cross-test/instance leakage; instead, make the scrape cache an instance
field (e.g., add a field like scorecardCache sync.Map on the
ScorecardScraper/OpenssfScraper struct) and replace all uses of the
package-level LastScorecardScrapeTime with that instance field so each scraper
has its own cache; update constructors to initialize the field and update tests
to create new scraper instances (or provide a Reset method) to avoid shared
mutable state.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24de88c and 8f8ab18.

📒 Files selected for processing (3)
  • api/v1/openssf.go
  • api/v1/zz_generated.deepcopy.go
  • scrapers/openssf/scorecard.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (2)
scrapers/github/security.go (2)

50-66: ⚠️ Potential issue | 🟠 Major

Alert retrieval is single-page and can truncate counts.

Each alert category is fetched once at PerPage=100; repositories with more alerts will be undercounted and health status can be skewed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 50 - 66, The current single-call
fetch for dependabot, code scanning, and secret scanning alerts
(client.GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts)
only requests PerPage=100 and will miss additional pages; change each call to a
paginated loop that sets opts.Page (starting at 1) and calls the respective
method repeatedly, capture the response (not "_"), append results to
alerts.dependabot / alerts.codeScanning / alerts.secretScanning across
iterations, and stop when resp.NextPage == 0 (or when the returned next page is
zero). Keep PerPage=100 (or configurable) and propagate errors as before inside
the loop.

30-40: ⚠️ Potential issue | 🟠 Major

Configured security filters are only partially applied.

securityFilters.severity and securityFilters.maxAge are ignored when building AlertListOptions, so configured filtering is silently dropped.

Suggested fix
 	opts := AlertListOptions{
 		State:   stateFilter,
 		Page:    1,
 		PerPage: 100,
 	}
+	if len(filters.Severity) > 0 {
+		opts.Severity = filters.Severity[0]
+	}
+	if d, err := filters.ParseMaxAge(); err == nil && d > 0 {
+		opts.CreatedAt = time.Now().Add(-d).Format(time.RFC3339)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 30 - 40, The current
AlertListOptions only sets State/Page/PerPage and ignores
config.SecurityFilters.severity and .maxAge; update the code that builds
AlertListOptions (where filters := config.SecurityFilters and opts :=
AlertListOptions{...}) to include the severity filter (set the appropriate
Severity field from filters.Severity or filters.Severity[0]) and apply maxAge by
computing a cutoff time (e.g., time.Now().Add(-filters.MaxAge) or converting
filters.MaxAge to the API's date/age field) and setting the corresponding
AlertListOptions field (e.g., CreatedAfter, Since or MaxAge) so the API returns
only alerts matching severity and within maxAge alongside State/Page/PerPage.
🧹 Nitpick comments (1)
scrapers/github/openssf.go (1)

82-85: Consider cancellation-aware retry backoff.

Using direct time.Sleep delays cancellation responsiveness during retries; a select on timer vs ctx.Done() is safer for shutdown behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/openssf.go` around lines 82 - 85, The retry backoff currently
uses time.Sleep which ignores cancellation; replace the direct sleep with a
context-aware timer: create a timer like
time.NewTimer(time.Duration(attempt*attempt)*time.Second) and use select { case
<-timer.C: /* continue */ case <-ctx.Done(): timer.Stop(); return ctx.Err() }
(or break/propagate error) so retries abort promptly on ctx cancellation; apply
this change around the for attempt := 0; attempt < maxRetries; attempt++ loop
(use the existing attempt, maxRetries and ctx identifiers) and ensure timers are
stopped to avoid leaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scrapers/github/client.go`:
- Around line 60-67: AlertListOptions.CreatedAt is ignored so incremental
scraping silently does full fetch; update the three fetch paths
(GetDependabotAlerts, GetCodeScanningAlerts, GetSecretScanningAlerts) to apply
CreatedAt to the corresponding go-github list option struct before making
requests (e.g., map AlertListOptions.CreatedAt into the API option field such as
CreatedAt/CreatedSince/Created parameter used by each GitHub alerts client),
ensure you only set this field when AlertListOptions.CreatedAt is non-empty, and
propagate it through any helper that builds the request options so incremental
scraping is honored.

In `@scrapers/github/openssf.go`:
- Around line 92-101: The response body is deferred closed which keeps it open
across retry iterations; after the httpClient.Do(req) call in the retry loop
(variables resp, err, lastErr, and function isRetryable), read and close
resp.Body immediately instead of deferring: ensure you consume or explicitly
close resp.Body right after receiving resp (before any continue or status
checks) so connections are returned to the pool; then proceed to check status
codes and handle retryable errors or return values.

In `@scrapers/github/scraper_e2e_test.go`:
- Around line 36-37: The failure formatting calls run.ProcessState.ExitCode()
unguarded and can panic if the command never started; change the require.NoError
invocation so that when err != nil you compute an exit code safely by checking
if run.ProcessState != nil and using run.ProcessState.ExitCode() only then,
otherwise use a sentinel (e.g., -1 or "N/A") and include that safe value in the
error message passed to require.NoError (keep references to run, runOut, err,
run.ProcessState and require.NoError to locate the change).

In `@scrapers/github/scraper.go`:
- Around line 62-67: The current logic in the block that calls
scrapeSecurityAlerts (guarded by config.Security) calls continue on error which
discards the repository's other collected data; instead, remove the continue and
ensure you record the error via results.Errorf(err, "failed to scrape security
alerts for %s", repoFullName) but allow processing to continue so the repository
metadata remains in results; if scrapeSecurityAlerts returns no alerts on error,
set alerts to nil or an empty slice and proceed to attach whatever alerts
(possibly empty) to the repo result so other enrichment and storage (use the
same variables repoFullName, alerts, results) still runs.
- Around line 245-257: Update the mapGitHubSeverity function to handle GitHub
code-scanning values "error", "warning", and "note" so they don't fall through
to models.SeverityInfo; add cases mapping "error" -> models.SeverityHigh,
"warning" -> models.SeverityMedium, and "note" -> models.SeverityLow (and
consider case-insensitive matching) while keeping existing mappings for "low",
"medium", "high", and "critical".

In `@scrapers/github/security.go`:
- Around line 75-77: The loop over codeScanAlerts currently uses
alert.Rule.GetSeverity() and passes its value to countAlertSeverity, which only
handles "critical"/"high"/"medium"/"low" and thus silently drops non-CVSS values
like "error"/"warning"/"note"; change the logic in the loop to first check
alert.Rule.GetSecuritySeverityLevel() (or alert.GetSecuritySeverityLevel()) and
if that returns a non-empty CVSS level use that to call countAlertSeverity,
otherwise map fallback values from alert.Rule.GetSeverity() to a normalized
bucket (or increment an "other"/"note" count) before calling countAlertSeverity;
update any related handling in the same block and in the functions referenced
(countAlertSeverity and any aggregation using alerts.counts) so non-CVSS
severities are accounted for instead of being ignored.
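
For the client.go comment above (AlertListOptions.CreatedAt being ignored), the alert list endpoints for Dependabot, code scanning, and secret scanning do not obviously expose a created-since query parameter, so one option is to honor CreatedAt client-side after fetching. The sketch below uses hypothetical stand-in types (AlertListOptions, alert, filterByCreatedAt); it illustrates the cutoff pattern only, not the repository's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// AlertListOptions and alert are illustrative stand-ins, not the repository's
// actual types.
type AlertListOptions struct {
	State     string
	Page      int
	PerPage   int
	CreatedAt time.Time // zero value means "no cutoff", i.e. a full scrape
}

type alert struct {
	ID        int
	CreatedAt time.Time
}

// filterByCreatedAt keeps only alerts created at or after the cutoff so a
// repeat scrape is incremental; when no cutoff is set, everything passes.
func filterByCreatedAt(alerts []alert, opts AlertListOptions) []alert {
	if opts.CreatedAt.IsZero() {
		return alerts
	}
	kept := alerts[:0]
	for _, a := range alerts {
		if !a.CreatedAt.Before(opts.CreatedAt) {
			kept = append(kept, a)
		}
	}
	return kept
}

func main() {
	cutoff := time.Now().Add(-24 * time.Hour)
	fetched := []alert{
		{ID: 1, CreatedAt: time.Now().Add(-48 * time.Hour)}, // older than cutoff
		{ID: 2, CreatedAt: time.Now().Add(-time.Hour)},      // within the window
	}
	recent := filterByCreatedAt(fetched, AlertListOptions{CreatedAt: cutoff})
	fmt.Println(len(recent)) // 1
}
```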

---

Duplicate comments:
In `@scrapers/github/security.go`:
- Around line 50-66: The current single-call fetch for dependabot, code
scanning, and secret scanning alerts (client.GetDependabotAlerts,
GetCodeScanningAlerts, GetSecretScanningAlerts) only requests PerPage=100 and
will miss additional pages; change each call to a paginated loop that sets
opts.Page (starting at 1) and calls the respective method repeatedly, capture
the response (not "_"), append results to alerts.dependabot /
alerts.codeScanning / alerts.secretScanning across iterations, and stop when
resp.NextPage == 0 (or when the returned next page is zero). Keep PerPage=100
(or configurable) and propagate errors as before inside the loop; a hedged
sketch of this paging loop (with the severity/maxAge filters from the next
item) follows these comments.
- Around line 30-40: The current AlertListOptions only sets State/Page/PerPage
and ignores config.SecurityFilters.severity and .maxAge; update the code that
builds AlertListOptions (where filters := config.SecurityFilters and opts :=
AlertListOptions{...}) to include the severity filter (set the appropriate
Severity field from filters.Severity or filters.Severity[0]) and apply maxAge by
computing a cutoff time (e.g., time.Now().Add(-filters.MaxAge) or converting
filters.MaxAge to the API's date/age field) and setting the corresponding
AlertListOptions field (e.g., CreatedAfter, Since or MaxAge) so the API returns
only alerts matching severity and within maxAge alongside State/Page/PerPage.
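
Both duplicate comments above concern the same fetch path in security.go: paging through all result pages and honoring the severity/maxAge filters. The sketch below is a minimal, self-contained illustration; fetchPage, Alert, and Response are hypothetical stand-ins for the go-github client and its response types, and the filter arguments are assumptions rather than the scraper's real config fields.

```go
package main

import (
	"fmt"
	"time"
)

// Alert and Response are stand-ins for the go-github alert and response types.
type Alert struct {
	Severity  string
	CreatedAt time.Time
}

type Response struct {
	NextPage int // 0 means there are no further pages
}

// fetchPage stands in for a list call such as client.GetDependabotAlerts(owner, repo, opts).
func fetchPage(page, perPage int) ([]Alert, *Response, error) {
	if page >= 3 {
		return nil, &Response{NextPage: 0}, nil
	}
	return []Alert{{Severity: "high", CreatedAt: time.Now()}}, &Response{NextPage: page + 1}, nil
}

// fetchAllAlerts pages until NextPage == 0 and applies severity/maxAge filters.
func fetchAllAlerts(severities []string, maxAge time.Duration) ([]Alert, error) {
	cutoff := time.Now().Add(-maxAge)
	allowed := map[string]bool{}
	for _, s := range severities {
		allowed[s] = true
	}

	var out []Alert
	page := 1
	for {
		alerts, resp, err := fetchPage(page, 100)
		if err != nil {
			return nil, err
		}
		for _, a := range alerts {
			if len(allowed) > 0 && !allowed[a.Severity] {
				continue // severity filter
			}
			if maxAge > 0 && a.CreatedAt.Before(cutoff) {
				continue // maxAge filter
			}
			out = append(out, a)
		}
		if resp.NextPage == 0 {
			break
		}
		page = resp.NextPage
	}
	return out, nil
}

func main() {
	alerts, err := fetchAllAlerts([]string{"high", "critical"}, 30*24*time.Hour)
	fmt.Println(len(alerts), err) // 2 <nil>
}
```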

---

Nitpick comments:
In `@scrapers/github/openssf.go`:
- Around line 82-85: The retry backoff currently uses time.Sleep which ignores
cancellation; replace the direct sleep with a context-aware timer: create a
timer like time.NewTimer(time.Duration(attempt*attempt)*time.Second) and use
select { case <-timer.C: /* continue */ case <-ctx.Done(): timer.Stop(); return
ctx.Err() } (or break/propagate error) so retries abort promptly on ctx
cancellation; apply this change around the for attempt := 0; attempt <
maxRetries; attempt++ loop (use the existing attempt, maxRetries and ctx
identifiers) and ensure timers are stopped to avoid leaks.
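
A minimal sketch of the context-aware backoff described above, assuming the same quadratic attempt*attempt delay; doWithRetry and call are illustrative names, not identifiers from the repository.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// doWithRetry retries call with a quadratic backoff that aborts promptly when
// ctx is cancelled; the timer is stopped on the cancellation path to avoid leaks.
func doWithRetry(ctx context.Context, maxRetries int, call func() error) error {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if attempt > 0 {
			timer := time.NewTimer(time.Duration(attempt*attempt) * time.Second)
			select {
			case <-timer.C:
				// backoff elapsed, retry
			case <-ctx.Done():
				timer.Stop()
				return ctx.Err()
			}
		}
		if err := call(); err != nil {
			lastErr = err
			continue
		}
		return nil
	}
	return lastErr
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	err := doWithRetry(ctx, 3, func() error { return errors.New("transient") })
	fmt.Println(err) // context deadline exceeded (cancelled during the backoff)
}
```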

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f8ab18 and 583741d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
  • api/v1/github.go
  • api/v1/interface.go
  • api/v1/interface_rate_limit_test.go
  • api/v1/types.go
  • cmd/run.go
  • fixtures/github.yaml
  • go.mod
  • scrapers/common.go
  • scrapers/cron.go
  • scrapers/github/client.go
  • scrapers/github/compliance.go
  • scrapers/github/openssf.go
  • scrapers/github/scraper.go
  • scrapers/github/scraper_e2e_test.go
  • scrapers/github/scraper_test.go
  • scrapers/github/security.go
  • scrapers/run.go
✅ Files skipped from review due to trivial changes (1)
  • api/v1/interface_rate_limit_test.go

Comment thread scrapers/github/openssf.go
Comment on lines +92 to +101
resp, err := httpClient.Do(req)
if err != nil {
	lastErr = err
	if isRetryable(err) {
		continue
	}
	return nil, err
}
defer func() { _ = resp.Body.Close() }()

Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n scrapers/github/openssf.go | head -150

Repository: flanksource/config-db

Length of output: 5294


Close response bodies immediately in the retry loop, not via defer.

When continue is executed after a retryable error (line 96) or server error (line 107), the deferred close is not invoked until the function exits, leaving the response body open across retry iterations. This prevents connection reuse and risks resource exhaustion.

Read and close the body immediately after the response is received, before status checking:

Suggested fix
		resp, err := httpClient.Do(req)
		if err != nil {
			lastErr = err
			if isRetryable(err) {
				continue
			}
			return nil, err
		}
-		defer func() { _ = resp.Body.Close() }()
+		body, readErr := io.ReadAll(resp.Body)
+		_ = resp.Body.Close()
+		if readErr != nil {
+			return nil, fmt.Errorf("failed to read response body: %w", readErr)
+		}

		if resp.StatusCode == http.StatusNotFound {
			return nil, fmt.Errorf("repository not found or not assessed by OpenSSF Scorecard (404)")
		}
		if resp.StatusCode >= 500 {
			lastErr = fmt.Errorf("server error: %d", resp.StatusCode)
			continue
		}
		if resp.StatusCode != http.StatusOK {
-			body, _ := io.ReadAll(resp.Body)
			return nil, fmt.Errorf("unexpected status %d: %s", resp.StatusCode, string(body))
		}
-
-		body, err := io.ReadAll(resp.Body)
-		if err != nil {
-			return nil, fmt.Errorf("failed to read response body: %w", err)
-		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/openssf.go` around lines 92 - 101, The response body is
deferred closed which keeps it open across retry iterations; after the
httpClient.Do(req) call in the retry loop (variables resp, err, lastErr, and
function isRetryable), read and close resp.Body immediately instead of
deferring: ensure you consume or explicitly close resp.Body right after
receiving resp (before any continue or status checks) so connections are
returned to the pool; then proceed to check status codes and handle retryable
errors or return values.

Comment on lines +36 to +37
runOut, err := run.CombinedOutput()
require.NoError(t, err, "config-db run failed (exit %v):\n%s", run.ProcessState.ExitCode(), string(runOut))
Contributor

⚠️ Potential issue | 🟡 Minor

Guard ProcessState in failure formatting.

run.ProcessState.ExitCode() can panic when the command fails before process state is populated (e.g., spawn/setup failures).

💡 Suggested fix
 runOut, err := run.CombinedOutput()
-require.NoError(t, err, "config-db run failed (exit %v):\n%s", run.ProcessState.ExitCode(), string(runOut))
+exitCode := -1
+if run.ProcessState != nil {
+	exitCode = run.ProcessState.ExitCode()
+}
+require.NoError(t, err, "config-db run failed (exit %v):\n%s", exitCode, string(runOut))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/scraper_e2e_test.go` around lines 36 - 37, The failure
formatting calls run.ProcessState.ExitCode() unguarded and can panic if the
command never started; change the require.NoError invocation so that when err !=
nil you compute an exit code safely by checking if run.ProcessState != nil and
using run.ProcessState.ExitCode() only then, otherwise use a sentinel (e.g., -1
or "N/A") and include that safe value in the error message passed to
require.NoError (keep references to run, runOut, err, run.ProcessState and
require.NoError to locate the change).

Comment on lines +62 to +67
if config.Security {
	alerts, err = scrapeSecurityAlerts(ctx, client, config, repoFullName)
	if err != nil {
		results.Errorf(err, "failed to scrape security alerts for %s", repoFullName)
		continue
	}
Contributor

⚠️ Potential issue | 🟠 Major

Don’t drop repository results when security enrichment fails.

A security-alert fetch error currently skips the entire repository (continue), so core repository metadata is lost even though it was successfully fetched.

💡 Suggested fix
 if config.Security {
 	alerts, err = scrapeSecurityAlerts(ctx, client, config, repoFullName)
 	if err != nil {
 		results.Errorf(err, "failed to scrape security alerts for %s", repoFullName)
-		continue
+		alerts = nil
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/scraper.go` around lines 62 - 67, The current logic in the
block that calls scrapeSecurityAlerts (guarded by config.Security) calls
continue on error which discards the repository's other collected data; instead,
remove the continue and ensure you record the error via results.Errorf(err,
"failed to scrape security alerts for %s", repoFullName) but allow processing to
continue so the repository metadata remains in results; if scrapeSecurityAlerts
returns no alerts on error, set alerts to nil or an empty slice and proceed to
attach whatever alerts (possibly empty) to the repo result so other enrichment
and storage (use the same variables repoFullName, alerts, results) still runs.

Comment on lines +245 to +257
func mapGitHubSeverity(severity string) models.Severity {
	switch severity {
	case "critical":
		return models.SeverityCritical
	case "high":
		return models.SeverityHigh
	case "medium":
		return models.SeverityMedium
	case "low":
		return models.SeverityLow
	default:
		return models.SeverityInfo
	}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub REST API Dependabot alerts security_advisory severity values 2024 2025

💡 Result:

For the GitHub REST API Dependabot alerts (/repos/{owner}/{repo}/dependabot/alerts and related endpoints), the security_advisory.severity value is a string with four possible values:

  • low
  • medium
  • high
  • critical [1][2]

This matches the severity filter parameter for listing Dependabot alerts (same four values) [1] and the example alert payloads where security_advisory.severity appears as e.g. "medium" [2].

Note: If you’ve seen moderate, that’s commonly used in other contexts (e.g., some UI/other APIs), but for the REST Dependabot alerts API the value is medium, not moderate [1][2].

Sources
[1] GitHub Docs – REST API endpoints for Dependabot alerts (severity filter values) (docs.github.com)
[2] GitHub Docs – Dependabot alerts REST example payload showing security_advisory.severity (docs.github.com)



🌐 Web query:

GitHub REST API code scanning alert rule severity values version 2022-11-28

💡 Result:

For GitHub REST API Code Scanning alerts with X-GitHub-Api-Version: 2022-11-28, the alert rule object uses these severity-related fields:

  • rule.severity: error, warning, note (shown in the code scanning alerts REST responses) [1]
  • rule.security_severity_level (for security queries): critical, high, medium, low [1], [2]

Also, the severity filter parameter for listing alerts accepts the combined set: critical, high, medium, low, warning, note, error [1].

Sources: [1] [2] [3]


🏁 Script executed:

cat -n scrapers/github/scraper.go | head -300

Repository: flanksource/config-db

Length of output: 9904


🏁 Script executed:

rg "mapGitHubSeverity" scrapers/github/scraper.go -B 5 -A 2

Repository: flanksource/config-db

Length of output: 852


🏁 Script executed:

rg "dependabot|code.?scanning|CodeQL" scrapers/github/scraper.go -i

Repository: flanksource/config-db

Length of output: 455


Map GitHub code scanning severity values to prevent silent downgrades to Info.

Code scanning alerts return error, warning, and note in rule.severity, which are not handled and fall through to the default case. Dependabot already uses the correct set of values (low, medium, high, critical).

💡 Suggested fix
 import (
 	"fmt"
+	"strings"
 	"sync"
 	"time"
@@
 func mapGitHubSeverity(severity string) models.Severity {
-	switch severity {
+	switch strings.ToLower(strings.TrimSpace(severity)) {
 	case "critical":
 		return models.SeverityCritical
-	case "high":
+	case "high", "error":
 		return models.SeverityHigh
-	case "medium":
+	case "medium", "warning":
 		return models.SeverityMedium
-	case "low":
+	case "low", "note":
 		return models.SeverityLow
 	default:
 		return models.SeverityInfo
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/scraper.go` around lines 245 - 257, Update the
mapGitHubSeverity function to handle GitHub code-scanning values "error",
"warning", and "note" so they don't fall through to models.SeverityInfo; add
cases mapping "error" -> models.SeverityHigh, "warning" ->
models.SeverityMedium, and "note" -> models.SeverityLow (and consider
case-insensitive matching) while keeping existing mappings for "low", "medium",
"high", and "critical".

Comment on lines +75 to +77
for _, alert := range codeScanAlerts {
	countAlertSeverity(&alerts.counts, alert.Rule.GetSeverity())
	if alert.UpdatedAt != nil && alert.UpdatedAt.After(maxAlertTime) {
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the security.go file to understand the context
cat -n scrapers/github/security.go | head -120

Repository: flanksource/config-db

Length of output: 4141


🏁 Script executed:

# Let's search for documentation or comments about severity levels in the codebase
rg -i "severity" scrapers/github/ -A 2 -B 2

Repository: flanksource/config-db

Length of output: 5498


🏁 Script executed:

# Let's see how mapGitHubSeverity is used in scraper.go
rg -A 10 "mapGitHubSeverity" scrapers/github/scraper.go | head -40

Repository: flanksource/config-db

Length of output: 1061


🌐 Web query:

GitHub code scanning alert severity levels documentation

💡 Result:

GitHub documents two related “severity” concepts for Code Scanning alerts:

  • Severity (non-security level): Error, Warning, or Note. [1]
  • Security severity (security alerts, e.g., CodeQL security queries): Critical, High, Medium, or Low—and when present, GitHub uses this in preference to severity. [1]

Related docs (GitHub Docs):

  • “About code scanning alerts” (includes “About alert severity and security severity levels”). [1]
  • “Triaging code scanning alerts in pull requests” (notes default PR check failures for error, critical, high). [2]
  • “Filtering alerts in security overview” (shows severity: filter values include both critical/high/medium/low and error/warning/note). [3]

Sources: [1] [2] [3]


🏁 Script executed:

# Search for any use of GetSecuritySeverityLevel in the codebase
rg "GetSecuritySeverityLevel" .

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Check go.mod for go-github version
rg "github.com/google/go-github" go.mod

Repository: flanksource/config-db

Length of output: 158


🌐 Web query:

go-github v73 Alert Rule GetSecuritySeverityLevel method

💡 Result:

In go-github/v73, GetSecuritySeverityLevel() is a getter on the github.Rule type used by code scanning alerts (not an AlertRule type).

  • Struct field: SecuritySeverityLevel *string `json:"security_severity_level,omitempty"` on type Rule struct (code scanning rule metadata) [1]
  • Getter implementation:

```go
func (r *Rule) GetSecuritySeverityLevel() string {
	if r == nil || r.SecuritySeverityLevel == nil {
		return ""
	}
	return *r.SecuritySeverityLevel
}
```
[2]

Typical usage from an alert:

```go
sev := alert.GetRule().GetSecuritySeverityLevel()
```
[1][2]

---

🏁 Script executed:

# Check all usages of countAlertSeverity
rg "countAlertSeverity" scrapers/github/

Repository: flanksource/config-db

Length of output: 350


Code scanning severity mapping silently drops non-CVSS alerts.

alert.Rule.GetSeverity() can return "error", "warning", or "note" (non-security levels), but countAlertSeverity only handles "critical", "high", "medium", "low" with no default case, so these alerts are never counted. GitHub's Code Scanning API provides GetSecuritySeverityLevel() for CVSS levels and should be checked first.

Suggested normalization approach
-	for _, alert := range codeScanAlerts {
-		countAlertSeverity(&alerts.counts, alert.Rule.GetSeverity())
+	for _, alert := range codeScanAlerts {
+		sev := alert.Rule.GetSecuritySeverityLevel()
+		if sev == "" {
+			sev = alert.Rule.GetSeverity()
+		}
+		countAlertSeverity(&alerts.counts, strings.ToLower(sev))
 		if alert.UpdatedAt != nil && alert.UpdatedAt.After(maxAlertTime) {
 			maxAlertTime = alert.UpdatedAt.Time
 		}
 	}
...
 func countAlertSeverity(counts *alertCounts, severity string) {
 	switch severity {
 	case "critical":
 		counts.critical++
-	case "high":
+	case "high", "error":
 		counts.high++
-	case "medium":
+	case "medium", "warning":
 		counts.medium++
-	case "low":
+	case "low", "note":
 		counts.low++
 	}
 }

Also applies to: 99-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scrapers/github/security.go` around lines 75 - 77, The loop over
codeScanAlerts currently uses alert.Rule.GetSeverity() and passes its value to
countAlertSeverity, which only handles "critical"/"high"/"medium"/"low" and thus
silently drops non-CVSS values like "error"/"warning"/"note"; change the logic
in the loop to first check alert.Rule.GetSecuritySeverityLevel() (or
alert.GetSecuritySeverityLevel()) and if that returns a non-empty CVSS level use
that to call countAlertSeverity, otherwise map fallback values from
alert.Rule.GetSeverity() to a normalized bucket (or increment an "other"/"note"
count) before calling countAlertSeverity; update any related handling in the
same block and in the functions referenced (countAlertSeverity and any
aggregation using alerts.counts) so non-CVSS severities are accounted for
instead of being ignored.

Keep OpenSSF Scorecard results (richer data: scores, compliance
mappings, details) and drop matching Code Scanning alerts. Previously
only Vulnerabilities/SAST were filtered from OpenSSF side.
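
The commit above describes keeping the richer OpenSSF Scorecard results and dropping Code Scanning alerts that cover the same ground. The sketch below is a hedged illustration of that de-duplication; the matching key (a lower-cased check/rule name) and the finding type are assumptions for the example, not the PR's actual merge logic.

```go
package main

import (
	"fmt"
	"strings"
)

type finding struct {
	Source string // "openssf" or "code-scanning"
	Check  string
}

// dedupe keeps every OpenSSF Scorecard finding and drops Code Scanning alerts
// whose check/rule name is already represented by a Scorecard result.
func dedupe(openssf, codeScanning []finding) []finding {
	covered := map[string]bool{}
	for _, f := range openssf {
		covered[strings.ToLower(f.Check)] = true
	}
	out := append([]finding{}, openssf...)
	for _, f := range codeScanning {
		if covered[strings.ToLower(f.Check)] {
			continue // already represented by the richer Scorecard result
		}
		out = append(out, f)
	}
	return out
}

func main() {
	openssf := []finding{{Source: "openssf", Check: "Vulnerabilities"}, {Source: "openssf", Check: "SAST"}}
	codeScan := []finding{{Source: "code-scanning", Check: "vulnerabilities"}, {Source: "code-scanning", Check: "Token-Permissions"}}
	fmt.Println(len(dedupe(openssf, codeScan))) // 3
}
```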
@moshloop moshloop merged commit 7400158 into main Feb 26, 2026
13 of 15 checks passed
@moshloop moshloop deleted the security-scrapers branch February 26, 2026 21:19