Skip to content

roachtest: upload runner-level logs to Datadog#166175

Open
williamchoe3 wants to merge 3 commits intocockroachdb:masterfrom
williamchoe3:datadog-runner-logs
Open

roachtest: upload runner-level logs to Datadog#166175
williamchoe3 wants to merge 3 commits intocockroachdb:masterfrom
williamchoe3:datadog-runner-logs

Conversation

@williamchoe3
Copy link
Contributor

@williamchoe3 williamchoe3 commented Mar 19, 2026

Extends roachtest's datadog integration by adding support for uploading test runner level logs (test_runner-*.log, w*.log) to Datadog. Previously, these were ignored.

@trunk-io
Copy link
Contributor

trunk-io bot commented Mar 19, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@blathers-crl
Copy link

blathers-crl bot commented Mar 19, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

williamchoe3 and others added 2 commits March 19, 2026 11:18
Epic: None
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Epic: None
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@williamchoe3
Copy link
Contributor Author

The first 2 commits are from #166059, which this PR depends on. Once that PR merges, this branch will be rebased onto master and the diff will only show the third commit.

@williamchoe3
Copy link
Contributor Author

williamchoe3 commented Mar 19, 2026

@williamchoe3
Copy link
Contributor Author

PR #166175 Review: roachtest: upload runner-level logs to Datadog

Reviewing commit 3 only (f34b01f — "roachtest: upload runner-level logs to Datadog"). Commits 1-2 were reviewed separately.

Overview

Extends the datadog log upload system with a new "runner logs" category for test_runner-*.log and w*.log files from _runner-logs/. The changes are well-structured: readTCBuildProperties is cleanly extracted, the new NewRunnerLogMetadata/ShouldUploadRunnerLogs/MaybeUploadRunnerLogs follow the established pattern, and renames (ShouldUploadLogsToDatadogShouldUploadTestLogsToDatadog, discoverLogFilesdiscoverTestLogFiles) disambiguate the two categories. Good test coverage across all new functions.


BLOCKERS

None.


MEDIUM

M1. MaybeUploadRunnerLogs duplicates the upload loop from MaybeUploadTestLogs

The two functions share ~30 lines of identical logic: Datadog context creation, API client setup, the file iteration loop (stat → copy metadata → select parser → parse/upload → log timing), and the summary log. The only differences are the discovery function called and the child logger name.

This is manageable for two categories, but the package doc comment establishes categories as a first-class concept ("Current categories: Test logs, Runner logs"), suggesting more may follow. If a third category arrives, the duplication becomes a real maintenance burden.

Consider extracting the shared upload loop into a helper:

func uploadLogFiles(
    ctx context.Context, l *logger.Logger, logsAPI *datadogV2.LogsApi,
    files []string, baseDir string, cfg LogMetadata,
) error { ... }

Not blocking since two copies is tolerable, but worth noting before a third appears.

M2. Runner metadata produces empty-value tags in Datadog

NewRunnerLogMetadata leaves test-specific fields empty (TestName, Owner, Cloud, Platform), but makeTags() unconditionally includes all fields:

func (m LogMetadata) makeTags() map[string]string {
    tagMap := map[string]string{
        testNameTagName: m.TestName,  // ""
        ownerTagName:    m.Owner,     // ""
        cloudTagName:    m.Cloud,     // ""
        platformTagName: m.Platform,  // ""
        ...
    }
}

This produces Datadog tags like name:, owner:, cloud:, platform: with empty values. In the Datadog UI, these appear in tag facet dropdowns as blank entries, making it harder to filter and distinguish runner logs from test logs.

Options:

  1. Skip empty values in makeTags() (changes behavior for test logs too — probably fine since empty tags aren't useful there either).
  2. Add a category tag (category:runner vs category:test) to explicitly distinguish log types, regardless of empty fields.
  3. Accept the empty tags for now.

NITS

N1. Typo in selectParser comment: "it's" → "its"

// ... and it's parser does not match any lines ...

Should be "its parser" (possessive, not contraction).

N2. "uploaded X log events" printed even after upload error

if uploadErr != nil {
    l.Printf("error uploading %s to Datadog (%d entries uploaded): %v", ...)
}
l.Printf("uploaded %d log events from %s in %s", ...)

After a failed upload, the log shows both an error and a success-like message ("uploaded 0 log events"). Same pattern exists in MaybeUploadTestLogs, so not a regression, but worth wrapping the success message in an else:

if uploadErr != nil {
    l.Printf("error uploading %s ...", ...)
} else {
    l.Printf("uploaded %d log events from %s in %s", ...)
}

N3. shouldUploadRunnerFile matches more than intended

if strings.HasSuffix(name, ".log") && name[0] == 'w' && name[1] >= '0' && name[1] <= '9' {

This matches any file starting with w + digit + anything + .log, e.g., w0extra.log or widget.log (if it existed). Since the _runner-logs/ directory is controlled by roachtest, this won't cause issues in practice. But a stricter check would be:

// Match w<digits>.log exactly.
if strings.HasSuffix(name, ".log") && len(name) > 4 {
    stem := strings.TrimSuffix(name, ".log")
    if stem[0] == 'w' {
        if _, err := strconv.Atoi(stem[1:]); err == nil {
            return true
        }
    }
}

Not worth the complexity unless false matches become a problem.

N4. Blank line removed in NewLogMetadata

The diff removes a blank line after the function signature opening brace in NewLogMetadata. Minor formatting change mixed in with the refactoring — fine, just noting for commit hygiene.

@williamchoe3
Copy link
Contributor Author

williamchoe3 commented Mar 19, 2026

M1 ... This is manageable for two categories, but the package doc comment establishes categories as a first-class concept ("Current categories: Test logs, Runner logs"), suggesting more may follow. If a third category arrives, the duplication becomes a real maintenance burden.

Added a comment on the top of file to explain the boilerplate, I agree if this needs to be extended for a 3rd category of logs for roachtest, I would look to implementing an interface that clearly defines what is needed by a runnerLogsUploaderImpl or testLogsUploaderImpl, but at this point I'm not sure what that 3rd category would be so I'm leaving it as is for now.

M2, N1: addressed
N2, N3: I think these are fine as is. For N2, since we're best effort, we can both fail and succeed so a strict if else wouldn't make as much sense. For N3, I don't think we need to be that cautious.

@williamchoe3 williamchoe3 force-pushed the datadog-runner-logs branch 5 times, most recently from 1d65ecd to 61aa4cd Compare March 19, 2026 20:01
@williamchoe3
Copy link
Contributor Author

link
image

@williamchoe3 williamchoe3 marked this pull request as ready for review March 19, 2026 20:04
@williamchoe3 williamchoe3 requested review from a team as code owners March 19, 2026 20:04
@williamchoe3 williamchoe3 requested review from DarrylWong and srosenberg and removed request for a team and DarrylWong March 19, 2026 20:04
Add support for uploading test runner logs (test_runner-*.log, w*.log)
to Datadog.

Epic: None
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants