Fix GitLab adapter path contract parity#96
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThis PR adds a core layout contract, loads and applies writeback contracts to discovery, normalizes adapter endpoints, and refactors GitLab VFS paths to {id}__{slug}/meta.json with auxiliary emitters, digest, updated ingestion/webhook logic, tests, and documentation. ChangesShared Layout Contract Foundation
Alias Slug Source Consolidation
Writeback Contract Infrastructure
Writeback Discovery Normalization
GitLab Path Refactoring with New Conventions
Documentation and Process Rules
Sequence DiagramsequenceDiagram
participant Webhook as GitLab Webhook Normalizer
participant PathMapper as packages/gitlab/src/path-mapper.js
participant Adapter as GitLab Adapter
participant Auxiliary as emitGitLabAuxiliaryFiles
participant Indexer as index-emitter
Webhook->>PathMapper: computeGitLabPath(objectType, objectId, { title/ref })
PathMapper->>PathMapper: encodeGitLabPathSegment / select meta.json vs flat
PathMapper-->>Webhook: normalized meta.json or encoded path
Webhook-->>Adapter: normalized objectId + path
Adapter->>Auxiliary: emitGitLabAuxiliaryFiles(records)
Auxiliary->>PathMapper: gitLabRecordDirectorySegment(id, title)
Auxiliary->>Indexer: upsertGitLabIndexRow(rows, newRow)
Indexer-->>Auxiliary: sorted index JSON file content
Auxiliary-->>Adapter: write/delete counts and errors
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
| return computeCanonicalPath('gitlab', objectType, objectId); | ||
| } |
There was a problem hiding this comment.
🔴 computeGitLabPath returns slug-less paths that no longer match actual canonical files on disk
Before this PR, computeMetadataPath(projectPath, 'merge_requests', '42') (without a title) produced /gitlab/projects/acme/api/merge_requests/42/metadata.json, which matched the actual file on disk. After this PR, the same call produces /gitlab/projects/acme/api/merge_requests/42/meta.json, but the actual canonical file is at /gitlab/projects/acme/api/merge_requests/42__add-oauth/meta.json (with slug). The computeGitLabPath switch statement at lines 329-341 calls computeMetadataPath without the title parameter for all directory resource types, so the returned path always lacks the __<slug> segment. This breaks two exported APIs: adapter.computePath() (called by packages/webhook-server/src/routes.ts:324) and the exported computePathFromWebhook() (packages/gitlab/src/webhook/normalizer.ts:180-181), both of which now return paths pointing to non-existent files.
(Refers to lines 329-344)
Prompt for agents
The computeGitLabPath function (path-mapper.ts lines 278-345) calls computeMetadataPath without a title parameter for directory resources (merge_requests, issues, commits, pipelines). This means the resulting path omits the __<slug> directory segment that the actual canonical file uses. Before this PR the paths always matched because there was no slug component; now they diverge.
The root cause is that computeGitLabPath only receives an objectId string (e.g. 'acme/api/merge_requests/42') which doesn't contain slug information. Two possible fixes:
1. Accept an optional title/slug parameter in computeGitLabPath and propagate it to computeMetadataPath. Update adapter.computePath() to accept and forward the payload's title field. This requires updating the IntegrationAdapter base class or the GitLab-specific override.
2. Make computeGitLabPath produce a 'lookup path' (without slug) that the system understands as equivalent to the slugged canonical path. This would require parseGitLabPath or file-lookup logic to resolve slug-less paths to the actual canonical path on disk. This is more complex but avoids changing the computePath signature.
The same issue affects computePathFromWebhook in webhook/normalizer.ts which calls computeGitLabPath internally.
Related files: packages/gitlab/src/adapter.ts (computePath method at line 128), packages/gitlab/src/webhook/normalizer.ts (computePathFromWebhook at line 180), packages/webhook-server/src/routes.ts (calls adapter.computePath at line 324).
Was this helpful? React with 👍 or 👎 to provide feedback.
| path: /gitlab/projects/{{project.path_with_namespace}}/pipelines/{{object_attributes.id}}/metadata.json | ||
| path: /gitlab/projects/{{project.path_with_namespace}}/pipelines/{{object_attributes.id}}__{{object_attributes.ref}}/meta.json | ||
| build: | ||
| path: /gitlab/projects/{{project.path_with_namespace}}/pipelines/{{pipeline_id}}/jobs/{{build_id}}.json |
There was a problem hiding this comment.
🟡 gitlab.mapping.yaml build webhook path not updated with __ref pipeline directory segment
Every other webhook path in gitlab.mapping.yaml was updated to include the new __<slug> or __<ref> directory segments, but the build entry at line 26 was missed. The YAML says pipelines/{{pipeline_id}}/jobs/{{build_id}}.json, but the code at packages/gitlab/src/adapter.ts:268 and packages/gitlab/src/adapter.ts:283 produces paths like pipelines/<pipeline_id>__<ref>/jobs/<build_id>.json via computeMetadataPath(projectPath, 'pipelines', pipelineId, jobPayload.ref) and computePipelineJobPath(projectPath, pipelineId, jobPayload.build_id, jobPayload.ref). The ref field IS available in the build webhook payload. This violates the AGENTS.md rule: "If a generated template disagrees with path-mapper.ts, update the template."
| path: /gitlab/projects/{{project.path_with_namespace}}/pipelines/{{pipeline_id}}/jobs/{{build_id}}.json | |
| path: /gitlab/projects/{{project.path_with_namespace}}/pipelines/{{pipeline_id}}__{{ref}}/jobs/{{build_id}}.json |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
scripts/writeback-discovery-normalizer.mjs (1)
159-187: ⚡ Quick winConsider adding explicit bounds check in
layoutPathMatchesResource.The function skips dynamic resource segments (lines 176-178), then immediately accesses
resourceSegments[resourceIndex](line 180) without an explicit bounds check. While the comparison implicitly handles undefined (returning false), adding an explicit check would make the code more defensive and readable.🛡️ Suggested defensive bounds check
while (isDynamicSegment(resourceSegments[resourceIndex])) { resourceIndex += 1; } + if (resourceIndex >= resourceSegments.length) { + return false; + } + if (resourceSegments[resourceIndex] !== layoutSegment) { return false; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/writeback-discovery-normalizer.mjs` around lines 159 - 187, The function layoutPathMatchesResource is vulnerable to out-of-bounds access after advancing resourceIndex past dynamic segments; after the while loop that skips dynamic resource segments (the one that increments resourceIndex while isDynamicSegment(resourceSegments[resourceIndex]) is true), add an explicit bounds check (e.g., if resourceIndex >= resourceSegments.length return false) before accessing resourceSegments[resourceIndex]; ensure the same defensive check exists wherever resourceSegments[resourceIndex] is accessed after increments to resourceIndex to avoid undefined comparisons.packages/gitlab/src/emit-auxiliary-files.ts (1)
117-130: 💤 Low valueConsider adding a comment to clarify the null-byte key separator.
The null-byte separator (
\0) betweenprojectPathandobjectTypeis safe (null bytes cannot appear in filesystem paths), but unusual enough that a brief inline comment would help future maintainers understand the choice.📝 Suggested clarification
const getRecordReconciler = (projectPath: string, objectType: GitLabIndexedResourceType) => { + // Use null-byte separator to avoid collisions; null bytes cannot appear in path strings const key = `${projectPath}\0${objectType}`; let reconciler = recordReconcilers.get(key);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/gitlab/src/emit-auxiliary-files.ts` around lines 117 - 130, Add a brief inline comment next to the key construction in getRecordReconciler explaining why the null-byte separator (`\0`) is used (e.g., to create an unambiguous composite map key because null bytes cannot appear in filesystem paths) so future maintainers understand the choice; annotate the line where `const key = \`${projectPath}\0${objectType}\`;` (and/or above it) and mention that it avoids collisions and is safe given path constraints in recordReconcilers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/gitlab/src/commits/ingestion.ts`:
- Line 32: The commit-note path currently uses the optional
webhook.commit?.title (in computeCommitCommentPath call), which can diverge from
the canonical commit path that assumes commit.title exists; change the logic so
the canonical slug for the commit path is derived from a stable source: if
webhook.commit is present use its title, otherwise fetch the commit by sha (or
compute a deterministic fallback slug from sha) and pass that slug into
computeCommitCommentPath and any metadata path construction so both comments and
meta use the same slug; update the code paths that read webhook.commit (the call
site using projectPath, sha, webhook.commit?.title and the earlier code that
directly accesses webhook.commit.title) to use the new canonicalSlug variable.
In `@packages/gitlab/src/path-mapper.ts`:
- Around line 363-366: The decoder uses basename.lastIndexOf('__') which cuts
off IDs containing multiple "__" (e.g., release__candidate) — change
decodeFlatObjectId to locate the first "__" (use basename.indexOf('__') instead
of lastIndexOf) and return basename.slice(separatorIndex + 2) when
separatorIndex >= 0; ensure basename is still computed via
decodeURIComponent(segment.replace(/\.json$/, '')) and that the separatorIndex
check handles no-separator case by returning basename unchanged.
In `@scripts/writeback-contracts.mjs`:
- Around line 377-379: The reduce over schema.allOf that calls mergeJsonSchema
(via resolveAllOf) is replacing array-valued keywords (notably "required")
instead of merging them, which drops earlier required fields; update
mergeJsonSchema so that when both target and source have array-valued schema
keywords (e.g., "required", "enum", "required") it performs a
union/concatenation with deduplication rather than a simple replace, and ensure
resolveAllOf's reduce flow uses that behavior to preserve all required entries
when flattening allOf (also apply the same array-merge logic to the code paths
around the other merges referenced in the 409-428 area to keep behavior
consistent).
- Around line 44-52: The code currently allows duplicate operationId keys to be
overwritten when populating the operations Map; modify the loop that processes
source.operations so it checks if operations.has(operation.operationId) and, on
duplicate, throws an Error (or process.exit with a clear message) that includes
the conflicting operationId and the source/title (use source.title or
sourcePath) to fail fast; update the block that currently does
operations.set(operation.operationId, operation) to perform this existence check
and raise the duplicate-operationId error instead of silently replacing entries.
---
Nitpick comments:
In `@packages/gitlab/src/emit-auxiliary-files.ts`:
- Around line 117-130: Add a brief inline comment next to the key construction
in getRecordReconciler explaining why the null-byte separator (`\0`) is used
(e.g., to create an unambiguous composite map key because null bytes cannot
appear in filesystem paths) so future maintainers understand the choice;
annotate the line where `const key = \`${projectPath}\0${objectType}\`;` (and/or
above it) and mention that it avoids collisions and is safe given path
constraints in recordReconcilers.
In `@scripts/writeback-discovery-normalizer.mjs`:
- Around line 159-187: The function layoutPathMatchesResource is vulnerable to
out-of-bounds access after advancing resourceIndex past dynamic segments; after
the while loop that skips dynamic resource segments (the one that increments
resourceIndex while isDynamicSegment(resourceSegments[resourceIndex]) is true),
add an explicit bounds check (e.g., if resourceIndex >= resourceSegments.length
return false) before accessing resourceSegments[resourceIndex]; ensure the same
defensive check exists wherever resourceSegments[resourceIndex] is accessed
after increments to resourceIndex to avoid undefined comparisons.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 09223232-c21b-4619-985d-a8d4b7149356
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (61)
.claude/rules/adapter-writeback-discovery.md.claude/rules/alias-subtrees.md.claude/rules/generated-adapter-paths.md.claude/rules/naming-conventions.md.gitignoreAGENTS.mddocs/writeback-spec-coverage.mdpackage.jsonpackages/confluence/src/layout.tspackages/core/src/index.tspackages/core/src/layout-contract.test.tspackages/core/src/layout-contract.tspackages/github/discovery/github/.adapter.mdpackages/github/discovery/github/repos/{owner}/{repo}/issues/.schema.jsonpackages/github/discovery/github/repos/{owner}/{repo}/issues/{issueNumber}/comments/.schema.jsonpackages/github/discovery/github/repos/{owner}/{repo}/pulls/{pullNumber}/reviews/.schema.jsonpackages/github/src/layout.tspackages/gitlab/README.mdpackages/gitlab/discovery/gitlab/.adapter.mdpackages/gitlab/discovery/gitlab/projects/{projectPath}/issues/{issueIid}__{slug}/comments/.create.example.jsonpackages/gitlab/discovery/gitlab/projects/{projectPath}/issues/{issueIid}__{slug}/comments/.schema.jsonpackages/gitlab/discovery/gitlab/projects/{projectPath}/merge_requests/{mergeRequestIid}__{slug}/discussions/.create.example.jsonpackages/gitlab/discovery/gitlab/projects/{projectPath}/merge_requests/{mergeRequestIid}__{slug}/discussions/.schema.jsonpackages/gitlab/gitlab.mapping.yamlpackages/gitlab/src/adapter.tspackages/gitlab/src/alias-slug.tspackages/gitlab/src/commits/ingestion.tspackages/gitlab/src/digest.tspackages/gitlab/src/emit-auxiliary-files.tspackages/gitlab/src/index-emitter.tspackages/gitlab/src/index.tspackages/gitlab/src/issues/ingestion.tspackages/gitlab/src/layout-prompt.tspackages/gitlab/src/layout.tspackages/gitlab/src/mr/approvals.tspackages/gitlab/src/mr/discussions.tspackages/gitlab/src/mr/ingestion.tspackages/gitlab/src/path-mapper.tspackages/gitlab/src/pipeline/ingestion.tspackages/gitlab/src/resources.tspackages/gitlab/src/webhook/normalizer.tspackages/gitlab/src/writeback.tspackages/gitlab/test/digest.test.tspackages/gitlab/test/e2e-ingestion.test.tspackages/gitlab/test/emit-auxiliary-files.test.tspackages/gitlab/test/layout.test.tspackages/gitlab/test/path-mapper.test.tspackages/gitlab/test/webhook-normalizer.test.tspackages/gitlab/test/writeback.test.tspackages/gitlab/workflows/060-gitlab-adapter.tspackages/jira/src/layout.tspackages/linear/src/layout.tspackages/notion/src/layout.tspackages/slack/src/layout.tsscripts/generate-writeback-discovery.mjsscripts/integration-contracts/github/source/openapi.yamlscripts/integration-contracts/github/writeback.openapi.jsonscripts/writeback-contracts.mjsscripts/writeback-discovery-data.mjsscripts/writeback-discovery-normalizer.mjsscripts/writeback-discovery-normalizer.test.mjs
| ): IngestOperation { | ||
| return { | ||
| path: computeCommitCommentPath(projectPath, sha, webhook.object_attributes.id), | ||
| path: computeCommitCommentPath(projectPath, sha, webhook.object_attributes.id, webhook.commit?.title), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf 'Inspect GitLab note webhook typing around commit.title:\n'
rg -n -C3 'GitLabNoteWebhook|commit\??:|title\??:' packages/gitlab/src
printf '\nInspect webhook fixtures/tests mentioning commit title:\n'
rg -n -C2 'commit.*title|object_kind.*note|note webhook|commit_note' packages/gitlabRepository: AgentWorkforce/relayfile-adapters
Length of output: 31908
Ensure commit-note paths are derived from a stable canonical source, not optional webhook fields.
When line 32 receives an optional webhook.commit?.title, note files can land under /commits/<sha>/comments/... while the commit record is emitted under /commits/<sha>__<slug>/meta.json, splitting one commit across two canonical prefixes.
Line 18 assumes commit.title exists (direct access without optional chaining), contradicting line 32's optional-chaining fallback. The gitlab.mapping.yaml template (line 17) also assumes commit fields are present. If GitLab sends commit-note webhooks without a commit field, this creates a path consistency violation.
Verify whether commit is guaranteed in GitLabNoteWebhook for commit-note events. If it can be absent, derive the canonical path segment from a stable source (e.g., the fetched commit or a fallback slug mechanism) rather than silently omitting the slug when the webhook field is missing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/gitlab/src/commits/ingestion.ts` at line 32, The commit-note path
currently uses the optional webhook.commit?.title (in computeCommitCommentPath
call), which can diverge from the canonical commit path that assumes
commit.title exists; change the logic so the canonical slug for the commit path
is derived from a stable source: if webhook.commit is present use its title,
otherwise fetch the commit by sha (or compute a deterministic fallback slug from
sha) and pass that slug into computeCommitCommentPath and any metadata path
construction so both comments and meta use the same slug; update the code paths
that read webhook.commit (the call site using projectPath, sha,
webhook.commit?.title and the earlier code that directly accesses
webhook.commit.title) to use the new canonicalSlug variable.
| function decodeFlatObjectId(segment: string): string { | ||
| const basename = decodeURIComponent(segment.replace(/\.json$/, '')); | ||
| const separatorIndex = basename.lastIndexOf('__'); | ||
| return separatorIndex > 0 ? basename.slice(separatorIndex + 2) : basename; |
There was a problem hiding this comment.
Decode flat record IDs from the first __, not the last.
Line 365 breaks round-tripping for flat IDs that contain __. A tag like release__candidate is emitted as release-candidate__release__candidate.json, but the current decoder returns only candidate.
Proposed fix
function decodeFlatObjectId(segment: string): string {
const basename = decodeURIComponent(segment.replace(/\.json$/, ''));
- const separatorIndex = basename.lastIndexOf('__');
+ const separatorIndex = basename.indexOf('__');
return separatorIndex > 0 ? basename.slice(separatorIndex + 2) : basename;
}As per coding guidelines: Slug format rules: ASCII only, lowercase, hyphen-separated; truncate to 80 characters at word boundary; empty slugs fall back to bare ID or untitled for aliases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/gitlab/src/path-mapper.ts` around lines 363 - 366, The decoder uses
basename.lastIndexOf('__') which cuts off IDs containing multiple "__" (e.g.,
release__candidate) — change decodeFlatObjectId to locate the first "__" (use
basename.indexOf('__') instead of lastIndexOf) and return
basename.slice(separatorIndex + 2) when separatorIndex >= 0; ensure basename is
still computed via decodeURIComponent(segment.replace(/\.json$/, '')) and that
the separatorIndex check handles no-separator case by returning basename
unchanged.
| sources.push({ | ||
| path: sourcePath, | ||
| kind: source.kind, | ||
| title: source.title, | ||
| sourceUrl: source.sourceUrl, | ||
| }); | ||
| for (const operation of source.operations) { | ||
| operations.set(operation.operationId, operation); | ||
| } |
There was a problem hiding this comment.
Fail fast on duplicate contract operationIds.
Line 51 currently lets the last file win when two provider sources declare the same operationId. That makes discovery output depend on directory order and can attach the wrong schema to an endpoint.
Proposed fix
sources.push({
path: sourcePath,
kind: source.kind,
title: source.title,
sourceUrl: source.sourceUrl,
});
for (const operation of source.operations) {
+ const existing = operations.get(operation.operationId);
+ if (existing) {
+ throw new Error(
+ `Duplicate writeback contract operationId "${operation.operationId}" for provider "${provider}" in ${existing.sourcePath} and ${operation.sourcePath}`,
+ );
+ }
operations.set(operation.operationId, operation);
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/writeback-contracts.mjs` around lines 44 - 52, The code currently
allows duplicate operationId keys to be overwritten when populating the
operations Map; modify the loop that processes source.operations so it checks if
operations.has(operation.operationId) and, on duplicate, throws an Error (or
process.exit with a clear message) that includes the conflicting operationId and
the source/title (use source.title or sourcePath) to fail fast; update the block
that currently does operations.set(operation.operationId, operation) to perform
this existence check and raise the duplicate-operationId error instead of
silently replacing entries.
| return schema.allOf | ||
| .map(resolveAllOf) | ||
| .reduce((merged, entry) => mergeJsonSchema(merged, entry), withoutAllOf); |
There was a problem hiding this comment.
Preserve all required keys when flattening allOf.
Line 379 reduces allOf through mergeJsonSchema, but Lines 410-415 replace arrays instead of merging them. For object schemas that means earlier required entries are dropped, so the normalized schema becomes less strict than the contract.
Proposed fix
const merged = { ...base };
for (const [key, value] of Object.entries(override)) {
+ if (key === 'required' && Array.isArray(value)) {
+ merged.required = [
+ ...new Set([
+ ...(Array.isArray(base.required) ? base.required : []),
+ ...value,
+ ]),
+ ];
+ continue;
+ }
if (key === 'properties' && isPlainObject(value)) {
merged.properties = mergeJsonSchema(base.properties ?? {}, value);
continue;
}Also applies to: 409-428
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/writeback-contracts.mjs` around lines 377 - 379, The reduce over
schema.allOf that calls mergeJsonSchema (via resolveAllOf) is replacing
array-valued keywords (notably "required") instead of merging them, which drops
earlier required fields; update mergeJsonSchema so that when both target and
source have array-valued schema keywords (e.g., "required", "enum", "required")
it performs a union/concatenation with deduplication rather than a simple
replace, and ensure resolveAllOf's reduce flow uses that behavior to preserve
all required entries when flattening allOf (also apply the same array-merge
logic to the code paths around the other merges referenced in the 409-428 area
to keep behavior consistent).
Summary
Verification
Follow-up