Commit 5bb637e

sjarmak and claude committed
feat: US-013 - Starter tasks: Category B vulnerability remediation (2 tasks)

- CCX-vuln-remed-011: CVE-2024-47764 (cookie npm package), nodejs-web-stack fixture
  - Oracle: sg-benchmarks/expressjs-express package.json (cookie ^0.7.1 runtime dep)
  - eval.sh: file_set_match + keyword_presence
  - Validity gate: VALID (gold=1.0, empty=0.0)
- CCX-vuln-remed-014: Missing auth middleware audit, grafana-observability fixture
  - Oracle: sg-benchmarks/grafana-loki pkg/dataobj/explorer/service.go
  - eval.sh: file_set_match + provenance
  - Validity gate: VALID (gold=1.0, empty=0.0)
- Both tasks in benchmarks/ccb_mcp_security/, registered in selected_mcp_unique_tasks.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1 parent 0f5d9e3 commit 5bb637e

21 files changed: +1893 -59 lines changed
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Base tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    ca-certificates \
    curl \
    python3 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

# Clone local checkout repos (baseline config: agent has local access to these)
RUN git clone --depth 1 --branch v22.13.0 https://github.com/nodejs/node /workspace/node

# Initialize git identity for agent commits
RUN git config --global user.email "agent@example.com" && \
    git config --global user.name "Agent" && \
    git config --global safe.directory '*'

# Create log directories
RUN mkdir -p /logs/agent /logs/verifier

ENTRYPOINT []
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# CCX-vuln-remed-011 — sg_only variant
# No local repo clone — agent uses Sourcegraph MCP exclusively for code access.

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    ca-certificates \
    python3 \
    curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

# Empty workspace — agent discovers code via MCP tools only
RUN git init && \
    git config user.email "agent@example.com" && \
    git config user.name "Agent" && \
    git config --global safe.directory '*'

# Create log directories
RUN mkdir -p /logs/agent /logs/verifier

# Mark sg_only mode — verifiers and eval scripts check this flag
RUN touch /tmp/.sg_only_mode

ENTRYPOINT []
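
Reviewer note: the sg_only image signals its mode only through the marker file `/tmp/.sg_only_mode`, and the evaluator in this commit sources `/tests/sgonly_verifier_wrapper.sh` only when both the marker and the wrapper exist. A minimal Python sketch of that same guard follows; the helper name `should_source_wrapper` is hypothetical and is not part of the commit.

```python
import os


def should_source_wrapper(
    marker_path: str = "/tmp/.sg_only_mode",
    wrapper_path: str = "/tests/sgonly_verifier_wrapper.sh",
) -> bool:
    """Mirror the shell guard: both the marker and the wrapper must be regular files."""
    return os.path.isfile(marker_path) and os.path.isfile(wrapper_path)


if __name__ == "__main__":
    # In the baseline image the marker is never created, so this prints False there.
    print(should_source_wrapper())
```

Requiring both files means a baseline container, which never creates the marker, skips the wrapper even if the wrapper script happens to be mounted.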
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# CVE Remediation: Vulnerable `cookie` Package Dependency

## Your Task

Your security team has raised an alert about **CVE-2024-47764**, which affects the
`cookie` npm package in versions prior to `0.7.0`. An attacker can exploit this
vulnerability to send a crafted HTTP request that bypasses cookie security controls.

You need to identify all `package.json` files across your Node.js web stack repos
that declare **`cookie` as a direct runtime dependency** (listed under
`"dependencies"`, not `"devDependencies"`).

For each match, report:
- The repository (`org/repo-name`)
- The file path within the repository
- The version constraint declared for `cookie`

## Context

You are doing a security audit of your Node.js web stack, which spans multiple repos
across different organizations. The stack includes the runtime, the web framework, a
utility library, and a database ORM.

This is a cross-org audit: you need to check all repos in the stack — not just the
one cloned locally — to ensure no vulnerable dependency slips through.

## Available Resources

The local `/workspace/` directory contains: `nodejs/node`.

**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
- `sg-benchmarks/expressjs-express` (web-framework)
- `sg-benchmarks/lodash` (utility-library)
- `sg-benchmarks/prisma-prisma` (database-orm)

## Output Format

Create a file at `/workspace/answer.json` with your findings:

```json
{
  "files": [
    {
      "repo": "org/repo-name",
      "path": "relative/path/to/package.json",
      "version": "the-version-constraint"
    }
  ],
  "text": "Narrative explanation citing the repos and version constraints found."
}
```

Include only entries where `cookie` appears under `"dependencies"` (not `"devDependencies"`
or `"scripts"`). Your answer is evaluated against a closed-world oracle — completeness matters.

## Evaluation

Your answer will be scored on:
- **File recall and precision**: Did you find all package.json files that declare `cookie` as a runtime dependency?
- **Keyword presence**: Does your answer include the exact version constraint found?
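
Reviewer note: the rule in this task (count `cookie` only when it sits under `"dependencies"`) can be checked mechanically once a manifest's text is in hand. The sketch below is one way to turn a manifest into an `answer.json` entry. The function name `cookie_runtime_entry` is hypothetical, and both sample manifests are made-up inputs chosen to mirror the constraint quoted in the commit message, not copies of any real `package.json`.

```python
import json
from typing import Optional


def cookie_runtime_entry(
    repo: str, path: str, manifest_text: str, package: str = "cookie"
) -> Optional[dict]:
    """Return a "files" entry if `package` is listed under "dependencies", else None."""
    manifest = json.loads(manifest_text)
    # Only the runtime section counts; "devDependencies" is deliberately ignored.
    runtime_deps = manifest.get("dependencies") or {}
    if package not in runtime_deps:
        return None
    return {"repo": repo, "path": path, "version": runtime_deps[package]}


# Hypothetical manifests for illustration only.
runtime_manifest = '{"dependencies": {"cookie": "^0.7.1"}}'
dev_only_manifest = '{"devDependencies": {"cookie": "0.6.0"}}'

entry = cookie_runtime_entry(
    "sg-benchmarks/expressjs-express", "package.json", runtime_manifest
)
skipped = cookie_runtime_entry("org/repo-name", "package.json", dev_only_manifest)
print(json.dumps({"files": [e for e in (entry, skipped) if e]}, indent=2))
```

Parsing the JSON, rather than searching for the string `"cookie"`, avoids false positives from `devDependencies`, `scripts`, or package names that merely contain the word.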
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
version = "1.0"

[metadata]
name = "CCX-vuln-remed-011"
description = "CVE remediation: find package.json files declaring vulnerable cookie dependency"
license = "Apache-2.0"

[task]
id = "CCX-vuln-remed-011"
repo = "nodejs/node"
category = "vuln-remediation"
language = "javascript"
difficulty = "medium"
time_limit_sec = 900
mcp_suite = "ccb_mcp_security"
use_case_id = 11
repo_set_id = "nodejs-web-stack"
mcp_unique = true

[verification]
type = "eval"
command = "bash /tests/eval.sh"

reward_type = "score"
description = "CVE remediation: find package.json files declaring vulnerable cookie dependency"

[environment]
build_timeout_sec = 600.0
Binary file not shown.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
#!/bin/bash
# eval.sh — MCP-unique benchmark evaluator for CCX-vuln-remed-011
# Exit-code-first (SWE-Factory pattern):
#   exit 0 — agent produced useful output (composite score > 0)
#   exit 1 — total failure (composite score == 0 or missing answer)
#
# Writes /logs/verifier/reward.txt with the composite score [0.0, 1.0]

set -euo pipefail

TASK_ID="CCX-vuln-remed-011"
ANSWER_PATH="/workspace/answer.json"
TASK_SPEC_PATH="/tests/task_spec.json"
ORACLE_CHECKS="/tests/oracle_checks.py"
REWARD_PATH="/logs/verifier/reward.txt"

mkdir -p /logs/verifier

echo "=== CCX-vuln-remed-011 evaluator ==="
echo "Task spec: $TASK_SPEC_PATH"
echo "Answer: $ANSWER_PATH"
echo ""

# sg_only mode guard: restore full repo if verifier wrapper exists
if [ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ]; then
    echo "sg_only mode: sourcing verifier wrapper..."
    source /tests/sgonly_verifier_wrapper.sh
fi

# Verify answer file exists
if [ ! -f "$ANSWER_PATH" ]; then
    echo "ERROR: answer.json not found at $ANSWER_PATH"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

# Validate answer is valid JSON
if ! python3 -c "import json; json.load(open('$ANSWER_PATH'))" 2>/dev/null; then
    echo "ERROR: answer.json is not valid JSON"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

echo "answer.json found and valid JSON"

# Run oracle checks
if [ ! -f "$ORACLE_CHECKS" ]; then
    echo "ERROR: oracle_checks.py not found at $ORACLE_CHECKS"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

echo "Running oracle checks..."
SCORE=$(python3 "$ORACLE_CHECKS" --answer "$ANSWER_PATH" --spec "$TASK_SPEC_PATH" --verbose 2>&1 | tee /dev/stderr | tail -1)

# Validate score is a number
if ! echo "$SCORE" | python3 -c "import sys; float(sys.stdin.read().strip())" 2>/dev/null; then
    echo "ERROR: oracle_checks.py did not return a valid score: $SCORE"
    echo "0.0" > "$REWARD_PATH"
    exit 1
fi

echo ""
echo "Composite score: $SCORE"
echo "$SCORE" > "$REWARD_PATH"

# Exit based on score (SWE-Factory exit-code-first pattern)
python3 -c "import sys; sys.exit(0 if float('$SCORE') > 0 else 1)"
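
Reviewer note: the script treats the last line of the checker's merged stdout and stderr as the score. Because `set -euo pipefail` is active, a non-zero exit from `oracle_checks.py` appears to abort the script at the `SCORE=$(...)` assignment, before `reward.txt` is written. A defensive parse could live in one place, as in the Python sketch below. The names `parse_score` and `exit_code` are hypothetical, and clamping to [0.0, 1.0] is an added assumption that the script itself does not enforce.

```python
import math


def parse_score(checker_output: str) -> float:
    """Read the last non-empty line as a float; fall back to 0.0 on any problem."""
    lines = [line.strip() for line in checker_output.splitlines() if line.strip()]
    if not lines:
        return 0.0
    try:
        score = float(lines[-1])
    except ValueError:
        # e.g. a traceback or warning ended up as the final line
        return 0.0
    if math.isnan(score):
        return 0.0
    # Assumption: keep the reward inside the documented [0.0, 1.0] range.
    return min(1.0, max(0.0, score))


def exit_code(score: float) -> int:
    """Exit-code-first convention: 0 for any positive score, 1 otherwise."""
    return 0 if score > 0 else 1


if __name__ == "__main__":
    sample_output = "file_set_match ok\nkeyword_presence ok\n0.75\n"  # made-up checker output
    score = parse_score(sample_output)
    print(score, exit_code(score))
```

With a parser like this, every failure path resolves to a written reward of 0.0 and exit status 1, which matches what the script's header comment promises.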
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
{
  "files": [
    {"repo": "sg-benchmarks/expressjs-express", "path": "package.json", "version": "^0.7.1"}
  ],
  "text": "Found 1 package.json file across the nodejs-web-stack repos that declares cookie as a direct runtime dependency: sg-benchmarks/expressjs-express package.json specifies \"cookie\": \"^0.7.1\" under dependencies. The lodash and prisma repos do not declare the cookie package as a runtime dependency. The nodejs/node repository does not declare cookie as a top-level package dependency.",
  "_metadata": {
    "oracle_type": "file_set_match",
    "discovery_method": "sourcegraph_keyword_search",
    "query": "repo:^github.com/sg-benchmarks/expressjs-express$ file:package.json \"cookie\"",
    "version": "^0.7.1",
    "cve": "CVE-2024-47764"
  }
}
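
Reviewer note: `oracle_checks.py` is not shown in this diff, so the following is only a guess at how a `file_set_match` plus `keyword_presence` composite could satisfy the validity gate from the commit message (gold=1.0, empty=0.0). Every function name, the equal 0.5/0.5 weighting, and the shortened oracle text are assumptions made for illustration.

```python
def file_set_f1(answer_files, oracle_files):
    """F1 over (repo, path) pairs; no overlap scores 0.0."""
    got = {(f["repo"], f["path"]) for f in answer_files}
    want = {(f["repo"], f["path"]) for f in oracle_files}
    hits = len(got & want)
    if hits == 0:
        return 0.0
    precision = hits / len(got)
    recall = hits / len(want)
    return 2 * precision * recall / (precision + recall)


def keyword_presence(text, keywords):
    """Fraction of required keywords that appear verbatim in the narrative text."""
    if not keywords:
        return 0.0
    return sum(1 for keyword in keywords if keyword in text) / len(keywords)


def composite(answer, oracle, keywords):
    """Assumed equal weighting of the two checks."""
    files_score = file_set_f1(answer.get("files", []), oracle["files"])
    text_score = keyword_presence(answer.get("text", ""), keywords)
    return 0.5 * files_score + 0.5 * text_score


# Oracle entry taken from the file above; the text is shortened for the example.
oracle = {
    "files": [
        {"repo": "sg-benchmarks/expressjs-express", "path": "package.json", "version": "^0.7.1"}
    ],
    "text": 'express package.json specifies "cookie": "^0.7.1" under dependencies.',
}
empty = {"files": [], "text": ""}

gold_score = composite(oracle, oracle, ["^0.7.1"])
empty_score = composite(empty, oracle, ["^0.7.1"])
print(gold_score, empty_score)
```

A validity gate of this shape checks the scorer itself: the oracle answer must earn full marks and an empty answer must earn nothing, otherwise the task cannot discriminate between agents.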
