feat(challenges): add term-challenge WASM module with scoring and evaluation#37

Open
echobt wants to merge 3 commits into main from feat/term-challenge-wasm-module

echobt commented Feb 17, 2026

Summary

Adds the term-challenge crate under challenges/term-challenge/, a terminal benchmark challenge compiled as a cdylib WASM module. This implements deterministic task evaluation and scoring for AI agents across multiple difficulty levels.

Changes

  • New crate term-challenge (challenges/term-challenge/)

    • no_std-compatible library with cdylib+rlib crate types for WASM output
    • Core types: ChallengeId, ChallengeInfo, EvaluateRequest, ValidateRequest, AgentInfo, WeightAssignment, and ChallengeEvaluationResult
    • Task system with Task, TaskConfig, TaskResult, and Difficulty (Easy/Medium/Hard), plus configurable timeouts, Docker images, and validation types
    • Scoring engine (ScoreCalculator) with difficulty-weighted aggregation, per-difficulty stats, and normalized weight output
    • WASM exports: challenge_info, evaluate, validate, allocate, deallocate with JSON serialization over shared memory
    • Host function imports from platform_network (HTTP) and platform_storage (key-value storage) namespaces
    • Custom WASM allocator and panic handler gated behind cfg(target_arch = "wasm32")
  • Workspace integration

    • Added challenges/term-challenge and crates/challenge-orchestrator to workspace members
  • platform-core additions

    • ChallengeContainerConfig struct with validation for Docker-based challenge orchestration
    • ALLOWED_DOCKER_PREFIXES constant for image registry whitelisting
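
For reviewers, the image whitelisting described in the last item could look roughly like the sketch below. The prefix values, the `image` field, and the `ConfigError` type are hypothetical, since the description names only `ChallengeContainerConfig` and `ALLOWED_DOCKER_PREFIXES`; the actual validation in platform-core may check more than this.

```rust
/// Hypothetical registry prefixes; the PR does not list the real values.
pub const ALLOWED_DOCKER_PREFIXES: &[&str] = &[
    "ghcr.io/example-org/",
    "docker.io/example-org/",
];

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    EmptyImage,
    DisallowedImage,
}

/// Reduced stand-in for the struct named in the PR; the `image` field
/// is an assumption.
pub struct ChallengeContainerConfig {
    pub image: String,
}

impl ChallengeContainerConfig {
    /// Accept the config only if the image name is non-empty and starts
    /// with one of the allowed registry prefixes.
    pub fn validate(&self) -> Result<(), ConfigError> {
        if self.image.trim().is_empty() {
            return Err(ConfigError::EmptyImage);
        }
        let allowed = ALLOWED_DOCKER_PREFIXES
            .iter()
            .any(|prefix| self.image.starts_with(*prefix));
        if allowed {
            Ok(())
        } else {
            Err(ConfigError::DisallowedImage)
        }
    }
}
```

A prefix check of this kind rejects images from unlisted registries before any container is started. Each prefix ends in a slash so that a look-alike organization name cannot match.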

Introduce the term-challenge-wasm crate at challenges/term-challenge-wasm/,
implementing the Challenge trait from platform-challenge-sdk-wasm as a
no_std-compatible WASM module targeting wasm32-unknown-unknown (cdylib).

The crate contains:
- lib.rs: TermChallenge unit struct implementing Challenge with name()
  returning "term-challenge", version() returning "0.2.3", evaluate()
  deserializing EvalParams via bincode and computing aggregate scores,
  and validate() for input validation. Registered via register_challenge!.
- evaluation.rs: Core evaluation logic with EvalParams deserialization,
  agent data size validation (1MB limit), score calculation delegation,
  and no_std-compatible numeric formatting for result messages.
- scoring.rs: ScoreCalculator with aggregate scoring (pass/fail counting,
  pass rate, normalized scores), DifficultyStats, and AggregateScore
  types with score-to-i64 conversion (0-10000 range).
- tasks.rs: TaskDefinition and TaskResult types with Difficulty enum
  (Easy/Medium/Hard with weights), serde support, and convenience
  constructors for success/failure results.
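
As a rough illustration of the scoring path described above, the sketch below aggregates task results into pass/fail counts, a difficulty-weighted normalized score, and an i64 in the 0-10000 range. The weight values (1, 2, 3), the field names, and the `aggregate` function are assumptions, not the crate's actual definitions. It also uses std (`f64::round`) for brevity, whereas the real crate is no_std-compatible.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Difficulty {
    Easy,
    Medium,
    Hard,
}

impl Difficulty {
    /// Assumed weights; the commit message only says the enum carries weights.
    pub fn weight(self) -> f64 {
        match self {
            Difficulty::Easy => 1.0,
            Difficulty::Medium => 2.0,
            Difficulty::Hard => 3.0,
        }
    }
}

/// Reduced stand-in for the crate's TaskResult.
pub struct TaskResult {
    pub difficulty: Difficulty,
    pub passed: bool,
}

pub struct AggregateScore {
    pub passed: usize,
    pub failed: usize,
    /// Earned weight divided by total possible weight, in [0.0, 1.0].
    pub normalized: f64,
}

impl AggregateScore {
    /// Fraction of tasks that passed, ignoring difficulty.
    pub fn pass_rate(&self) -> f64 {
        let total = self.passed + self.failed;
        if total == 0 {
            0.0
        } else {
            self.passed as f64 / total as f64
        }
    }

    /// Map the normalized score onto the integer range 0..=10000.
    pub fn to_i64(&self) -> i64 {
        (self.normalized.clamp(0.0, 1.0) * 10_000.0).round() as i64
    }
}

/// Count passes and failures and weight each task by its difficulty.
pub fn aggregate(results: &[TaskResult]) -> AggregateScore {
    let (mut passed, mut failed) = (0usize, 0usize);
    let (mut earned, mut possible) = (0.0f64, 0.0f64);
    for r in results {
        let w = r.difficulty.weight();
        possible += w;
        if r.passed {
            passed += 1;
            earned += w;
        } else {
            failed += 1;
        }
    }
    let normalized = if possible > 0.0 { earned / possible } else { 0.0 };
    AggregateScore { passed, failed, normalized }
}
```

With these assumed weights, passing one Easy and one Hard task while failing one Medium task earns 4 of 6 possible weight, which converts to 6667. Converting to an integer presumably avoids floating-point results crossing the WASM boundary, though the commit message does not state the motivation.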

Also updates the register_challenge! macro in challenge-sdk-wasm to accept
an explicit initializer expression instead of requiring Default, enabling
unit structs to be used directly without a Default impl. The macro
signature changes from register_challenge!(Type) to
register_challenge!(Type, Expr).
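
The sketch below illustrates the two-argument macro shape. The `Challenge` trait here is a reduced stand-in with only `name()` and `version()`, and the accessors the macro generates are hypothetical; the real macro in challenge-sdk-wasm wires up the WASM exports instead.

```rust
/// Reduced stand-in for the SDK trait; the real trait also has
/// evaluate() and validate().
pub trait Challenge {
    fn name(&self) -> &'static str;
    fn version(&self) -> &'static str;
}

/// Sketch of a macro taking the type plus an explicit initializer
/// expression, so no `Default` impl is required.
macro_rules! register_challenge {
    ($ty:ty, $init:expr) => {
        // The initializer must be usable in a static, which a unit
        // struct expression is.
        static CHALLENGE: $ty = $init;

        // Hypothetical accessors standing in for the generated exports.
        pub fn challenge_name() -> &'static str {
            <$ty as Challenge>::name(&CHALLENGE)
        }

        pub fn challenge_version() -> &'static str {
            <$ty as Challenge>::version(&CHALLENGE)
        }
    };
}

/// Unit struct with no `Default` impl, as in the commit message.
pub struct TermChallenge;

impl Challenge for TermChallenge {
    fn name(&self) -> &'static str {
        "term-challenge"
    }

    fn version(&self) -> &'static str {
        "0.2.3"
    }
}

// New call shape: the type, then the expression that constructs it.
register_challenge!(TermChallenge, TermChallenge);
```

Existing callers that used `register_challenge!(Type)` would need to pass an initializer as well, for example `Type::default()` where the initializer is allowed to be a non-const expression.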

The crate is added to the workspace Cargo.toml members list and Cargo.lock
is updated accordingly.

coderabbitai bot commented Feb 17, 2026

Warning

Rate limit exceeded

@echobt has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 10 seconds before requesting another review.

…asm-module

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	crates/challenge-sdk-wasm/src/lib.rs