feat(challenges): add term-challenge WASM module with scoring and evaluation#37
Introduce the term-challenge-wasm crate at challenges/term-challenge-wasm/, implementing the Challenge trait from platform-challenge-sdk-wasm as a no_std-compatible WASM module targeting wasm32-unknown-unknown (cdylib).

The crate contains:
- lib.rs: TermChallenge unit struct implementing Challenge with name() returning "term-challenge", version() returning "0.2.3", evaluate() deserializing EvalParams via bincode and computing aggregate scores, and validate() for input validation. Registered via register_challenge!.
- evaluation.rs: Core evaluation logic with EvalParams deserialization, agent data size validation (1MB limit), score calculation delegation, and no_std-compatible numeric formatting for result messages.
- scoring.rs: ScoreCalculator with aggregate scoring (pass/fail counting, pass rate, normalized scores), DifficultyStats, and AggregateScore types with score-to-i64 conversion (0-10000 range).
- tasks.rs: TaskDefinition and TaskResult types with Difficulty enum (Easy/Medium/Hard with weights), serde support, and convenience constructors for success/failure results.

Also updates the register_challenge! macro in challenge-sdk-wasm to accept an explicit initializer expression instead of requiring Default, enabling unit structs to be used directly without a Default impl. The macro signature changes from register_challenge!(Type) to register_challenge!(Type, Expr).

The crate is added to the workspace Cargo.toml members list and Cargo.lock is updated accordingly.
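The macro change described above can be illustrated with a minimal, std-only sketch. This is not the SDK's actual macro: the real register_challenge! presumably emits WASM ABI exports, while the trait shape, the REGISTERED static, and the challenge_name/challenge_version helpers below are hypothetical names chosen for illustration. The sketch only shows why taking an initializer expression removes the need for a Default impl on a unit struct.

```rust
// Simplified stand-in for the SDK's Challenge trait (assumed shape; the real
// trait also has evaluate() and validate()).
pub trait Challenge {
    fn name(&self) -> &'static str;
    fn version(&self) -> &'static str;
}

// Two-argument form: the caller supplies the initializer expression, so the
// type does not need to implement Default.
macro_rules! register_challenge {
    ($ty:ty, $init:expr) => {
        // Hypothetical registration slot; the initializer must be a
        // constant expression, which a unit struct value is.
        static REGISTERED: $ty = $init;

        // Hypothetical accessors standing in for the real WASM exports.
        pub fn challenge_name() -> &'static str {
            Challenge::name(&REGISTERED)
        }

        pub fn challenge_version() -> &'static str {
            Challenge::version(&REGISTERED)
        }
    };
}

// Unit struct with no Default impl, as in the PR description.
pub struct TermChallenge;

impl Challenge for TermChallenge {
    fn name(&self) -> &'static str {
        "term-challenge"
    }

    fn version(&self) -> &'static str {
        "0.2.3"
    }
}

// The unit struct value itself serves as the initializer expression.
register_challenge!(TermChallenge, TermChallenge);
```

With the old single-argument form, the macro would have had to construct the value itself, which is what tied it to Default.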
…asm-module # Conflicts: # Cargo.lock # Cargo.toml # crates/challenge-sdk-wasm/src/lib.rs
Summary

Adds the term-challenge crate under challenges/term-challenge/, a terminal benchmark challenge compiled as a cdylib WASM module. This implements deterministic task evaluation and scoring for AI agents across multiple difficulty levels.

Changes

New crate term-challenge (challenges/term-challenge/)
- no_std-compatible library with cdylib + rlib crate types for WASM output
- ChallengeId, ChallengeInfo, EvaluateRequest, ValidateRequest, AgentInfo, WeightAssignment, and ChallengeEvaluationResult
- Task, TaskConfig, TaskResult, Difficulty (Easy/Medium/Hard), and configurable timeouts, Docker images, and validation types
- ScoreCalculator with difficulty-weighted aggregation, per-difficulty stats, and normalized weight output
- challenge_info, evaluate, validate, allocate, deallocate with JSON serialization over shared memory
- platform_network (HTTP) and platform_storage (key-value storage) namespaces
- cfg(target_arch = "wasm32")

Workspace integration
- Adds challenges/term-challenge and crates/challenge-orchestrator to workspace members

platform-core additions
- ChallengeContainerConfig struct with validation for Docker-based challenge orchestration
- ALLOWED_DOCKER_PREFIXES constant for image registry whitelisting
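As a rough illustration of the difficulty-weighted aggregation and per-difficulty stats mentioned for ScoreCalculator, the sketch below weights each task by its difficulty and scales the result to the 0-10000 integer range. The weights (1/2/3), the field names, and the weighted_score and stats_for functions are assumptions made for this example; the crate's actual weights and API are not shown in this PR text. Integer arithmetic is used so the result is deterministic across hosts.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Difficulty {
    Easy,
    Medium,
    Hard,
}

impl Difficulty {
    // Assumed weights for illustration only.
    pub fn weight(self) -> u64 {
        match self {
            Difficulty::Easy => 1,
            Difficulty::Medium => 2,
            Difficulty::Hard => 3,
        }
    }
}

// Hypothetical, reduced task result: only what the aggregation needs.
#[derive(Clone, Copy, Debug)]
pub struct TaskResult {
    pub difficulty: Difficulty,
    pub passed: bool,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct DifficultyStats {
    pub total: u32,
    pub passed: u32,
}

// Weighted pass rate scaled to 0..=10_000, truncating toward zero.
// An empty result set scores 0.
pub fn weighted_score(results: &[TaskResult]) -> i64 {
    let mut earned: u64 = 0;
    let mut possible: u64 = 0;
    for r in results {
        let w = r.difficulty.weight();
        possible += w;
        if r.passed {
            earned += w;
        }
    }
    if possible == 0 {
        return 0;
    }
    (earned * 10_000 / possible) as i64
}

// Pass/total counts for a single difficulty level.
pub fn stats_for(results: &[TaskResult], difficulty: Difficulty) -> DifficultyStats {
    let mut stats = DifficultyStats::default();
    for r in results.iter().filter(|r| r.difficulty == difficulty) {
        stats.total += 1;
        if r.passed {
            stats.passed += 1;
        }
    }
    stats
}
```

Under these assumed weights, passing one Easy and one Hard task while failing one Medium task earns 4 of 6 weight units, which truncates to 6666 on the 0-10000 scale.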