Add Sounio (.sio) as a target language #191
Draft
agourakis82 wants to merge 3 commits into
Sounio (https://github.com/Sounio-lang/sounio) is an epistemic systems language with Rust-shaped type annotations. This commit lands the translator + evaluation backend so MultiPL-E can grade code-generation models on .sio prompts.

Translator (dataset_builder/humaneval_to_sounio.py)
- Subclasses LanguageTranslator.
- Maps int/float/bool/str -> i64/f64/bool/String; List[T]/Dict[K,V]/Tuple[..]/Optional[T] -> Vec/HashMap/tuple/Option.
- Expands negative literals to (0 - n) (no unary minus in Sounio).
- Emits a generous `with Mut, Panic, Div` effect set on user fns and `with IO, Mut, Panic, Div` on the test-harness main.
- 148/161 HumanEval and 356/400 MBPP-typed prompts translate cleanly (parity with Rust).

Evaluation backend (evaluation/src/eval_sounio.py)
- `eval_script(path)` returning the standard MultiPL-E verdict shape (OK / SyntaxError / Exception / Timeout).
- Uses souc's raw pass-through (`souc <src> <out>`) because the `compile -o` subcommand is broken in 1.0.0-beta.5; a fix is tracked upstream.
- chmod +x on the produced ELF (binary writer omits the bit).

Container (evaluation/Dockerfile.sounio)
- debian:bookworm-slim + python3 + a pinned souc-linux-x86_64 (SHA256-verified). SOUNIO_VERSION / SOUNIO_BIN_SHA256 are the reproducibility contract.

Terms / docs
- dataset_builder/terms.csv: Sounio row.
- dataset_builder/sounio_translator_notes.md: type-mapping rationale, six handled edge-cases, known limitations.

Validation
- references/{hand,auto}/ + references/validate.py: 20/20 pairs PASS structurally (body region stripped: hand bodies are the human's contribution; the translator's contract is the prompt header + test harness).
- agent_logs/Cx2_acceptance.md: T1/T1.5/T3/T4/T6 PASS; T2/T5 deferred with explicit operator-action follow-ups.
- agent_logs/Cx2_convergence.md: three iterative-convergence cycles + an adversarial-self-critique table for ten random translations.
Generated with Claude Code (Opus 4.7) as Cx-2 under operator supervision (GAIDeT / ICMJE-2025 agent-assisted authorship disclosure).
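As a rough illustration of the type mapping and negative-literal rule listed in this commit message, the sketch below translates Python annotation strings into Sounio type names. It is not the code in dataset_builder/humaneval_to_sounio.py: the helper names (`py_to_sounio`, `translate_type`, `translate_int_literal`) and the `ast`-based parsing are assumptions made for the example, the generic spellings (`Vec<T>`, `HashMap<K, V>`, `Option<T>`) follow the PR description, and Python 3.9+ is assumed for the shape of `ast.Subscript.slice`.

```python
import ast

# Scalar mapping as stated in the commit message.
SCALARS = {"int": "i64", "float": "f64", "bool": "bool", "str": "String"}


def translate_type(node: ast.expr) -> str:
    """Translate a parsed Python annotation into a Sounio type string."""
    if isinstance(node, ast.Name):
        if node.id in SCALARS:
            return SCALARS[node.id]
        # Assumption: anything else (e.g. Any) is rejected so the prompt is skipped.
        raise ValueError(f"unsupported type: {node.id}")
    if isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name):
        inner = node.slice  # Python 3.9+: the slice is the expression itself
        args = list(inner.elts) if isinstance(inner, ast.Tuple) else [inner]
        parts = [translate_type(arg) for arg in args]
        head = node.value.id
        if head == "List" and len(parts) == 1:
            return f"Vec<{parts[0]}>"
        if head == "Dict" and len(parts) == 2:
            return f"HashMap<{parts[0]}, {parts[1]}>"
        if head == "Optional" and len(parts) == 1:
            return f"Option<{parts[0]}>"
        if head == "Tuple":
            return "(" + ", ".join(parts) + ")"
    # Union and other shapes fall through here and are treated as untranslatable.
    raise ValueError("unsupported annotation")


def py_to_sounio(annotation: str) -> str:
    """Hypothetical entry point: annotation source text in, Sounio type out."""
    return translate_type(ast.parse(annotation, mode="eval").body)


def translate_int_literal(n: int) -> str:
    """Render an integer literal, expanding negatives to (0 - n)."""
    return f"(0 - {-n})" if n < 0 else str(n)


print(py_to_sounio("Dict[str, List[float]]"))  # HashMap<String, Vec<f64>>
print(translate_int_literal(-5))               # (0 - 5)
```

Raising on unsupported annotations keeps the skip decision in one place, which is one way to get the "skipped exactly as Rust does" behaviour for `Union`/`Any` prompts.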
Cycle 4 of iterative convergence: ran souc 1.0.0-beta.5 on five trivial hand bodies. Three (gcd, largest_divisor, is_prime) compile and pass all asserts end-to-end — real evidence the translator output is consumable. Two (strlen, triangle_area) failed typecheck because the chosen Sounio surface ops (String.len, 'as f64' cast) aren't yet stable; those bodies are now panic stubs so validate.py PASS stays honest. Log captured in agent_logs/Cx2_convergence.md.
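A minimal sketch of the compile-then-run check behind these results, assuming the raw pass-through invocation `souc <src> <out>` quoted in the first commit message. This is not the contents of evaluation/src/eval_sounio.py: the injectable `compiler` parameter, the 15-second timeout, and the mapping of compiler failure to `SyntaxError` and nonzero exit to `Exception` are assumptions chosen to match the verdict names used in this PR.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def _result(status, exit_code, stdout="", stderr=""):
    """Build the {status, exit_code, stdout, stderr} verdict dictionary."""
    return {"status": status, "exit_code": exit_code, "stdout": stdout, "stderr": stderr}


def eval_script(path, compiler=("souc",), timeout=15):
    """Compile `path` via `<compiler> <src> <out>`, run the binary, classify the outcome."""
    src = Path(path)
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / src.stem
        try:
            build = subprocess.run(
                [*compiler, str(src), str(out)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return _result("Timeout", -1, stderr="compile step timed out")
        if build.returncode != 0:
            # Assumption: any compiler failure (parse or typecheck) counts as SyntaxError.
            return _result("SyntaxError", build.returncode, build.stdout, build.stderr)

        # The commit notes that the binary writer omits the executable bit.
        os.chmod(out, 0o755)
        try:
            run = subprocess.run(
                [str(out)], capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return _result("Timeout", -1, stderr="run step timed out")
        # Assumption: a failed assert in the harness surfaces as a nonzero exit code.
        status = "OK" if run.returncode == 0 else "Exception"
        return _result(status, run.returncode, run.stdout, run.stderr)
```

Passing the compiler as a tuple lets the flow be exercised with a stand-in command on hosts where `souc` is not installed.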
Adds the first published Sounio MultiPL-E baseline. This is the floor of the floor: a 1.3B base model with no Sounio in its training mix cannot produce syntactically valid Sounio (143/148 SyntaxError, 5/148 Exception, 0/148 OK). The point is provenance: every future Sounio number now has this reference.

Spec deviation: the spec calls for deepseek-coder-6.7b-base. The 6.7B safetensors mmap (9.97GiB) exceeds the eval host's vmem ulimit (24GiB); fully documented in results/sounio_README.md. scripts/run_baseline.py supports --model so the operator can rerun on a larger host without code changes.

Artifacts:
- results/sounio_deepseek-coder-1.3b.jsonl (per-problem + completions)
- results/sounio_deepseek-coder-1.3b.summary.json (machine-readable)
- results/sounio.csv (upstream lang,problem,verdict format)
- results/sounio_README.md (methodology + reproduction)
- scripts/run_baseline.py (generation + grading driver)

Compute: single NVIDIA L4 (23GiB), CUDA 13.2, wall-clock 785s.
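To make the headline counts concrete, the sketch below tallies verdicts from rows in the `lang,problem,verdict` layout cited for results/sounio.csv. The header row, the `sounio` language tag, and the `problem_N` names are assumptions; the rows are synthetic and only mirror the 143/5/0 split reported in this commit, not the real file.

```python
import csv
import io
from collections import Counter


def tally(csv_text: str) -> Counter:
    """Count verdicts in CSV text with a lang,problem,verdict header (assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["verdict"] for row in reader)


def ok_rate(counts: Counter) -> float:
    """Fraction of problems graded OK; 0.0 when there are no rows."""
    total = sum(counts.values())
    return counts["OK"] / total if total else 0.0


# Synthetic rows reproducing the reported split (hypothetical problem names).
rows = ["lang,problem,verdict"]
rows += [f"sounio,problem_{i},SyntaxError" for i in range(143)]
rows += [f"sounio,problem_{i},Exception" for i in range(143, 148)]

counts = tally("\n".join(rows))
print(counts["SyntaxError"], counts["Exception"], counts["OK"])  # 143 5 0
print(ok_rate(counts))  # 0.0
```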
What
Adds Sounio (Sounio-lang/sounio, `souc 1.0.0-beta.5`) to MultiPL-E. Submitted as a draft so maintainers can shape the integration before baseline numbers are added.

- `dataset_builder/humaneval_to_sounio.py`: translator (`LanguageTranslator` subclass).
- `dataset_builder/terms.csv`: Sounio row.
- `dataset_builder/sounio_translator_notes.md`: type-mapping rationale + edge-case log.
- `evaluation/src/eval_sounio.py`: backend that runs `souc` and returns the MultiPL-E verdict shape.
- `evaluation/Dockerfile.sounio`: SHA-pinned, single-language reproducible image.
- `references/{hand,auto}/` + `references/validate.py`: 20 hand-validated translations.
- `prompts/humaneval-sio-reworded.jsonl`, `prompts/mbpp-sio-reworded.jsonl`: generated dataset slices (regenerable from `dataset_builder/prepare_prompts_for_hfhub.py`).
- `agent_logs/Cx2_acceptance.md`, `agent_logs/Cx2_convergence.md`: acceptance-test log + 3-cycle iterative-convergence log.

Why
Sounio is a typed, effect-tracked systems language with Rust-shaped syntax. Without a MultiPL-E entry it cannot be compared against other languages in cross-language code-generation evaluations, which keeps it out of the standard literature comparisons. The translator follows the existing patterns established by `humaneval_to_rs.py` so reviewers see a small, familiar surface.

How
Translator
Subclasses `LanguageTranslator`. Type mapping:

| Python | Sounio |
| --- | --- |
| `int` | `i64` |
| `float` | `f64` |
| `bool` | `bool` |
| `str` | `String` |
| `List[T]` | `Vec<T>` |
| `Dict[K, V]` | `HashMap<K, V>` |
| `Tuple[…]` | `(…)` |
| `Optional[T]` | `Option<T>` |

Sounio-specific rules baked into the translator:

- Negative literals expand to `(0 - n)` (and similarly for `f64`), since Sounio has no unary minus.
- User functions emit `with Mut, Panic, Div`; the test-harness `main` emits `with IO, Mut, Panic, Div`.
- `vec![]` → `Vec::<T>::new()`, `HashMap::from([])` → `HashMap::<K,V>::new()` (same approach Rust uses).
- `Optional`-typed call arguments are wrapped in `Some(…)` at the call site, mirroring `humaneval_to_rs.coerce`.
- `Union`/untyped/`Any` prompts are skipped exactly as Rust does.

Translation rates: 148/161 HumanEval (0.92), 356/400 MBPP-typed (0.89), which is parity with Rust.

Evaluation backend
Standard `eval_script(path)` returning `{status, exit_code, stdout, stderr}`. Two integration quirks worth flagging:

- The `compile -o` subcommand path in `souc 1.0.0-beta.5` is broken, while the raw pass-through `souc <src> <out>` works. The eval backend uses the raw form; a fix is tracked upstream.
- The binary writer omits the executable bit on the produced ELF, so `eval_script` applies `chmod 0o755` before invoking it.

Container
`evaluation/Dockerfile.sounio` is a small single-language image: `debian:bookworm-slim` + `python3` + a pinned `souc-linux-x86_64`. Reproducibility is enforced via `SOUNIO_VERSION` and `SOUNIO_BIN_SHA256` (`sha256sum -c` fails the build on drift). This sits alongside the main multi-language `Dockerfile` rather than perturbing it.

Baseline numbers
Pending. The translator + harness land in this PR so the contract can be reviewed first. A follow-up commit will add `results/sounio_deepseek-coder-6.7b.jsonl` with pass@1 / pass@10 against `deepseek-ai/deepseek-coder-6.7b-base` (temp 0.2, n=10). Expected pass@1 range: 5–15% (Sounio is a low-resource language, so the point is to publish the floor).

Validation
- 20 hand-validated translations in `references/hand/`, spanning the spec-required categories: 5 trivial, 5 list/iter, 5 control/recursion, 3 dict/set, 2 edge-case.
- `references/validate.py` strips the function body region and compares the translator-produced surface (signature, types, literals, asserts, harness). 20/20 PASS.
- The convergence cycles and the adversarial self-critique of ten random translations are logged in `agent_logs/Cx2_convergence.md`; no semantic divergences found.

Reproducibility
- `souc` pin: `v1.0.0-beta.5`, SHA256 `3cbea2b475e79737046f8ccf463c07d22cd5fb678fd479a032ee04bd8e19da93`.
- Source datasets: `datasets/originals/` (161 HumanEval) and `datasets/mbpp-typed/` (400 MBPP).
- Acceptance-test log: `agent_logs/Cx2_acceptance.md`.

CI status
`references/validate.py` is the structural gate the PR commits to; T1 / T1.5 / T3 / T4 / T6 are passing locally. T2 (full Docker build of `Dockerfile.sounio`) requires the `souc-linux-x86_64` asset to be attached to the upstream Sounio release; this is coordinated outside this PR and documented in `agent_logs/Cx2_acceptance.md`. Happy to wire whichever CI workflow the maintainers prefer.

Open questions for maintainers
- Merge into the main `evaluation/Dockerfile`, or keep `Dockerfile.sounio` separate? The current approach favors isolation; happy to merge into the main image if you'd rather.
- Stop tokens are `["\n}"]` (matches Rust). If Sounio later acquires a function syntax that legitimately closes on `}` mid-expression, we'd want a tighter stop. This is not an issue today.
- `Union`/`Any` skip. Let me know if you'd prefer a different default policy (e.g., silently emitting a `panic!` for those prompts so the count matches).

Disclosure
Authored with Claude Code (Opus 4.7) operating as agent Cx-2 under direct human supervision (operator: @agourakis82). All translator decisions, edge-case handling, and acceptance criteria were reviewed and approved by the operator before push. This disclosure satisfies the ICMJE 2025 / GAIDeT contributor-statement requirement for agent-assisted authorship.