4.0 by jsuarez5341 · Pull Request #402 · PufferAI/PufferLib

jsuarez5341 · 2025-10-24T20:01:27Z

This PR will not be merged. We are targeting EoY and 4.0 will just become master. Key goals:

Sweeps for all envs, largest ever public dataset of RL experiments
Constellation
Major perf enhancements

TBD: cpp/barracuda, final constellation features, xlstm, advantage calc tweaks

Rewrites craftax_spawn_mobs_native to strip JAX-isms that are pointless on CPU:

- bool[48][48] validity mask -> compact (int16, int16) coord list collected in one pass over the bounding box around the player
- bounding-box scan: mobs can only spawn within MOB_DESPAWN_DISTANCE=14, so we only visit the up-to-27x27 window instead of the full 48x48 map
- early return when can_spawn is already false from the mob-cap or probability roll, skipping the scan + choice
- merged count_mobs3 + first_empty_mobs3 into a single loop
- inlined the block-type and distance checks

Choice arithmetic uses the same FP expressions as baseline so the selected cell is bitwise-identical for any given (valid_count, rng_key) pair. The baseline quirk of writing type_id[level][slot] unconditionally even when no mob spawns is preserved.

Phase timing (single-thread, random actions):

  craftax_spawn_mobs_native: 17.06 us -> 0.30 us (57x)
  full c_step: 29.6 us -> 12.3 us (2.4x)

Verified bitwise-equal to the prior implementation over 1.28M paired steps (128 envs x 10000 steps, random actions, reset exercised).
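The bounding-box scan described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the flat `spawnable` array, the `Coord` struct, the `collect_spawn_cells` name, and the use of Manhattan distance with a strict `<` comparison are all assumptions chosen so that the window comes out at 27x27; only the 48x48 map size and MOB_DESPAWN_DISTANCE=14 come from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_SIZE 48              /* map edge, per the 48x48 figure above */
#define MOB_DESPAWN_DISTANCE 14  /* per the commit message */
#define MAX_WINDOW (2 * (MOB_DESPAWN_DISTANCE - 1) + 1)  /* 27 */

/* Hypothetical compact coordinate pair replacing the bool[48][48] mask. */
typedef struct { int16_t row, col; } Coord;

static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Collect valid spawn cells in one pass over the clamped window around
 * the player at (pr, pc). `spawnable` is a flat MAP_SIZE*MAP_SIZE array
 * (assumption). `out` must hold MAX_WINDOW*MAX_WINDOW entries.
 * Returns the number of cells written. */
static int collect_spawn_cells(const uint8_t *spawnable, int pr, int pc,
                               Coord *out) {
    assert(pr >= 0 && pr < MAP_SIZE && pc >= 0 && pc < MAP_SIZE);
    const int reach = MOB_DESPAWN_DISTANCE - 1;
    const int r0 = clamp_int(pr - reach, 0, MAP_SIZE - 1);
    const int r1 = clamp_int(pr + reach, 0, MAP_SIZE - 1);
    const int c0 = clamp_int(pc - reach, 0, MAP_SIZE - 1);
    const int c1 = clamp_int(pc + reach, 0, MAP_SIZE - 1);
    int n = 0;
    for (int r = r0; r <= r1; r++) {
        for (int c = c0; c <= c1; c++) {
            /* Assumed rule: Manhattan distance, strictly inside the limit. */
            int dist = abs(r - pr) + abs(c - pc);
            if (spawnable[r * MAP_SIZE + c] && dist < MOB_DESPAWN_DISTANCE) {
                out[n].row = (int16_t)r;
                out[n].col = (int16_t)c;
                n++;
            }
        }
    }
    return n;
}
```

Because the cells are emitted in row-major order, picking the k-th valid cell from this list selects the same cell as picking the k-th true entry of a row-major full-map mask, which is the property the bitwise-parity claim relies on.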

c_reset and the c_step auto-reset path now optionally memcpy from a pre-generated pool of worlds instead of running generate_world each episode. Pool size is a runtime kwarg (reset_pool_size) read by my_init, default 1024 via config/ocean/craftax.ini. Set to 0 to disable and regenerate every reset (required for strict per-seed determinism in tests/craftax_parity.py).

Trade: at most reset_pool_size unique maps are seen per process. With 1024 and ~270-step random-action episodes, diversity is plentiful for training. Memory cost: 1024 * sizeof(CraftaxState) ~= 267 MB once at startup.

Two reset entry points are now distinguished:

- craftax_reset_state_from_reset_key: direct (used by parity harness), always calls generate_state_from_world_key, pool-free for exact per-key determinism.
- craftax_reset_state_on_done: hot-path used by c_step on terminal, consults the pool when enabled, falls through to generate_world otherwise. Pool index derived from reset_key.word[0].

tests/craftax_parity.py picks up raylib's include path since craftax.h now pulls raylib.h (from the shared renderer).

Measurements (single-thread, random actions):

  worldgen: 2.69 ms -> 6.9 us memcpy (~390x)
  full c_step: 12.3 us -> 2.35 us (5.25x)
  training SPS: 450K -> 506K (+12%)
  1-thread sim SPS: 81K -> 425K (5.25x)
  16-thread sim SPS: 1.14M -> 5.53M (4.85x)
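As a rough sketch of the pooled-reset idea, assuming a much smaller stand-in state: `Key`, `State`, `ResetPool`, `pool_init`, `reset_on_done`, and the toy `generate_world` below are hypothetical names for illustration, and reducing `word[0]` modulo the pool size is an assumption (the commit message only says the index is derived from reset_key.word[0]).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t word[2]; } Key;                  /* stand-in reset key */
typedef struct { uint32_t seed; uint8_t map[64]; } State;  /* stand-in for CraftaxState */
typedef struct { State *worlds; uint32_t size; } ResetPool;

/* Toy world generator; the real one is the expensive step being avoided. */
static void generate_world(State *s, Key key) {
    s->seed = key.word[0];
    for (uint32_t i = 0; i < sizeof s->map; i++)
        s->map[i] = (uint8_t)((key.word[0] * 31u + i) & 0xFFu);
}

/* Pre-generate `size` worlds once at startup; size 0 disables the pool. */
static int pool_init(ResetPool *p, uint32_t size) {
    p->worlds = NULL;
    p->size = 0;
    if (size == 0) return 0;
    p->worlds = malloc((size_t)size * sizeof(State));
    if (p->worlds == NULL) return -1;
    for (uint32_t i = 0; i < size; i++) {
        Key k;
        k.word[0] = i;
        k.word[1] = 0;
        generate_world(&p->worlds[i], k);
    }
    p->size = size;
    return 0;
}

/* Hot-path reset: copy a pooled world when enabled, else regenerate. */
static void reset_on_done(State *dst, const ResetPool *p, Key key) {
    if (p->size == 0) {
        generate_world(dst, key);
        return;
    }
    uint32_t idx = key.word[0] % p->size;  /* assumed index derivation */
    memcpy(dst, &p->worlds[idx], sizeof(State));
}

static void pool_free(ResetPool *p) {
    free(p->worlds);
    p->worlds = NULL;
    p->size = 0;
}
```

With a pool of 4, a key whose `word[0]` is 6 copies pooled world 2; with the pool disabled the same key regenerates a world from the key itself, which is the behavior a per-seed parity test would need. At the figures quoted above, ~267 MB spread over 1024 entries works out to on the order of 260 KB per pooled state.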

The five move_* helpers (melee/passive/ranged mobs + mob/player projectiles) now return immediately when mask=false. JAX's branchless "compute-then-mask" pattern is pointless on CPU: dead slots' output never feeds observations, rewards, or mob_map, so skipping the body and the RNG draws is semantically equivalent.

Defining CRAFTAX_JAX_PARITY at build time restores the branchless slow path for bitwise replay against JAX (required by tests/craftax_parity.py). Default build uses the early-out.

Also drops craftax_step_jax_index(player_level, NUM_LEVELS) clamps at the top of each move_* -- state->player_level is maintained in [0, NUM_LEVELS-1] by change_floor_native (explicit bounds checks) and by the worldgen init. Six redundant clamps per step eliminated.

Measurements (single-thread, random actions, pool=1024):

  update_mobs phase: 1.392 us -> 0.285 us (4.88x)
  full c_step: 2.35 us -> 1.22 us
  1-thread sim SPS: 425K -> 819K (1.93x)
  16-thread sim SPS: 5.53M -> 10.04M (1.82x)
  training SPS: 506K -> 544K (+7%)

Parity test with CRAFTAX_JAX_PARITY defined passes 8 seeds * 1000 steps over 27 terminals. Without the flag, parity diverges at the first mob death -- by design.
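A minimal sketch of the early-out versus compute-then-mask split, under stated assumptions: the `Mob` struct, `move_mob`, and the xorshift generator are hypothetical stand-ins (the real env uses threefry and richer state); only the CRAFTAX_JAX_PARITY macro name comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x; } Mob;  /* hypothetical one-field mob */

/* Small stand-in RNG; advances *s on every draw. Seed must be nonzero. */
static uint32_t xorshift32(uint32_t *s) {
    assert(*s != 0);
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

static void move_mob(Mob *mob, int mask, uint32_t *rng) {
#ifndef CRAFTAX_JAX_PARITY
    /* Default build: dead slot, skip the body and the RNG draw entirely. */
    if (!mask) return;
#endif
    /* Parity build reaches here even for dead slots, so the RNG stream
     * advances exactly as a branchless implementation would advance it. */
    int32_t step = (int32_t)(xorshift32(rng) % 3u) - 1;  /* -1, 0, or +1 */
    int32_t moved = mob->x + step;
    mob->x = mask ? moved : mob->x;  /* compute-then-mask select */
}
```

In this sketch the two builds leave a masked-off mob unchanged either way, but only the parity build consumes a random draw for it. That difference in RNG consumption is one plausible way a default build could drift from a bitwise reference once a slot goes dead.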

These 10 tests were written incrementally as each subsystem (noise, threefry, worldgen, 7 step subsystems) was ported from JAX, to catch divergence at each layer. Now that tests/craftax_parity.py passes end-to-end against the JAX reference, they are redundant: any bug they'd catch also breaks the integration parity test. Dropping ~5400 LOC of scaffolding.

Kept:

- craftax_parity.py (JAX<->C integration parity harness)
- craftax_state_fixtures.py (state-flattening helpers used by parity)
- craftax_parity_stress.py (adversarial action sequences)
- craftax_step_full_test.py (pytest wrapper -> parity.run)

The dashboard and CSV logger only need to surface a handful of milestones along the tech/exploration curve, not every achievement. The env still tracks all 67 internally for reward computation and for the normalized 'perf' aggregate -- we just stop shipping every one through the log Dict each episode flush.

Checkpoints chosen to span the learning curve:

  collect_wood          first resource (tier 1)
  make_wood_pickaxe     first tool
  make_stone_pickaxe    stone tier
  collect_iron          iron tier resource
  make_iron_pickaxe     iron tier tool (major milestone)
  collect_diamond       diamond tier resource
  enter_gnomish_mines   first dungeon (exploration)
  defeat_necromancer    endgame boss

Log Dict now carries 4 meta + 8 achievements + 1 n = 13 fields, well under the stock create_dict(32) capacity. This removes the need for the capacity bump in src/bindings* (reverted in the following commit).

This reverts commit 9396e79.

- config/ocean/craftax.ini -> config/craftax.ini
- config/ocean/craftax_classic.ini -> config/craftax_classic.ini
- ocean/craftax/textures.bin -> resources/craftax/textures.bin
- scripts/craftax_convergence_bench.py -> tests/craftax_convergence_bench.py
- drop empty scripts/ directory
- pack_textures.py: write to resources/craftax/textures.bin
- craftax.h / craftax_classic.h: fopen textures from resources/craftax/

Used by the craftax parity harness to compile with -DCRAFTAX_JAX_PARITY, which disables the update_mobs early-out so the C env replays bitwise against JAX. Default training builds leave EXTRA_CFLAGS empty and keep the ~2x sim-SPS early-out enabled.

Craftax Full: native C port + optimizations + renderer

Fix heap overflow in constellation

KTibow · 2026-04-27T02:22:47Z

Can probably be closed :)

Optimize Craftax CPU rollout

Dr Mario Env v2

Dino Env ported to 4.0

Add laser puzzle environment Merging w/ request for cleanups made on stream

puffer aim trainer hehe

jsuarez5341 and others added 30 commits March 23, 2026 20:39

Remove trash

2d70d9b

remove definitely dead tests

2ce8c42

Delete torch ext crap

11586a2

dead scripts

d2471de

setup cleanup

5200492

cleanup torch models

4ba3ba4

small fixes

5c811c6

cleanups

e72370b

delete more

a3c1a90

minor

a030af8

drop no build isolation

fbd52ce

uh forgot src

58b3fc9

toml license

1292b81

fix ocean

032a61a

pybind11?

41469f3

khr compile fix

0aa5bdf

build fixes for ocean

f51cb87

Update manifest

e137d95

fuck you setup.py!

a320b24

Nice simple build script!

88d0e20

single build script

f931f3f

Some refactors, needs more work

5b5c217

Old extensions

5345067

.ini

c3c0de8

adjust scoring metrics

ff8e6c6

adjust scoring metrics

a1f84ab

remove locally changed files

3cc928d

revert nmmo3.ini

f95a39b

revert toml changes

9caed86

threads

1266be1

Infatoshi and others added 14 commits April 20, 2026 15:26

Revert "src: raise log Dict capacity from 32 to 256"

30fbea8

This reverts commit 9396e79.

Merge pull request #537 from Infatoshi/craftax-full-pr

f9f7155

Craftax Full: native C port + optimizations + renderer

tried to change rewards to improve scoring, didnt work

aee7839

summed up rewards in spawn new cap

3c53eb8

cleaned up some code as joseph wanted and rewards changed by robin

482483f

Merge pull request #533 from PLAZMAMA/fix-constellation-heap-overflow

83e9fae

Fix heap overflow in constellation

Optimize Craftax CPU rollout

b286fc4

jsuarez5341 and others added 15 commits April 27, 2026 14:16

Merge pull request #540 from Infatoshi/craftax-direct-pr

18ec52b

Optimize Craftax CPU rollout

Merge branch '4.0' into 4.0

bb0793d

Merge pull request #544 from doofenshirmtz/4.0

5fd4b42

Dr Mario Env v2

added dino env

92b680f

dino env is training. added weight to resources.

11489cc

deleted dino

55e2bf6

Merge pull request #546 from doofenshirmtz/dino

7dd6c75

Dino Env ported to 4.0

Add laser puzzle environment

98dc62d

Merge pull request #547 from Matthew-Neba/laser-puzzle

ddd759f

Add laser puzzle environment Merging w/ request for cleanups made on stream

Minimal env

49e7a4c

merge

2dd9f0e

merge

de49645

puffer aim trainer hehe

8f79561

Update whackamole.ini

1227d74

Merge pull request #549 from doofenshirmtz/whackamole-env

69fcbcf

puffer aim trainer hehe

jsuarez5341 wants to merge 703 commits into 3.0 from 4.0


11 participants
