62 commits
f44146c
add OSRS environments: inferno, zulrah, PvP
valtterivalo Apr 12, 2026
4aee0ba
wire OSRS visual asset download into build.sh
valtterivalo Apr 12, 2026
4dac224
fix config env_names, test includes, remove dead code
valtterivalo Apr 12, 2026
f003055
wire up c_render and fix PvP binding
valtterivalo Apr 12, 2026
497e6bc
suppress unused-function warnings in binding TU
valtterivalo Apr 12, 2026
4945f3c
extract OSRS assets to data/ at repo root, not ocean/osrs/data
valtterivalo Apr 12, 2026
5d96c9a
move shared OSRS headers from src/osrs to ocean/osrs
valtterivalo Apr 12, 2026
272715d
fix eval rendering, camera pan, sprite filenames
valtterivalo Apr 12, 2026
1c83c13
eval tick pacing + spell-target ground-click walks
valtterivalo Apr 12, 2026
cca1529
OSRS viewer: aspect ratio + inventory fixes
valtterivalo Apr 12, 2026
0a8334d
export_items.sh: self-contained Java item sprite exporter
valtterivalo Apr 12, 2026
0d5c980
spell highlight + ancient spell order
valtterivalo Apr 12, 2026
380998c
add PLAY_REPLAY env var to play back a recorded episode in eval
valtterivalo Apr 12, 2026
106ca9a
OSRS UI authenticity: prayer 5x6, ancient spellbook, weapon styles
valtterivalo Apr 12, 2026
1e9b1ef
bump osrs-assets to v5 (adds smoke/shadow spell sprites)
valtterivalo Apr 12, 2026
8e44cc8
fix eval render: load assets in c_render, pace ticks at 600ms, 1.5x w…
valtterivalo Apr 13, 2026
d09a435
drop osrs_inferno_zuk config, window comment fix
valtterivalo Apr 14, 2026
3cbc200
bump dict log sizes to 64
jsuarez5341 Apr 14, 2026
1f8abba
fix BFS fallback stale-cost bug + NPC rendering in eval
valtterivalo Apr 14, 2026
f7072fc
minify obs, tweak rewards, fast config. Wave 33 in 2:17
jsuarez5341 Apr 14, 2026
f015179
merge
jsuarez5341 Apr 14, 2026
ef47f13
inferno: net damage reward + jad healer fix + zuk healer parity
valtterivalo Apr 15, 2026
e3b4132
puffer eval: auto-load latest checkpoint by default
valtterivalo Apr 15, 2026
5957f0e
inferno: smooth 60fps animation in puffer eval render
valtterivalo Apr 15, 2026
eb86e94
inferno: fix boost potion sprites (bastion/stamina)
valtterivalo Apr 15, 2026
14d8313
Temp changes - fast zuk
jsuarez5341 Apr 16, 2026
91fbe22
inferno: fix shield barrage, mager nibbler resurrect, eval render reset
valtterivalo Apr 16, 2026
5394670
merge joseph_edits: obs restructure, phased reward, dict log bump
valtterivalo Apr 16, 2026
8f30e21
docs: use build.sh instead of setup.py in osrs tools README
valtterivalo Apr 16, 2026
214847c
inferno: healers heal once before tag, rapid BP/tbow speed, meteor he…
valtterivalo Apr 17, 2026
3eacab0
osrs: FightStyle stance as first-class combat concept
valtterivalo Apr 17, 2026
7426c33
inferno: remove oracle prayer scaffolding, agent controls overhead
valtterivalo Apr 17, 2026
5360c66
osrs: migrate prayers to toggle semantics + activation-tick drain skip
valtterivalo Apr 17, 2026
8b741fe
osrs bindings: sync ACT_SIZES with new prayer action head layout
valtterivalo Apr 17, 2026
d2e4322
Fix camera
jsuarez5341 Apr 17, 2026
ec3c293
osrs pvp: opp_emit_prayer handles OVERHEAD_NONE deactivate
valtterivalo Apr 18, 2026
6d38fd2
osrs anim: fix Y-rotation sign (arms/limbs yaw direction was mirrored)
valtterivalo Apr 18, 2026
a02191d
osrs: drain clears prayer enum at pp<=0 entry
valtterivalo Apr 18, 2026
4ffa25e
osrs inferno/zulrah: recompute loadout cache after prayer drain
valtterivalo Apr 18, 2026
171640c
osrs: NPCs no longer stall in attack range when LOS is blocked
valtterivalo Apr 18, 2026
85d6858
osrs inferno: fix wave 67/68 jad spawns, pillar cleanup, triple-jad s…
valtterivalo Apr 18, 2026
1ecf3f6
osrs inferno: gate barrages on magic level, fix jad healer spawn stun
valtterivalo Apr 18, 2026
d11fde9
osrs inferno: jad prayer check at T+3, not at projectile land
valtterivalo Apr 18, 2026
4f04f8e
osrs inferno: jad hit delay is fixed at 4 ticks, not distance-based
valtterivalo Apr 18, 2026
a174425
osrs inferno: min-hp progress reward (no farming via heal)
valtterivalo Apr 18, 2026
e1a9832
osrs inferno: exclude jad healers from min-hp reward
valtterivalo Apr 18, 2026
319713b
osrs inferno: zuk hit is no-tick-eat
valtterivalo Apr 18, 2026
acb9e7a
osrs inferno: revert zuk no-tick-eat clamp to simple delayed-hit model
valtterivalo Apr 18, 2026
0c2a0cd
osrs inferno: lethal npc damage kills the tick, eating can't revive
valtterivalo Apr 18, 2026
3b4496f
osrs inferno: fix zuk healer spark coords, projectile model, mager/ra…
valtterivalo Apr 18, 2026
84ff5df
osrs inferno: LOS-stop uses actual target, not hardcoded player
valtterivalo Apr 18, 2026
de25561
osrs passive item effects + inferno gear
valtterivalo Apr 21, 2026
9e95dbc
sync inferno env parity fixes
valtterivalo Apr 22, 2026
c6d72b1
fix jad preview and inferno visuals
valtterivalo Apr 22, 2026
ab66c82
fix inferno human mode
valtterivalo Apr 22, 2026
821e26c
fix inferno mage rez and potion reset ui
valtterivalo Apr 22, 2026
80cd4a3
switch inferno to healer-tag rewards
valtterivalo Apr 22, 2026
a422c73
sync osrs human mode and inferno pathing
valtterivalo Apr 26, 2026
cc56028
scale inferno supplies for late starts
valtterivalo Apr 27, 2026
d8b2948
validate inferno start waves
valtterivalo Apr 27, 2026
d0142ea
Clean OSRS PR noise
valtterivalo Apr 28, 2026
16acdd2
Simplify OSRS PR code
valtterivalo Apr 28, 2026
34 changes: 30 additions & 4 deletions build.sh
@@ -25,6 +25,7 @@ for arg in "$@"; do
 --debug) DEBUG=1 ;;
 --local) MODE=local ;;
 --fast) MODE=fast ;;
+--gprof) MODE=fast; GPROF=1 ;;
 --web) MODE=web ;;
 --profile) MODE=profile ;;
 --cpu) MODE=cpu; PRECISION="-DPRECISION_FLOAT" ;;
@@ -55,7 +56,8 @@ PLATFORM="$(uname -s)"
 if [ "$PLATFORM" = "Linux" ]; then
 RAYLIB_NAME='raylib-5.5_linux_amd64'
 OMP_LIB=-lomp5
-SANITIZE_FLAGS=(-fsanitize=address,undefined,bounds,pointer-overflow,leak -fno-omit-frame-pointer)
+# SANITIZE_FLAGS=(-fsanitize=address,undefined,bounds,pointer-overflow,leak -fno-omit-frame-pointer)
+SANITIZE_FLAGS=()
 STANDALONE_LDFLAGS=(-lGL)
 SHARED_LDFLAGS=(-Bsymbolic-functions)
 else
@@ -116,6 +118,18 @@ elif [ "$ENV" = "impulse_wars" ]; then
 download "$BOX2D_NAME" "$BOX2D_URL/$BOX2D_NAME.tar.gz"
 INCLUDES+=(-I./$BOX2D_NAME/include -I./$BOX2D_NAME/src)
 LINK_ARCHIVES+=("./$BOX2D_NAME/libbox2d.a")
+elif [[ "$ENV" == osrs_* ]]; then
+SRC_DIR="ocean/$ENV"
+INCLUDES+=(-I./ocean/osrs)
+# download visual assets to data/ at repo root (where the binary looks).
+# eval via puffer renders through the same _C.so, so assets must be present
+# for any osrs build, not just --local.
+if [ ! -f "data/equipment.models" ]; then
+echo "Downloading OSRS visual assets..."
+OSRS_ASSETS_URL="https://github.com/valtterivalo/PufferLib/releases/download/osrs-assets-v8/osrs-assets-v8.tar.gz"
+mkdir -p data
+curl -sL "$OSRS_ASSETS_URL" | tar xz --strip-components=1 -C data
+fi
 elif [ -d "ocean/$ENV" ]; then
 SRC_DIR="ocean/$ENV"
 else
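The asset step is guarded only by the `data/equipment.models` sentinel, and `curl -sL` without `-f` pipes an HTTP error page straight into `tar`; a download that dies midway can also leave a partially filled `data/` that still passes the sentinel check if `equipment.models` was extracted before the failure. A minimal sketch of a more defensive variant, assuming bash and a hypothetical `fetch_assets URL DEST SENTINEL` helper that is not part of the PR: it unpacks into a temporary directory with the same `--strip-components=1` layout and copies into `DEST` only once the sentinel is present.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): fetch URL, unpack it with the same
# --strip-components=1 layout, and install into DEST only if SENTINEL exists.
# The body runs in a subshell ( ... ) so `set -o pipefail` and the temp-dir
# variable do not leak into the calling script.
fetch_assets() (
  set -o pipefail
  url=$1 dest=$2 sentinel=$3

  # Already installed: nothing to do (same check build.sh performs).
  if [ -f "$dest/$sentinel" ]; then
    exit 0
  fi

  tmp=$(mktemp -d) || exit 1
  # -f makes curl fail on HTTP errors instead of piping an error page to tar;
  # pipefail makes the pipeline fail if either curl or tar fails.
  if ! curl -fsSL "$url" | tar xz --strip-components=1 -C "$tmp"; then
    echo "asset download failed: $url" >&2
    rm -rf "$tmp"
    exit 1
  fi
  # Refuse archives that do not contain the sentinel file.
  if [ ! -f "$tmp/$sentinel" ]; then
    echo "archive does not contain $sentinel" >&2
    rm -rf "$tmp"
    exit 1
  fi

  # Only now touch DEST, so a failure above never leaves it half-populated.
  mkdir -p "$dest"
  cp -R "$tmp"/. "$dest"/
  rm -rf "$tmp"
)
```

Inside the osrs branch this would be called as `fetch_assets "$OSRS_ASSETS_URL" data equipment.models || exit 1`, which turns a failed download into a failed build instead of a later missing-asset error at render time.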
@@ -131,17 +145,28 @@ if [ -n "$DEBUG" ] || [ "$MODE" = "local" ]; then
 LINK_OPT="-g"
 else
 CLANG_OPT=(-O2 -DNDEBUG "${CLANG_WARN[@]}")
+if [ -n "$GPROF" ]; then
+CLANG_OPT+=(-pg)
+fi
 NVCC_OPT="-O2 --threads 0"
 LINK_OPT="-O2"
 fi
 if [ "$MODE" = "local" ] || [ "$MODE" = "fast" ]; then
+# OSRS envs share a single visual binary
+if [[ "$ENV" == osrs_* ]]; then
+VISUAL_SRC="ocean/osrs/osrs_visual.c"
+VISUAL_DEFS="-DOSRS_VISUAL"
+else
+VISUAL_SRC="$SRC_DIR/$ENV.c"
+VISUAL_DEFS=""
+fi
 FLAGS=(
 "${INCLUDES[@]}"
-"$SRC_DIR/$ENV.c" $EXTRA_SRC -o "$OUTPUT_NAME"
+"$VISUAL_SRC" $EXTRA_SRC -o "$OUTPUT_NAME"
 "${LINK_ARCHIVES[@]}"
 "${STANDALONE_LDFLAGS[@]}"
 -lm -lpthread -fopenmp
--DPLATFORM_DESKTOP
+-DPLATFORM_DESKTOP $VISUAL_DEFS
 )
 echo "Compiling $ENV..."
 ${CC:-clang} "${CLANG_OPT[@]}" "${FLAGS[@]}"
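The new branch routes every `osrs_*` env to the shared `ocean/osrs/osrs_visual.c` with `-DOSRS_VISUAL`, and it relies on the right-hand side of `[[ ... == osrs_* ]]` being an unquoted glob. A small sketch that isolates that selection so it can be checked on its own; `visual_build_inputs` and `is_osrs_quoted` are hypothetical names, and `snake` is used only as an example of a non-OSRS env.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the new build.sh branch: print the source file
# (and extra define, if any) that the standalone visual binary is built from.
visual_build_inputs() {
  local env=$1 src_dir=$2
  # Unquoted right-hand side: glob match, so every osrs_* env takes this path.
  if [[ "$env" == osrs_* ]]; then
    printf '%s\n' "ocean/osrs/osrs_visual.c -DOSRS_VISUAL"
  else
    printf '%s\n' "$src_dir/$env.c"
  fi
}

# Same test with the pattern quoted: this compares against the literal
# string "osrs_*" and would never match a real env name.
is_osrs_quoted() {
  [[ "$1" == "osrs_*" ]]
}
```

Because `$VISUAL_DEFS` is left unquoted inside the `FLAGS` array, its empty value for non-OSRS envs expands to no argument at all, rather than passing an empty-string argument to the compiler.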
@@ -216,8 +241,9 @@ fi
 
 echo "Compiling static library for $ENV..."
 ${CC:-clang} -c "${CLANG_OPT[@]}" \
+"${INCLUDES[@]}" \
 -I. -Isrc -I$SRC_DIR -Ivendor \
--I./$RAYLIB_NAME/include -I$CUDA_HOME/include \
+-I$CUDA_HOME/include \
 -DPLATFORM_DESKTOP \
 -fno-semantic-interposition -fvisibility=hidden \
 -fPIC -fopenmp \
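The new `--gprof` flag sets `MODE=fast` and `GPROF=1`, and `-pg` is appended to `CLANG_OPT` only in the optimized `else` branch. The reduced sketch below keeps just those two pieces of build.sh so the interaction can be tested; the `resolve_opt_flags` name is hypothetical, and `CLANG_WARN` plus the debug branch's own compiler flags (not shown in the diff) are omitted, so the debug path simply prints `debug`. Because flags are processed left to right, a `--local` after `--gprof` switches back to the debug branch and silently drops `-pg`, as does `--debug` in any position.

```bash
#!/usr/bin/env bash
# Hypothetical reduction of the build.sh flag loop and optimization branch.
resolve_opt_flags() {
  local MODE="" DEBUG="" GPROF="" arg
  for arg in "$@"; do
    case "$arg" in
      --debug) DEBUG=1 ;;
      --local) MODE=local ;;
      --fast) MODE=fast ;;
      --gprof) MODE=fast; GPROF=1 ;;
    esac
  done
  if [ -n "$DEBUG" ] || [ "$MODE" = "local" ]; then
    # build.sh takes its debug branch here; -pg is never added on this path.
    printf '%s\n' "debug"
  else
    local opt=(-O2 -DNDEBUG)
    if [ -n "$GPROF" ]; then
      opt+=(-pg)
    fi
    printf '%s\n' "${opt[*]}"
  fi
}
```

In the fast branch the standalone binary is compiled and linked by a single `${CC:-clang}` call, so `-pg` reaches the link step as well. A binary built and linked with `-pg` writes `gmon.out` to its working directory when it exits normally, and `gprof <binary> gmon.out` turns that into a flat profile and call graph.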
184 changes: 184 additions & 0 deletions config/osrs_inferno.ini
@@ -0,0 +1,184 @@
# OSRS Inferno encounter.
# 8 action heads (79 logits), 1058 obs, long episodes (300-8000+ ticks).

[base]
env_name = osrs_inferno
score_metric = episode_return

[env]
start_wave = 69
damage_reward_coeff = 0.01
shield_penalty_coeff = 0.01
tag_reward_coeff = 0.25
late_start_supply_profile_scale = 1.0
mask_in_obs = 1.0
record_best_replay_path = ""
play_replay_path = ""
# curriculum: fraction of agents starting at later waves (rest at start_wave)
curriculum_wave_1 = 20
curriculum_frac_1 = 0.00
curriculum_wave_2 = 40
curriculum_frac_2 = 0.00
curriculum_wave_3 = 60
curriculum_frac_3 = 0.00

[vec]
total_agents = 8192
num_buffers = 4

[policy]
hidden_size = 128
num_layers = 3

[train]
total_timesteps = 400_000_000

[sweep]
min_sps = 50000
max_suggestion_cost = 3600
metric = episode_return
metric_distribution = linear

[sweep.train.total_timesteps]
distribution = log_normal
min = 500000000
max = 3200000000
scale = time

[sweep.train.horizon]
distribution = uniform_pow2
min = 32
max = 256
scale = auto

[sweep.train.learning_rate]
distribution = log_normal
min = 0.0003
max = 0.01
scale = 0.5

[sweep.train.ent_coef]
distribution = log_normal
min = 0.001
max = 0.1
scale = auto

[sweep.train.gamma]
distribution = logit_normal
min = 0.99
max = 0.999999
scale = auto

[sweep.train.min_lr_ratio]
distribution = uniform
min = 0.0
max = 0.5
scale = auto

[sweep.train.beta1]
distribution = uniform
min = 0.8
max = 0.99
scale = auto

[sweep.train.eps]
distribution = log_normal
min = 1e-6
max = 1e-4
scale = auto

[sweep.train.gae_lambda]
distribution = logit_normal
min = 0.5
max = 0.999
scale = auto

[sweep.train.vtrace_rho_clip]
distribution = uniform
min = 1.0
max = 3.0
scale = auto

[sweep.train.vtrace_c_clip]
distribution = uniform
min = 1.0
max = 2.5
scale = auto

[sweep.train.prio_alpha]
distribution = logit_normal
min = 0.0
max = 0.999
scale = auto

[sweep.train.prio_beta0]
distribution = logit_normal
min = 0.01
max = 0.8
scale = auto

[sweep.train.clip_coef]
distribution = uniform
min = 0.05
max = 1.5
scale = auto

[sweep.train.vf_coef]
distribution = log_normal
min = 0.005
max = 0.5
scale = auto

[sweep.train.vf_clip_coef]
distribution = uniform
min = 0.1
max = 2.0
scale = auto

[sweep.train.max_grad_norm]
distribution = uniform
min = 0.5
max = 3.0
scale = auto

[sweep.train.replay_ratio]
distribution = uniform
min = 0.1
max = 2.0
scale = auto

[sweep.train.weight_decay]
distribution = log_normal
min = 0.001
max = 1.0
scale = auto

[sweep.train.minibatch_size]
distribution = uniform_pow2
min = 2048
max = 8192
scale = auto

[sweep.vec.num_buffers]
distribution = uniform_pow2
min = 1
max = 4
scale = auto

[sweep.vec.total_agents]
distribution = uniform_pow2
min = 128
max = 4096
scale = auto

[sweep.policy.hidden_size]
distribution = uniform_pow2
min = 128
max = 1024
scale = auto

[sweep.policy.num_layers]
distribution = uniform
min = 2
max = 5.0
scale = auto
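The `[env]` comment describes the curriculum as a fraction of agents starting at later waves, with the rest at `start_wave`. One plausible reading, sketched here under that assumption (the bucket order, the cumulative-fraction rule, and the `pick_start_wave` name are all hypothetical, since the env's C code is not part of this diff), is that each agent draws `u` in [0, 1) and takes the first bucket whose cumulative fraction exceeds it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of curriculum start-wave assignment (not the env's C code).
# Args: u in [0, 1), start_wave, then pairs: wave_1 frac_1 wave_2 frac_2 ...
pick_start_wave() {
  local u=$1 start_wave=$2
  shift 2
  awk -v u="$u" -v start="$start_wave" -v pairs="$*" 'BEGIN {
    n = split(pairs, a, " ")
    uu = u + 0
    cum = 0
    # Walk buckets in order, accumulating their fractions.
    for (i = 1; i + 1 <= n; i += 2) {
      cum += a[i + 1]
      if (uu < cum) { print a[i]; exit }
    }
    # u fell past every bucket: this agent starts at start_wave.
    print start
  }'
}
```

With the fractions as shipped (all `0.00`), every agent starts at `start_wave = 69`, so the curriculum is effectively off; waves 20, 40 and 60 are also earlier, not later, than that start wave.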
37 changes: 37 additions & 0 deletions config/osrs_pvp.ini
@@ -0,0 +1,37 @@
# OSRS NH PvP encounter.
# 7 action heads (39 logits), 334 obs + 39 mask = 373 total, short episodes (~300 ticks).

[base]
env_name = osrs_pvp
policy_name = MinGRU
rnn_name = Recurrent
score_metric = episode_return

[env]
mask_in_obs = 1.0

[vec]
total_agents = 512
num_buffers = 2

[policy]
hidden_size = 256
num_layers = 2

[train]
total_timesteps = 500000000
horizon = 32
learning_rate = 0.003
beta1 = 0.95
eps = 0.00001
ent_coef = 0.01
gamma = 0.997
gae_lambda = 0.95
clip_coef = 0.2
vf_coef = 0.5
vf_clip_coef = 0.5
max_grad_norm = 1.0
replay_ratio = 0.25
minibatch_size = 4096
ns_iters = 5
weight_decay = 0.001
38 changes: 38 additions & 0 deletions config/osrs_zulrah.ini
@@ -0,0 +1,38 @@
# OSRS Zulrah encounter.
# 6 action heads (41 logits), 81 obs + 41 mask = 122 total, medium episodes (~600 ticks max).

[base]
env_name = osrs_zulrah
policy_name = MinGRU
rnn_name = Recurrent
score_metric = episode_return

[env]
mask_in_obs = 1.0
gear_tier = 2.0

[vec]
total_agents = 512
num_buffers = 2

[policy]
hidden_size = 256
num_layers = 2

[train]
total_timesteps = 500000000
horizon = 32
learning_rate = 0.003
beta1 = 0.95
eps = 0.00001
ent_coef = 0.01
gamma = 0.999
gae_lambda = 0.95
clip_coef = 0.2
vf_coef = 0.5
vf_clip_coef = 0.5
max_grad_norm = 1.0
replay_ratio = 0.25
minibatch_size = 4096
ns_iters = 5
weight_decay = 0.001
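The header comments in the PvP and Zulrah configs split the policy input into raw observations plus an action mask with one entry per logit (334 + 39 = 373 and 81 + 41 = 122). A small sketch of that arithmetic, assuming, as those comments suggest, that `mask_in_obs = 1.0` appends the mask to the observation and `0.0` leaves it out; `obs_input_width` is a hypothetical name. The inferno header lists only "1058 obs" and does not say whether its 79 mask entries are included.

```bash
#!/usr/bin/env bash
# Hypothetical helper: policy input width = raw obs + (one mask entry per
# logit if mask_in_obs is nonzero), following the config header comments.
obs_input_width() {
  local obs=$1 logits=$2 mask_in_obs=$3
  # Config values are written as floats (1.0 / 0.0); keep the integer part.
  if [ "${mask_in_obs%%.*}" -ne 0 ]; then
    echo $(( obs + logits ))
  else
    echo "$obs"
  fi
}
```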
35 changes: 35 additions & 0 deletions ocean/osrs/README.md
@@ -0,0 +1,35 @@
# osrs envs

build the compiled backend with the normal puffer flow:

```bash
./build.sh osrs_inferno
./build.sh osrs_zulrah
./build.sh osrs_pvp
```

the active python entrypoint is `puffer`, backed by `pufferlib/pufferl.py`.

```bash
puffer train osrs_inferno
puffer sweep osrs_inferno
puffer eval osrs_inferno
puffer eval osrs_inferno --load-model-path /path/to/checkpoint.bin
```

env configs live in `config/<env>.ini`.

inferno best replay recording is opt-in:

```bash
puffer train osrs_inferno --env.record-best-replay-path checkpoints/osrs_inferno/best.replay
puffer eval osrs_inferno --env.play-replay-path checkpoints/osrs_inferno/best.replay
```

the standalone visual binary still exists for direct rendering:

```bash
cd ocean/osrs
make visual
./osrs_visual --encounter inferno --replay /path/to/best.replay
```