feat(weights): add exact weight footprint from safetensors header #64

Merged: inureyes merged 2 commits into main from feature/issue-53-exact-weight-footprint on May 21, 2026
@inureyes
Member

Summary

Adds byte-accurate weight-size accounting from safetensors metadata so --recommend-quant reports the exact model footprint before any tensor data is loaded, with the analytical config.json estimate as a fallback.

What changed

  • src/lib/mlxcel-core/src/weights.rs: added weight_footprint_bytes(model_dir) -> Option<u64> (public), parse_shard_index_with_total_size (public) that exposes the previously discarded metadata.total_size field, parse_shard_index_inner (private shared implementation), extract_shards_and_total_size (private), read_safetensors_header_bytes (private), safetensors_dtype_itemsize (private), and the ShardIndexResult type alias. The original parse_shard_index and extract_shards_from_index_json are unchanged in public behavior.
  • src/execution/quant_advisor.rs: QuantAdvice gains exact_weight_bytes: Option<u64>; advise_quantization calls weight_footprint_bytes and converts exact bytes to a billions-of-parameters signal (takes precedence over the analytical estimate); print_quant_advice shows exact GiB/MiB with source tag and shows the analytical estimate as a reference note. New format_bytes helper. Imports mlxcel_core::weights::weight_footprint_bytes.

Test plan

  • cargo test -p mlxcel-core --lib weights::tests — all 22 tests pass (9 new: sharded index with/without total_size, single-file binary header, scalar tensor, dtype itemsize, missing case)
  • cargo test -p mlxcel --lib execution::quant_advisor::tests — all 11 tests pass (4 new: exact_weight_bytes field, index wiring, format_bytes helpers)
  • cargo clippy -p mlxcel-core --lib --tests -- -D warnings — clean
  • cargo clippy -p mlxcel --lib --tests -- -D warnings — clean

Closes #53

Add `weight_footprint_bytes(model_dir) -> Option<u64>` to `mlxcel-core::weights` that returns the byte-accurate weight size before any tensors are loaded.

Resolution order:
1. `metadata.total_size` from `model.safetensors.index.json` (sharded models; the index is already parsed by `parse_shard_index`, which now also extracts this previously discarded field)
2. Safetensors binary header of a single `model.safetensors`: reads the 8-byte little-endian header-length prefix plus the JSON header object, then sums dtype item size × shape product over all tensor entries without touching tensor data
3. Returns `None` when neither is available; callers fall back to the analytical estimate
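
Step 2 can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `dtype_itemsize`, `tensor_nbytes`, and `read_header_json` are hypothetical names, the dtype table covers only common safetensors dtype strings, and JSON parsing of the returned header (for example with `serde_json`) is left out.

```rust
use std::io::{self, Read};

/// Item size in bytes for a safetensors dtype string.
/// Assumption: only common dtypes are listed; unknown strings yield `None`.
fn dtype_itemsize(dtype: &str) -> Option<u64> {
    match dtype {
        "BOOL" | "U8" | "I8" | "F8_E4M3" | "F8_E5M2" => Some(1),
        "U16" | "I16" | "F16" | "BF16" => Some(2),
        "U32" | "I32" | "F32" => Some(4),
        "U64" | "I64" | "F64" => Some(8),
        _ => None,
    }
}

/// Bytes occupied by one tensor: item size times the product of its dimensions.
/// An empty shape (a scalar) has a product of 1. Overflow yields `None`.
fn tensor_nbytes(dtype: &str, shape: &[u64]) -> Option<u64> {
    let mut n = dtype_itemsize(dtype)?;
    for &dim in shape {
        n = n.checked_mul(dim)?;
    }
    Some(n)
}

/// Read the 8-byte little-endian length prefix, then exactly that many bytes
/// of JSON header. Tensor data after the header is never read.
fn read_header_json<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 8];
    reader.read_exact(&mut prefix)?;
    let len = u64::from_le_bytes(prefix);

    let mut header = Vec::new();
    reader.take(len).read_to_end(&mut header)?;
    if header.len() as u64 != len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "safetensors header is shorter than its length prefix",
        ));
    }
    Ok(header)
}
```

Summing `tensor_nbytes` over every tensor entry in the parsed header gives the footprint; any entry with an unknown dtype or an overflowing product would make the sum unavailable, which fits the `Option<u64>` return type described above.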

`parse_shard_index` is unchanged in return type; new `parse_shard_index_with_total_size` exposes the extended result via the `ShardIndexResult` type alias (added to silence clippy::type_complexity).
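
The alias-plus-wrapper pattern described here might look roughly like the following. The actual shape of `ShardIndexResult` is not shown in this PR, so the tuple layout, the `ShardIndexSketch` alias, and the `drop_total_size` helper below are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical extended result: shard file names, the tensor-name to
/// shard-file map, and the optional `metadata.total_size` value.
/// Naming the nested type with an alias keeps signatures short.
type ShardIndexSketch = Option<(Vec<String>, HashMap<String, String>, Option<u64>)>;

/// Hypothetical adapter: the older entry point can keep its return type by
/// delegating to the extended parser and discarding the extra field.
fn drop_total_size(full: ShardIndexSketch) -> Option<(Vec<String>, HashMap<String, String>)> {
    full.map(|(shards, weight_map, _total_size)| (shards, weight_map))
}
```

With this layout, existing callers of the older function see no change, while new callers can read the third tuple field.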

Wire exact footprint into `quant_advisor.rs`:
- `QuantAdvice` gains `exact_weight_bytes: Option<u64>`
- `advise_quantization` calls `weight_footprint_bytes` and converts exact bytes to a billions-of-parameters estimate (bytes / 2 / 1e9, FP16 reference), which supersedes the analytical config.json estimate when present
- `print_quant_advice` shows the exact GiB/MiB figure and source tag when available; analytical estimate is shown as a reference note
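
The conversion and the display step can be sketched as follows. The `bytes / 2 / 1e9` formula comes from the description above; the function names, the 1 GiB threshold for switching units, and the two-decimal format are assumptions, since the real `format_bytes` output is not shown here.

```rust
const MIB: u64 = 1 << 20;
const GIB: u64 = 1 << 30;

/// Convert exact weight bytes to billions of parameters, using FP16
/// (2 bytes per parameter) as the reference width.
fn bytes_to_billions_fp16(bytes: u64) -> f64 {
    bytes as f64 / 2.0 / 1e9
}

/// Render a byte count in GiB when it is at least 1 GiB, otherwise in MiB.
/// Assumption: two decimal places; the PR's helper may format differently.
fn format_bytes_sketch(bytes: u64) -> String {
    if bytes >= GIB {
        format!("{:.2} GiB", bytes as f64 / GIB as f64)
    } else {
        format!("{:.2} MiB", bytes as f64 / MIB as f64)
    }
}
```

For example, a 14,000,000,000-byte checkpoint maps to 7.0 billion parameters under the FP16 reference.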

New unit tests: 9 in `weights::tests` (sharded index with/without total_size, single-file binary header, scalar tensor, dtype itemsize table, missing case) and 4 in `quant_advisor::tests` (exact_weight_bytes field, index wiring, format_bytes helpers).

@inureyes added the status:review and status:done labels and removed status:review on May 21, 2026
@inureyes merged commit 3b1e2b3 into main on May 21, 2026
4 checks passed
Linked issue: feat: exact weight-byte accounting from safetensors metadata