MLX-native 3D and spatial inference tooling for Apple Silicon.
mlx-spatial is a practical runtime package for running modern 3D
reconstruction and image-to-3D pipelines locally with MLX. The package is
intentionally focused: keep weights outside the wheel, validate the assets you
downloaded, then run clear command-line paths that produce inspectable outputs.
This is not a training framework, and it does not bundle model weights.
The package covers five model families:
| Pipeline | Input | Output | Weight setup |
|---|---|---|---|
| SAM 3D Objects | image + object mask | Gaussian PLY, optional GLB | appautomaton MLX bundle |
| TRELLIS.2 | object-centric RGB/RGBA image | shape OBJ or textured GLB | downloaded safetensors directly |
| HY-WorldMirror 2.0 | scene image or image frames | camera, depth, normals, point-cloud PLY | downloaded safetensors directly |
| LiTo | object-centric RGB/RGBA image | 3D Gaussian Splat PLY | appautomaton research MLX bundle |
| MapAnything | scene image views | scene .npz with depth, cameras, and world points | downloaded safetensors directly |
Choose by job:
- Use SAM3D when you have an object image plus an exact mask and want object reconstruction with Gaussian PLY output.
- Use TRELLIS.2 when you have an object-centric image and want a shape OBJ or textured GLB.
- Use HY-WorldMirror when the input is a scene or frame set and you need camera, depth, normal, or point-cloud outputs.
- Use LiTo when you want Apple's research image-to-3DGS path and can work with Gaussian splat PLY output instead of a mesh.
- Use MapAnything when you have related scene views and want image-only depth, confidence, masks, camera parameters, and dense world points.
Honest status:
- SAM3D is the strongest object reconstruction path in this package. It uses the public appautomaton/sam-3d-objects-mlx bundle.
- TRELLIS.2 generation works, including textured GLB export. The export path is usable, but still an area we keep improving for texture and mesh quality.
- HY-WorldMirror works for scene reconstruction with camera,depth,normal,points. The optional Gaussian head is not part of the release-ready path yet.
- LiTo runs checkpoint-backed image-to-3DGS inference with the public appautomaton/lito-research-mlx bundle. Outputs are Gaussian splat PLY files, not meshes; use a 3DGS-aware viewer.
- MapAnything runs checkpoint-backed scene generation with the public facebook/map-anything weights. The supported artifact is a scene.npz tensor bundle, not a mesh or Gaussian splat export.
For local development from this repo:
uv sync
uv run pytest -q
For package consumers:
uv add mlx-spatial
# or
pip install mlx-spatial
Requirements:
- Python 3.11+
- Apple Silicon recommended
- MLX installed through the package dependencies
- model weights downloaded separately under weights/
The package installs five CLIs:
uv run mlx-spatial-sam3d --help
uv run mlx-spatial-trellis2 --help
uv run mlx-spatial-hyworld2 --help
uv run mlx-spatial-lito --help
uv run mlx-spatial-mapanything --help
The repository also includes readable script wrappers under scripts/. These
are the easiest starting point because they encode recommended settings.
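To confirm that the five entry points listed above are actually on PATH, a small check like the following can help. It is a minimal sketch using only the standard library; the helper name find_clis is hypothetical and not part of mlx-spatial. Run it from the environment where the package is installed.

```python
import shutil

# CLI names as listed in this README; the helper itself is illustrative only.
CLI_NAMES = (
    "mlx-spatial-sam3d",
    "mlx-spatial-trellis2",
    "mlx-spatial-hyworld2",
    "mlx-spatial-lito",
    "mlx-spatial-mapanything",
)


def find_clis(names=CLI_NAMES):
    """Map each CLI name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_clis().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A None entry usually means the package is installed in a different environment than the one running the check.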
Weights are intentionally not committed and not shipped in the wheel. Keep them under ignored local folders:
weights/sam-3d-objects-mlx/
weights/lito-research-mlx/
weights/trellis2/
weights/rmbg2/
weights/dinov3-vitl16-pretrain-lvd1689m/
weights/hy-world-2/
weights/map-anything/
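Before running any pipeline it can be useful to see which of these folders exist yet. The sketch below only checks for directory presence under an assumed weights/ root; it does not inspect file contents, which is what the validate subcommands are for. The function name missing_weight_dirs is hypothetical.

```python
from pathlib import Path

# Folder names copied from the layout above.
EXPECTED_DIRS = (
    "sam-3d-objects-mlx",
    "lito-research-mlx",
    "trellis2",
    "rmbg2",
    "dinov3-vitl16-pretrain-lvd1689m",
    "hy-world-2",
    "map-anything",
)


def missing_weight_dirs(root="weights"):
    """Return the expected weight folders that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_weight_dirs()
    print("all weight folders present" if not missing else f"missing: {missing}")
```

Only the folders for the pipelines you intend to run need to exist, so a non-empty result is not necessarily an error.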
SAM3D uses the converted appautomaton/sam-3d-objects-mlx runtime bundle:
uv run hf download appautomaton/sam-3d-objects-mlx \
--local-dir weights/sam-3d-objects-mlx
uv run mlx-spatial-sam3d validate weights/sam-3d-objects-mlx
LiTo uses the converted appautomaton/lito-research-mlx research bundle:
uv run hf download appautomaton/lito-research-mlx \
--local-dir weights/lito-research-mlx
uv run mlx-spatial-lito validate weights/lito-research-mlx
TRELLIS.2, HY-WorldMirror, and MapAnything do not need SAM3D-style conversion. They load the downloaded safetensors and JSON configs directly:
uv run mlx-spatial-trellis2 download-command --root weights/trellis2
uv run mlx-spatial-trellis2 rmbg-download-command --root weights/rmbg2
uv run mlx-spatial-trellis2 dinov3-download-command weights/dinov3-vitl16-pretrain-lvd1689m
uv run mlx-spatial-hyworld2 download-command weights/hy-world-2
uv run mlx-spatial-mapanything download-command weights/map-anything
Run the printed hf download ... commands, then validate:
uv run mlx-spatial-trellis2 validate --root weights/trellis2
uv run mlx-spatial-trellis2 rmbg-validate --root weights/rmbg2
uv run mlx-spatial-trellis2 dinov3-validate weights/dinov3-vitl16-pretrain-lvd1689m
uv run mlx-spatial-hyworld2 validate weights/hy-world-2
uv run mlx-spatial-mapanything validate weights/map-anything
Respect the licenses and access terms of the upstream model providers. The Python package only provides runtime code.
Use an image and the exact object mask you want reconstructed:
python scripts/sam3d/reconstruct.py inputs/sam3d/living-room/image.png \
--mask inputs/sam3d/living-room/mask-3.png \
--output-dir outputs/sam3d/living-room-script
Expected output:
outputs/sam3d/living-room-script/
gaussians.ply
trace.json
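Alongside scripts/sam3d/inspect_trace.py, a trace can be summarized with the standard library alone. The sketch below assumes only that trace.json parses as JSON; it makes no assumption about field names, and the function name summarize_trace is hypothetical.

```python
import json
from pathlib import Path


def summarize_trace(path):
    """Return {top-level key: value type name} for a JSON trace file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Fall back to describing the root value when it is not an object.
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    import sys

    for key, kind in summarize_trace(sys.argv[1]).items():
        print(f"{key}: {kind}")
```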
Inspect the trace:
python scripts/sam3d/inspect_trace.py outputs/sam3d/living-room-script/trace.json
Use an object-centric image. RGBA images use their alpha channel directly; RGB images use RMBG to estimate the foreground:
python scripts/trellis2/generate_textured.py inputs/trellis2/cup-of-tea.jpg \
--output-dir outputs/trellis2/cup-of-tea-script
Expected output:
outputs/trellis2/cup-of-tea-script/
model.glb
trace.json
The default settings are quality-oriented for Apple Silicon: 512 pipeline, model-config sampler steps, 1024 texture, 200k GLB face target, global xatlas unwrap, and kdtree texture baking. Low-step runs are useful for smoke tests, but they are not representative of output quality.
Use a scene image or a directory of scene frames. This pipeline does not take an object mask:
python scripts/hyworld2/generate_scene.py inputs/sam3d/kidsroom/image.png \
--output-dir outputs/hyworld2/kidsroom-scene-script
Expected output:
outputs/hyworld2/kidsroom-scene-script/
camera_params.json
depth/
normal/
points/points.ply
trace.json
The script uses the verified release path: real Tencent safetensors, large
memory profile, and camera,depth,normal,points heads. For frame directories,
use --memory-profile balanced when the large profile hits the attention
guard.
Use an object-centric image with a useful alpha mask when possible:
python scripts/lito/generate.py inputs/lito/sample.png \
--weights-root weights/lito-research-mlx \
--output outputs/lito/sample.ply \
--memory-profile balanced \
--print-metrics
Expected output:
outputs/lito/sample.ply
outputs/lito/sample.safetensors
LiTo writes a Gaussian Splat PLY, not a mesh. Blender's native PLY importer can read the container, but it does not render the 3DGS fields correctly. Use a Gaussian-splat-aware viewer such as KIRI's Blender 3DGS add-on.
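One way to see why a generic mesh importer struggles with such a file is to read the PLY header and list the per-vertex properties it declares. The sketch below parses only the ASCII header, which every PLY file has regardless of payload encoding. The exact property names LiTo writes are not specified here, so the code prints whatever it finds; read_ply_header is a hypothetical helper.

```python
def read_ply_header(path):
    """Return (format string, {element: (count, [property names])})."""
    fmt = None
    elements = {}
    current = None
    with open(path, "rb") as fh:
        if fh.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in fh:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                break  # the (possibly binary) payload starts after this line
            if parts[0] == "format":
                fmt = " ".join(parts[1:])
            elif parts[0] == "element":
                current = parts[1]
                elements[current] = (int(parts[2]), [])
            elif parts[0] == "property" and current is not None:
                # The property name is always the last token on the line.
                elements[current][1].append(parts[-1])
    return fmt, elements


if __name__ == "__main__":
    import sys

    fmt, elements = read_ply_header(sys.argv[1])
    print("format:", fmt)
    for name, (count, props) in elements.items():
        print(f"{name}: {count} items, properties: {', '.join(props)}")
```

Properties beyond position and color are the ones a plain mesh importer has no way to interpret.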
Use a directory of related scene views. The Desk example is a two-image scene:
python scripts/mapanything/generate_scene.py inputs/map-anything/desk \
--output-dir outputs/mapanything/desk-script
Expected output:
outputs/mapanything/desk-script/
scene.npz
trace.json
The script uses the upstream image-only inference settings: fixed_mapping
preprocessing, stride 1, checkpoint-derived patch size, DINOv2
normalization, and mask/edge-mask postprocessing. scene.npz matches the original Torch scene layout
semantically: images, depth, confidence, masks, intrinsics, camera poses, and
world points. The MLX file uses clean top-level keys and also records
extrinsics.
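To see which keys a given scene.npz actually contains, the archive can be inspected without loading the arrays. An .npz file is a zip archive of .npy members, and each member starts with a small header that records its shape and dtype. The sketch below reads only those headers with the standard library; it does not assume any particular key names, and npz_summary is a hypothetical helper. With NumPy installed, numpy.load(path).files returns the same names.

```python
import ast
import struct
import zipfile


def npz_summary(path):
    """Return {array name: (shape, dtype descr)} by reading .npy headers only."""
    summary = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as fh:
                if fh.read(6) != b"\x93NUMPY":
                    raise ValueError(f"{member} is not a .npy payload")
                major, _minor = fh.read(2)
                # Format 1.x stores the header length in 2 bytes, later versions in 4.
                if major == 1:
                    (header_len,) = struct.unpack("<H", fh.read(2))
                else:
                    (header_len,) = struct.unpack("<I", fh.read(4))
                header = ast.literal_eval(fh.read(header_len).decode("latin1"))
            summary[member[: -len(".npy")]] = (header["shape"], header["descr"])
    return summary


if __name__ == "__main__":
    import sys

    for name, (shape, descr) in npz_summary(sys.argv[1]).items():
        print(f"{name}: shape={shape} dtype={descr}")
```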
src/mlx_spatial/ package code
scripts/ readable user and maintainer wrappers
docs/ deeper setup, release, and architecture notes
tests/ unit and parity-oriented coverage
weights/ ignored local model assets
inputs/ ignored local sample inputs
outputs/ ignored generated results
vendors/ ignored upstream checkouts
- docs/README.md: documentation map and reader contract.
- scripts/README.md: recommended inference scripts and their defaults.
- docs/sam3d.md: SAM3D setup, inference, quality gates, PLY expectations, and coordinate notes.
- docs/trellis2.md: TRELLIS.2 asset layout, no-conversion note, scripts, and export caveats.
- docs/hyworld2.md: HY-WorldMirror asset layout, scene inputs, memory profiles, and outputs.
- docs/lito.md: LiTo setup, research-weight bundle, image-to-3DGS CLI, memory profiles, and PLY viewing notes.
- docs/mapanything.md: MapAnything asset layout, scene .npz schema, parity notes, and viewer/export boundary.
- docs/architecture.md: module map and pipeline boundaries.
- docs/development.md: tests, local asset rules, and contribution constraints.
- docs/model-publishing.md: model bundles and model-card rules.
- docs/release.md: release checklist.
Before publishing, build and inspect the artifacts:
uv run pytest -q
rm -rf dist
uv build
python scripts/packaging/check_release_artifacts.py \
dist/mlx_spatial-*.tar.gz \
dist/mlx_spatial-*-py3-none-any.whl
python scripts/packaging/check_release_artifacts.py --git-hygiene
The build must not include local weights, generated outputs, inputs, vendor checkouts, caches, or agent state.
Publishing is handled by the trusted-publishing workflow in
.github/workflows/workflow.yaml. Do not publish from local shell credentials.
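The rule that builds must not include local weights, outputs, inputs, or vendor checkouts can be illustrated with a short wheel check. This is not the repository's check_release_artifacts.py, whose behavior is not described here. It is a minimal sketch that assumes the forbidden content would appear under the top-level folder names from the repository layout, and it covers wheels only (a wheel is a zip archive), not the sdist, caches, or agent state.

```python
import zipfile

# Top-level folders that should never appear inside a built wheel (assumed prefixes).
FORBIDDEN_PREFIXES = ("weights/", "outputs/", "inputs/", "vendors/")


def forbidden_members(wheel_path, prefixes=FORBIDDEN_PREFIXES):
    """Return archive members whose path starts with a forbidden prefix."""
    with zipfile.ZipFile(wheel_path) as archive:
        return [name for name in archive.namelist() if name.startswith(prefixes)]


if __name__ == "__main__":
    import sys

    offenders = forbidden_members(sys.argv[1])
    if offenders:
        print("unexpected files in wheel:")
        for name in offenders:
            print("  " + name)
        sys.exit(1)
    print("wheel contains no forbidden paths")
```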