Package SST for PyPI as sstrack (transformers backend)#26
Conversation
Replace the placeholder uv_build config with a hatchling src-layout build, real project metadata, a lean runtime dependency set, and a raw extra for rawpy. Drop the stale pinned requirements.txt and uv.lock, which no longer match the package, and ignore build artifacts plus the local model-repo clones. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rewrite sam_utils around a Sam2Tracker wrapper that drives HuggingFace Sam2VideoModel/Sam2VideoProcessor, with text detection on AutoModelForZeroShotObjectDetection and automatic masks via the mask-generation pipeline. Weights now download from the Hub, removing the manual checkpoint step and the hardcoded scratch path. Device and dtype are resolved per host (bfloat16 on CUDA, float32 on CPU) and torch and transformers are imported lazily so the pure helpers stay light. Pin __version__ to 2.0.0. Per-frame masks are mapped through output.object_ids rather than a cached id list, since add_inputs_to_inference_session consumes the obj_ids it is given and SAM2 can omit objects it considers absent from a frame. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The transformers backend makes the vendored model copies and local Hydra configs dead weight in the wheel, so remove them. Move the OC-CCL finetuning scripts to experiments/, which still depend on the vendored SAM2 and are out of scope for the v2.0.0 package; experiments/README.md records how to run them from the pre-migration v1.1.0 tag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Introduce sst.__main__ with a two-phase argparse dispatch so the top-level help lists subcommands without importing torch and only the chosen subcommand's module is loaded. Each script now exposes add_arguments and run instead of parsing at import time, and uses the Sam2Tracker API and a --model/--device surface. Fold the RAM-heavy per-image segmentation into segment-and-crop via --per-image with lazy rawpy. Add tests for the CLI dispatch, the light-import contract, the numpy helpers, and the mask tools. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Build and twine-check the distribution on every pull request, and publish to PyPI via OIDC trusted publishing when a GitHub Release is published, so no API token is stored in the repository. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document pip install sstrack and the sst subcommands in place of the uv run python script invocations, explain the first-run HuggingFace weight download and cache behavior, and note that OC-CCL finetuning moved to experiments. Bump the citation to v2.0.0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bare sst previously parsed an empty argv and exited silently, contradicting the documented behavior of listing subcommands. Print the top-level help instead, and cover it with a test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a check job that runs ruff and pytest, and make publish depend on it, so a packaging regression cannot reach PyPI. The job installs only the lightweight deps the torch-free test suite needs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| img = Image.open(args.mask_image_path).convert("L") | ||
| arr = np.array(img) | ||
| arr[arr != 255] = 0 | ||
| arr[arr == 255] = 5 |
There was a problem hiding this comment.
NOTE!
Similarly to the PR note, Claude Opus 4.8 posted this on my behalf. I still need to review it and the context it lies within.
Discussion anchor: the object-id convention.
This labels the single foreground region with object id 5. Downstream, segment/segment-and-crop decompose the support mask with range(1, mask.max()+1), so a lone region labeled 5 is expanded into 5 tracked objects (ids 1–5), of which 1–4 are empty and still propagated through SAM2 — roughly 5x the work for one specimen.
The range(1..max) convention itself is intentional and correct for multi-part masks: it mirrors data/neon_beetles/demo.py, where the NEON beetle masks encode the five parts (Head, Pronotum, Elytra, Antenna, Legs) as pixel values 1–5. The 5 here looks like a carry-over of that max part id into the single-object specimen helper (it was 5 in the original main too, no comment).
Question for this PR: should prepare-mask emit 1 for the single-object case (one tracked object, ~5x faster, same result), accepting that it changes the documented mask value from 5 to 1? Left as-is for now pending your call.
NOTE!
Everything in this comment and draft PR was generated and pushed by Claude Opus 4.8 without my explicit instruction. I am still evaluating the code and commentary generated and pushed by Claude on my behalf here and didn't intend for it to go up yet. I'm not immediately removing it though to get a chance to review it.
Closes #22 (proposed). Packages SST for PyPI as
sstrack(import staysfrom sst import ...).What this does
src/layout, dynamic version fromsrc/sst/__init__.py, lean deps,rawextra for rawpy, singlesstconsole script. Drops the stalerequirements.txt/uv.lock.transformers(Sam2VideoModel/Sam2VideoProcessor,AutoModelForZeroShotObjectDetection). Weights auto-download from the Hub; no manual checkpoint step. Device/dtype resolved per host (bf16 on CUDA, fp32 on CPU).sstwith subcommandssegment,segment-and-crop,retrieve,prepare-mask,mask-from-crop; lazy imports soimport sstandsst --helpstay light.experiments/(still on the pre-migration vendored path, documented inexperiments/README.md).Verification
ruffclean;pytest10/10;uv buildproduces a clean wheel + sdist (no vendored code, weights, experiments, gui, or data);twine checkpasses; a real CPU smoke test ofSam2Tracker.segmentpasses.Open discussion points (deliberately left as-is; want input)
These are pre-existing v1 behaviors preserved verbatim through the migration, flagged by review:
prepare-masklabels the foreground5(src/sst/prepare_starter_mask.py). Combined with therange(1, max+1)decomposition insegment*, a single specimen becomes 5 tracked objects (4 empty), ~5x slower. Rationale traced to the 5-part NEON beetle id scheme. Should this emit1for the single-object case?retrieveranks ascending (src/sst/trait_retrieval.py), so--top_kreturns the lowest cycle-consistency matches. Likely a latent bug.retrieveclose step keeps the original as support both passes; may not match the intended target->orig cycle.Release-time housekeeping (not in this PR)
Tag scripts-era as
v1.1.0; setCITATION.cffdate-released; configure PyPI trusted publishing forsstrack; port OC-CCL to transformers (follow-up).