linear_quantize_activations leaks ~1 temp .mlpackage + .mlmodelc per calibration op-group (tens of GB on realistic models)
Environment
- coremltools==9.0 (pip)
- macOS 15 (Darwin 25.0.0), Apple Silicon
- Python 3.12
Summary
During activation-statistics collection in
coremltools.optimize.coreml.experimental.linear_quantize_activations,
_ModelDebugger.predict_intermediate_outputs builds a fresh temporary
MLModel per op-group batch (loop at
coremltools/optimize/coreml/experimental/_model_debugger.py:315-320).
Each of those MLModel(spec, weights_dir=…) constructions allocates a
new .mlpackage under $TMPDIR via
_create_mlpackage → tempfile.mkdtemp(suffix=".mlpackage")
(coremltools/models/utils.py:114).
The only cleanup is the atexit.register(cleanup, self.package_path)
at coremltools/models/model.py:569, so nothing is reclaimed until
the process exits. The compiled .mlmodelc that CoreML creates when
the model is loaded (skip_model_load=False, which is the default for
calibration) is likewise retained.
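The lifetime pattern at fault can be shown standalone. This is a simplified sketch of the atexit-only cleanup pattern, not the actual coremltools code: every construction registers a process-exit hook, so directories accumulate for the life of the interpreter.

```python
import atexit
import os
import shutil
import tempfile

def make_temp_package():
    # Mirrors the pattern in _create_mlpackage + MLModel.__init__:
    # create a temp dir, defer cleanup to interpreter exit.
    path = tempfile.mkdtemp(suffix=".mlpackage")
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path

paths = [make_temp_package() for _ in range(5)]
# All five directories still exist mid-process; nothing is reclaimed
# until exit, no matter how many times this runs in a loop.
print(all(os.path.isdir(p) for p in paths))
```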
Measured on a medium-sized transformer (Gemma 4 E2B chunk2, ~300 MB
mlpackage): a single calibration run with 8 calibration samples
accumulates ~80 temp directories totaling ~38 GB in $TMPDIR
before the process exits. With 49 GB free on the dev disk, the run
fills the partition and the calibration aborts.
Confirmed on a minimal self-contained reproducer (reproduce_leak.py,
attached) — 4 calibration samples against an 8-layer 2048-dim
nn.Linear stack leaks +6 temporary .mlpackage dirs, +402.9 MB:
$ TMPDIR=/tmp/leak-test python reproduce_leak.py
...
Before: 0 mlpackage, 0 mlmodelc, 0.0 MB
Running linear_quantize_activations (activation calibration) ...
After: 6 mlpackage, 0 mlmodelc, 402.9 MB
Leaked: +6 mlpackage, +0 mlmodelc, +402.9 MB
The leak scales linearly with calibration samples × op-group batches ×
model size. On Gemma 4 chunk2 (300 MB × ~128 batches) this scaling
matches the ~38 GB figure above.
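The back-of-the-envelope arithmetic, using the numbers from this report:

```python
# ~300 MB per temp .mlpackage copy, ~128 op-group batches on chunk2.
package_mb = 300
batches = 128
total_gb = package_mb * batches / 1024
print(round(total_gb, 1))  # 37.5 -> consistent with the ~38 GB observed
```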
With the proposed MLModel.__del__ / close() fix applied (patch
attached), exactly the same reproducer shows:
After: 0 mlpackage, 0 mlmodelc, 0.0 MB
Leaked: +0 mlpackage, +0 mlmodelc, +0.0 MB
Reproducer
Self-contained, no external weights needed (attached as
reproduce_leak.py):
import numpy as np

import coremltools as ct
from coremltools.optimize.coreml.experimental import (
    OpLinearQuantizerConfig, OptimizationConfig, linear_quantize_activations,
)

# … build a small MLModel from a torch.nn.Linear stack …
quantized = linear_quantize_activations(
    mlmodel,
    OptimizationConfig(global_config=OpLinearQuantizerConfig(
        mode="linear_symmetric", dtype="int8")),
    sample_data=[{"x": np.random.randn(1, 2048).astype(np.float32)} for _ in range(4)],
)
del quantized
# $TMPDIR now contains leftover tmp*.mlpackage and tmp*.mlmodelc directories.
Expected output (counts and bytes retained) is printed by the script
before and after the call.
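The before/after measurement can be sketched as follows. The helper name `tmpdir_stats` is illustrative, not necessarily what the attached script calls it; it counts leftover `.mlpackage`/`.mlmodelc` directories under `$TMPDIR` and totals their size.

```python
import os
import tempfile

def tmpdir_stats():
    # Count leaked temp packages and their total size in MB.
    root = tempfile.gettempdir()
    entries = os.listdir(root)
    pkgs = [d for d in entries if d.endswith(".mlpackage")]
    mlcs = [d for d in entries if d.endswith(".mlmodelc")]
    total = 0
    for d in pkgs + mlcs:
        for dirpath, _, files in os.walk(os.path.join(root, d)):
            for f in files:
                p = os.path.join(dirpath, f)
                if os.path.isfile(p):
                    total += os.path.getsize(p)
    return len(pkgs), len(mlcs), total / 2**20

before = tmpdir_stats()
# ... run linear_quantize_activations here ...
after = tmpdir_stats()
```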
Expected vs observed
- Expected: each temporary
MLModel used only for prediction
during calibration releases its backing directory when it goes out
of scope.
- Observed: the backing directory survives until the Python
process exits.
Root cause
MLModel.__init__ ties the temp directory lifetime to an atexit
hook instead of the Python object's lifetime. There is no
__del__ / close() path. For long-running calibration loops this
is exactly the wrong lifetime.
Proposed fix
Two options, both small:
- Per-call site — wrap the MLModel in
  _model_debugger.predict_intermediate_outputs with a try/finally
  that rmtrees model.package_path (and drops model.__proxy__)
  after each iteration. Minimal surface; only touches the
  quantization path that actually hits the problem.
- MLModel-level — add MLModel.close() / __del__ that releases
  the temp package eagerly. Fixes every caller that constructs
  MLModel from a spec (ct.optimize helpers, ct.convert internals,
  custom pipelines). The atexit hook stays as a safety net.
Patch for option 2 attached (fix_mlmodel_tmp_leak.patch).
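The shape of option 2, as a standalone sketch (the class below is a stand-in with assumed attribute names; the attached patch against coremltools.models.MLModel is the authoritative version):

```python
import os
import shutil
import tempfile

class _TempPackageModel:  # stand-in for coremltools.models.MLModel
    """Tie the temp .mlpackage lifetime to the Python object."""

    def __init__(self, package_path, is_temp_package=True):
        self.package_path = package_path
        self.is_temp_package = is_temp_package

    def close(self):
        # Eagerly delete the backing temp package; safe to call twice.
        # The existing atexit hook remains as a safety net for objects
        # that are never closed or collected.
        if self.is_temp_package and self.package_path is not None:
            shutil.rmtree(self.package_path, ignore_errors=True)
            self.package_path = None

    def __del__(self):
        self.close()

# Demo: the directory disappears on close(), not at interpreter exit.
tmp_pkg = tempfile.mkdtemp(suffix=".mlpackage")
model = _TempPackageModel(tmp_pkg)
model.close()
print(os.path.isdir(tmp_pkg))  # False
```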
Happy to open a PR with a regression test that asserts $TMPDIR count
does not grow across a calibration loop, if that would be useful.