Output Directory Cleanup
Many high-level methods of ImpactModel stream their results to disk (e.g., posterior predictive samples and predictions). This lets large datasets be processed without holding everything in memory.
Each of these methods accepts an output_dir parameter (see Disk-Backed vs. On-Batch Methods for their broader I/O behavior).
This page covers three things:
the temporary directory that is created when output_dir is not supplied;
the cleanup() method that removes it;
the cleanup_models() class method, which removes the temporary directories of all live model instances.
Creation Logic
When a disk-writing method is called with output_dir=None (the default), the model creates a process-scoped temporary root directory (via tempfile.TemporaryDirectory) on the first such call.
Each invocation then writes to its own timestamped subdirectory under that root, so earlier results are never overwritten.
Subdirectories follow the pattern <caller_name>_<UTC-timestamp>/, where <caller_name> is the name of the method that triggered the write.
The root directory is stored in the temp_dir attribute and reused for subsequent calls until the user invokes cleanup().
Example layout (implicit temp root):
/tmp/tmpz00u5kxk/                                    # model.temp_dir (root, reused until cleanup)
    sample_prior_predictive_20250926T185250223698Z/
    log_likelihood_20250926T185359570134Z/
    predict_20250926T185419208087Z/
If the user provides output_dir, that directory becomes the root and is created if it does not already exist.
The same timestamped subdirectory pattern is used there (e.g., my_runs/20250917T013040Z).
cleanup() never deletes an explicit output_dir, because the user is assumed to manage its lifecycle.
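The creation logic above can be sketched with the standard library. This is a minimal illustration, not aimz's implementation. The class OutputDirSketch and its run_dir() method are hypothetical names. The sketch assumes that the <caller_name>_<UTC-timestamp> pattern also applies under an explicit root.

```python
from __future__ import annotations

import tempfile
from datetime import datetime, timezone
from pathlib import Path


class OutputDirSketch:
    """Hypothetical stand-in for the output-directory logic (not aimz code)."""

    def __init__(self) -> None:
        # No temporary root exists until the first implicit write.
        self.temp_dir: tempfile.TemporaryDirectory | None = None

    def run_dir(self, caller_name: str, output_dir: str | Path | None = None) -> Path:
        if output_dir is not None:
            # Explicit root: create it (and parents) if needed; never tracked for cleanup.
            root = Path(output_dir)
            root.mkdir(parents=True, exist_ok=True)
        else:
            # Implicit root: create it lazily on first use, then reuse it.
            if self.temp_dir is None:
                self.temp_dir = tempfile.TemporaryDirectory()
            root = Path(self.temp_dir.name)
        # UTC timestamp with microseconds, e.g. 20250926T185250223698Z.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        path = root / f"{caller_name}_{stamp}"
        # mkdir() without exist_ok raises instead of reusing an existing run directory.
        path.mkdir()
        return path
```

Only the implicit root is stored on the object. That is what allows a later cleanup step to remove it without ever touching a user-supplied directory.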
Cleanup Behavior
cleanup() removes only the internally created temporary root directory, including all of its timestamped subdirectories. The path of the removed directory is logged for reference. If no temporary directory exists, or it has already been removed, the method is a no-op.
Additional guarantees:
Safe to call multiple times; subsequent calls do nothing.
An explicitly provided output_dir and its subdirectories are never deleted.
Internal references are cleared, so future implicit calls create a fresh temporary root.
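The guarantees above can be illustrated with a minimal sketch. It assumes the temporary root is held as a tempfile.TemporaryDirectory in a temp_dir attribute. CleanupSketch is a hypothetical name, and the log message wording is illustrative, not aimz's actual log output.

```python
from __future__ import annotations

import logging
import tempfile

logger = logging.getLogger(__name__)


class CleanupSketch:
    """Hypothetical stand-in showing the cleanup() guarantees (not aimz code)."""

    def __init__(self) -> None:
        self.temp_dir: tempfile.TemporaryDirectory | None = None

    def cleanup(self) -> None:
        # No-op when no temporary root exists or it was already removed.
        if self.temp_dir is None:
            return
        path = self.temp_dir.name
        # Remove the root and every timestamped subdirectory beneath it.
        self.temp_dir.cleanup()
        logger.info("Removed temporary directory: %s", path)
        # Clear the reference so the next implicit write creates a fresh root.
        self.temp_dir = None
```

Because only temp_dir is ever removed, a user-supplied output_dir is out of reach by construction.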
Rationale for Explicit Calls
Although tempfile.TemporaryDirectory attempts to remove its directory automatically when the object is garbage-collected, the timing is nondeterministic, especially in notebooks or long-lived processes.
Large artifacts can accumulate quickly, and calling cleanup() reclaims the disk space promptly.
Accessing Output Directories
Every disk-writing method returns an xarray.DataTree that records, as attributes, where its results are stored:
tree.attrs["output_dir"]: the root directory (either the temporary root or the user-provided directory).
tree[<group>].attrs["output_dir"]: the timestamped subdirectory containing the Zarr data for that group.
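To gather every recorded path at once, a small helper can walk the tree. This is a minimal sketch: collect_output_dirs is a hypothetical helper, not part of aimz. It relies only on the .attrs and .children members that xarray.DataTree provides.

```python
from __future__ import annotations

from typing import Any


def collect_output_dirs(tree: Any, path: str = "/") -> dict[str, str]:
    """Map each node path to its "output_dir" attribute, where one is set.

    Hypothetical helper; uses only ``.attrs`` and ``.children``.
    """
    found: dict[str, str] = {}
    if "output_dir" in tree.attrs:
        found[path] = tree.attrs["output_dir"]
    # Recurse into child groups, building paths such as "/posterior_predictive".
    for name, child in tree.children.items():
        found.update(collect_output_dirs(child, f"{path.rstrip('/')}/{name}"))
    return found
```

For a tree like the example that follows, this would map "/" to the root directory and "/posterior_predictive" to the group's timestamped subdirectory.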
The following example xarray.DataTree shows where these output directory paths appear.
<xarray.DataTree 'root'>
Group: /
│   Attributes:
│       output_dir:  /tmp/tmpin_6ch1u
└── Group: /posterior_predictive
        Dimensions:  (chain: 1, draw: 1000, y_dim_0: 100)
        Coordinates:
          * chain    (chain) int64 8B 0
          * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
          * y_dim_0  (y_dim_0) int64 800B 0 1 2 3 4 5 6 7 8 ... 92 93 94 95 96 97 98 99
        Data variables:
            y        (chain, draw, y_dim_0) float32 400kB dask.array<chunksize=(1, 1000, 100), meta=np.ndarray>
        Attributes:
            created_at:    2026-05-24T03:01:36.635128+00:00
            aimz_version:  0.12.0
            output_dir:    /tmp/tmpin_6ch1u/20260524T030135510066Z_predict
Note
Even after output_dir is deleted, the returned xarray.DataTree and all of its groups remain accessible.
However, arrays that were stored on disk now read back as all zeros, because the underlying data files have been removed.
Users can still inspect the structure and metadata of the xarray.DataTree, but the original disk-backed values are no longer available.
Cleaning Multiple Models
When a process creates multiple ImpactModel instances, it can be convenient to remove all of their temporary directories in a single call.
The class method cleanup_models() iterates over all live model instances and calls cleanup() on each.
For example, it can serve as a pipeline hook that runs after a job and removes every temporary directory without the need to track individual model instances.
Example:
from aimz.model import ImpactModel

# Create multiple instances; each predict() call writes to that model's temporary root
im1 = ImpactModel(...).fit(...)
im1.predict(...)
im2 = ImpactModel(...).fit(...)
im2.predict(...)

# Remove the temporary directories of all live instances
ImpactModel.cleanup_models()
print(im1.temp_dir)  # None
print(im2.temp_dir)  # None
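One way a class can track its live instances is a weakref.WeakSet. The following sketch assumes that mechanism; aimz may track instances differently. RegistrySketch is a hypothetical name, and for brevity it creates its temporary root eagerly in __init__ rather than on the first write.

```python
from __future__ import annotations

import tempfile
import weakref


class RegistrySketch:
    """Hypothetical stand-in for cleanup_models() (not aimz's implementation)."""

    # Weak references: garbage-collected instances drop out of the set automatically.
    _instances: weakref.WeakSet[RegistrySketch] = weakref.WeakSet()

    def __init__(self) -> None:
        self.temp_dir: tempfile.TemporaryDirectory | None = tempfile.TemporaryDirectory()
        RegistrySketch._instances.add(self)

    def cleanup(self) -> None:
        # Idempotent: only the first call removes the temporary root.
        if self.temp_dir is not None:
            self.temp_dir.cleanup()
            self.temp_dir = None

    @classmethod
    def cleanup_models(cls) -> None:
        # Iterate over a snapshot so instances collected mid-loop cannot disturb it.
        for instance in list(cls._instances):
            instance.cleanup()
```

Because the registry holds only weak references, it never keeps a model alive just so that it can be cleaned up later.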
Typical Usage Pattern
A typical workflow has three steps. First, call these methods without output_dir, so results go to the temporary root. Next, optionally access the results through the temp_dir attribute or the returned xarray.DataTree. Finally, free the disk space with cleanup() or cleanup_models().
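To guarantee that cleanup() runs even when an error interrupts the workflow, the call can be wrapped in a context manager. This is a minimal sketch. The helper cleaned_up is hypothetical and works with any object that has a cleanup() method. The standard contextlib.closing does not fit here, because it calls close() rather than cleanup().

```python
from contextlib import contextmanager
from typing import Iterator, Protocol, TypeVar


class SupportsCleanup(Protocol):
    def cleanup(self) -> None: ...


T = TypeVar("T", bound=SupportsCleanup)


@contextmanager
def cleaned_up(model: T) -> Iterator[T]:
    """Yield the model and always call model.cleanup() on exit (hypothetical helper)."""
    try:
        yield model
    finally:
        # Runs on normal exit and when the with-block raises.
        model.cleanup()
```

With aimz this might read `with cleaned_up(ImpactModel(...).fit(...)) as im:`, followed by the disk-writing calls in the indented block. That usage assumes fit() returns the model itself.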
Tips for safe use:
Call cleanup() at the end of a notebook or in a finally block.
Use cleanup_models() to remove the temporary directories of all live model instances at once.
Copy any results you want to keep before calling cleanup() or cleanup_models().
In tests, check that temporary directories are removed to avoid disk bloat.
Avoid leaving temporary directories uncleaned during long sessions.
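To keep a result before cleanup, its timestamped subdirectory can be copied to a permanent location. This is a minimal sketch: preserve_run is a hypothetical helper built on shutil.copytree. The source path would come from the returned tree, e.g. `tree["posterior_predictive"].attrs["output_dir"]`.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def preserve_run(run_dir: str | Path, dest_root: str | Path) -> Path:
    """Copy one timestamped run directory into dest_root (hypothetical helper)."""
    src = Path(run_dir)
    dest = Path(dest_root) / src.name
    # copytree creates missing parent directories and raises FileExistsError
    # if dest already exists, so an earlier copy is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Call it before cleanup() or cleanup_models(). Once those run, the source directory no longer exists.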