Output Directory Cleanup

Many high-level methods of ImpactModel stream results to disk (e.g., posterior predictive samples, predictions) to support large datasets and memory efficiency. Each of these methods accepts an output_dir parameter (see Disk-Backed vs. On-Batch Methods for their broader I/O behavior). This page covers the temporary directory that is created when the user does not supply output_dir, the cleanup() method that removes it, and the cleanup_models() class method, which removes the temporary directories of all live model instances.

Creation Logic

When a disk‑writing method is called with output_dir=None (the default), the model creates a process‑scoped temporary root directory (via tempfile.TemporaryDirectory) the first time such a call occurs. Each invocation then writes to a timestamped subdirectory under that root, ensuring that earlier results are never overwritten. Subdirectories follow the pattern <UTC-timestamp>_<caller_name>/, where <caller_name> is the name of the method that triggered the write operation. This root directory is stored in the temp_dir attribute and reused for subsequent calls until the user invokes cleanup().

Example Layout (implicit temp root):

/tmp/tmpz00u5kxk/       # model.temp_dir (root, reused until cleanup)
    20250926T185250223698Z_sample_prior_predictive/
    20250926T185359570134Z_log_likelihood/
    20250926T185419208087Z_predict/

If the user provides output_dir, that directory becomes the root, and it will be created if it does not already exist. The same timestamped subdirectory pattern is used there (e.g., my_runs/20250917T013040123456Z_predict). An explicit output_dir is not deleted by cleanup(), since it is assumed that the user intends to manage its lifecycle manually.
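
The naming scheme described above can be sketched with the standard library alone. This is a minimal illustration, not the aimz implementation: make_artifact_dir is a hypothetical helper, and the timestamp format string is inferred from the example directory names shown earlier.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def make_artifact_dir(root: Path, caller_name: str) -> Path:
    """Create and return <root>/<UTC-timestamp>_<caller_name>/ (hypothetical helper)."""
    # Microsecond resolution, e.g. 20250926T185250223698Z (format is an assumption).
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = Path(root) / f"{stamp}_{caller_name}"
    # parents=True also creates a user-supplied root that does not exist yet;
    # exist_ok=False refuses to reuse, and thus overwrite, an earlier directory.
    path.mkdir(parents=True, exist_ok=False)
    return path


# Implicit case: a temporary root stands in for model.temp_dir.
tmp_root = tempfile.TemporaryDirectory()
predict_dir = make_artifact_dir(Path(tmp_root.name), "predict")

# Explicit case: the root ("my_runs") is created on demand.
explicit_dir = make_artifact_dir(Path(tmp_root.name) / "my_runs", "predict")

print(predict_dir.parent == Path(tmp_root.name))  # True
print(explicit_dir.parent.name)                   # my_runs

tmp_root.cleanup()  # remove everything created by this demo
```

Because each call gets its own timestamped subdirectory, repeated calls to the same method never write into the same location.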

Cleanup Behavior

cleanup() removes only the internally created temporary root directory, including all of its timestamped subdirectories. The path of the removed directory is logged for reference. If no temporary directory exists, or if it has already been removed, the method is a no-op.

Additional guarantees:

Safe to call multiple times; subsequent calls do nothing.
Explicitly provided output_dir and its subdirectories are never deleted.
Internal references are cleared so future implicit calls create a fresh temporary root.
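
These guarantees can be illustrated with a small stand-in class. The sketch below assumes the temporary root is held as a tempfile.TemporaryDirectory object; TempRootOwner and resolve_root are hypothetical names used only for illustration and are not part of aimz.

```python
import logging
import tempfile
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


class TempRootOwner:
    """Hypothetical stand-in for a model that owns a lazily created temp root."""

    def __init__(self) -> None:
        self._tmp: Optional[tempfile.TemporaryDirectory] = None

    @property
    def temp_dir(self) -> Optional[str]:
        return self._tmp.name if self._tmp is not None else None

    def resolve_root(self, output_dir: Optional[str] = None) -> Path:
        """Return the explicit directory if given, else the lazily created temp root."""
        if output_dir is not None:
            root = Path(output_dir)
            root.mkdir(parents=True, exist_ok=True)
            return root  # never tracked, so never deleted by cleanup()
        if self._tmp is None:
            self._tmp = tempfile.TemporaryDirectory()
        return Path(self._tmp.name)

    def cleanup(self) -> None:
        if self._tmp is None:
            return  # nothing to remove: repeated calls are no-ops
        path = self._tmp.name
        self._tmp.cleanup()
        self._tmp = None  # the next implicit call creates a fresh root
        logger.info("Removed temporary directory: %s", path)


owner = TempRootOwner()
root = owner.resolve_root()  # creates the temporary root
owner.cleanup()              # removes it and clears the reference
owner.cleanup()              # second call does nothing
print(root.exists(), owner.temp_dir)  # False None
```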

Rationale for Explicit Calls

Although tempfile.TemporaryDirectory attempts automatic removal upon garbage collection, the timing is nondeterministic, especially in notebooks or long-lived processes. Large artifacts can accumulate quickly; calling cleanup() ensures prompt reclamation of disk space.

Accessing Artifact Paths

Every disk-writing method records the artifact path of the call, that is, the timestamped subdirectory holding the Zarr store with the results, in the artifact_path attribute. The attribute is set on both the root tree and the group node (tree.attrs["artifact_path"] and tree[<group>].attrs["artifact_path"]). The enclosing base directory is simply Path(artifact_path).parent, and the temporary root (when no output_dir was given) is also available via temp_dir. estimate_effect() records the artifact paths of its two scenarios under artifact_path_baseline and artifact_path_intervention when the corresponding outputs were streamed to disk. The output below shows an example xarray.DataTree illustrating the artifact paths.

<xarray.DataTree 'root'>
Group: /
│   Attributes:
│       artifact_path:  /tmp/tmpo_kz7u4p/20260720T202819765490Z_predict
└── Group: /posterior_predictive
        Dimensions:  (chain: 1, draw: 1000, y_dim_0: 100)
        Coordinates:
          * chain    (chain) int64 8B 0
          * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
          * y_dim_0  (y_dim_0) int64 800B 0 1 2 3 4 5 6 7 8 ... 92 93 94 95 96 97 98 99
        Data variables:
            y        (chain, draw, y_dim_0) float32 400kB dask.array<chunksize=(1, 1000, 100), meta=np.ndarray>
        Attributes:
            created_at:     2026-07-20T20:28:21.039678+00:00
            aimz_version:   0.14.dev0
            artifact_path:  /tmp/tmpo_kz7u4p/20260720T202819765490Z_predict

Note

A temporary directory is reclaimed when cleanup() or cleanup_models() is called, or when the model is garbage-collected. Afterwards the returned xarray.DataTree and all its group entries remain accessible, but any arrays that were stored on disk read back with all values set to zero, since the underlying data files are gone. A temporary result is therefore valid only while its model is alive; pass an explicit output_dir to keep results beyond the model’s lifetime.
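
One way to keep a temporary result after the fact is to copy its artifact directory elsewhere before cleanup. The sketch below is illustrative only: persist_artifact is a hypothetical helper, and it relies solely on the artifact_path attribute described above. A SimpleNamespace stands in for the returned xarray.DataTree so that the example runs without aimz.

```python
import shutil
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any


def persist_artifact(tree: Any, dest_root: str) -> Path:
    """Copy the artifact directory recorded on `tree` into `dest_root` (hypothetical helper)."""
    src = Path(tree.attrs["artifact_path"])
    dest = Path(dest_root) / src.name  # keep the timestamped directory name
    shutil.copytree(src, dest)         # raises FileExistsError if dest already exists
    return dest


# Demo with a stand-in object; a real call would pass the returned DataTree.
with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as keep:
    run_dir = Path(tmp) / "20260720T202819765490Z_predict"
    run_dir.mkdir()
    (run_dir / "marker.txt").write_text("demo")

    fake_tree = SimpleNamespace(attrs={"artifact_path": str(run_dir)})
    base_dir = Path(fake_tree.attrs["artifact_path"]).parent  # the enclosing root

    saved = persist_artifact(fake_tree, keep)
    copied_ok = (saved / "marker.txt").read_text() == "demo"

print(copied_ok)  # True
```

Note that the original tree would still reference the old store after such a copy, so the copied store would need to be reopened (for instance with xarray's Zarr reader) to obtain a tree backed by the new location.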

Cleaning Multiple Models

When a process creates multiple ImpactModel instances, it can be useful to clean up all their temporary directories in a single call. The class method cleanup_models() iterates over all live model instances and calls their cleanup() method. For example, this can be used as a pipeline hook after a run to clean up all temporary directories without tracking individual model instances.

Example:

from aimz.model import ImpactModel

# Create multiple instances and write to temporary directories
im1 = ImpactModel(...)
im1.fit(...)
im1.predict(...)

im2 = ImpactModel(...)
im2.fit(...)
im2.predict(...)

# Clean temporary directories for all active instances
ImpactModel.cleanup_models()

print(im1.temp_dir)  # None
print(im2.temp_dir)  # None
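
For intuition, a class can track its live instances with weak references so that a class method can reach all of them without keeping any of them alive. The following sketch shows that pattern with a hypothetical TrackedModel class; whether ImpactModel uses a weakref.WeakSet internally is an assumption, not something this page states.

```python
import os
import tempfile
import weakref
from typing import Optional


class TrackedModel:
    """Hypothetical class that tracks live instances through weak references."""

    # Weak references do not prevent instances from being garbage-collected.
    _instances: "weakref.WeakSet[TrackedModel]" = weakref.WeakSet()

    def __init__(self) -> None:
        self._tmp: Optional[tempfile.TemporaryDirectory] = None
        TrackedModel._instances.add(self)

    @property
    def temp_dir(self) -> Optional[str]:
        return self._tmp.name if self._tmp is not None else None

    def write(self) -> str:
        """Stand-in for a disk-writing method called with output_dir=None."""
        if self._tmp is None:
            self._tmp = tempfile.TemporaryDirectory()
        return self._tmp.name

    def cleanup(self) -> None:
        if self._tmp is not None:
            self._tmp.cleanup()
            self._tmp = None

    @classmethod
    def cleanup_models(cls) -> None:
        # Copy to a list so the set may shrink safely while we iterate.
        for model in list(cls._instances):
            model.cleanup()


m1, m2 = TrackedModel(), TrackedModel()
paths = [m1.write(), m2.write()]
TrackedModel.cleanup_models()
print(m1.temp_dir, m2.temp_dir)            # None None
print([os.path.isdir(p) for p in paths])   # [False, False]
```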

Typical Usage Pattern

A typical workflow is to call the disk-writing methods without specifying output_dir (so that they use the temporary root), optionally access the results via the temp_dir attribute or the returned xarray.DataTree, and then free disk space with cleanup() or cleanup_models().

Tips for safe use:

Use cleanup() at the end of a notebook or in a finally block.
Use cleanup_models() to remove temporary directories for all live model instances at once.
Copy any results you want to keep before calling cleanup() or cleanup_models().
In tests, check that temporary directories are removed to avoid disk bloat.
Avoid leaving temporary directories uncleaned in long-running sessions.