Interventions & Effect Estimation

This guide covers two closely related features:

The intervention argument, available in predictive methods such as predict(), predict_on_batch(), and their posterior predictive counterparts. Internally, it applies hard (do) interventions to the specified sample sites using NumPyro’s do effect handler, generating counterfactual draws without rewriting the model.
The estimate_effect() method, which computes the elementwise difference between an intervention (counterfactual) scenario and a baseline (factual) scenario to quantify causal or policy impact.

Typical workflow:

Generate a baseline predictive result under factual conditions (optionally passing intervention to hold certain sites at specific values).
Generate another predictive result under a modified intervention mapping.
Pass both results (or the argument dictionaries to generate them lazily) to estimate_effect() to obtain the effect output.

Each scenario is a DataTree, either produced by the prediction API or materialized on demand from argument dictionaries.

Interventions

The intervention argument is a mapping (dict[str, ArrayLike]) from sample site names to replacement values; during predictive sampling, each listed site is fixed to its value, enabling counterfactual or policy analysis. Values must broadcast to the site’s shape (e.g., intervening on a length-N vector site generally requires shape (N,)). Multiple sites can be intervened on at once; sites that are not listed are drawn from their posterior (or prior) distribution as usual.
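Shape mismatches can be easy to miss because an intervened value flows into ordinary array arithmetic downstream, where NumPy-style broadcasting may silently change the result's shape instead of raising an error. The sketch below uses plain NumPy (not aimz), with a hypothetical linear_predictor standing in for a model's downstream computation, to show why a length-N site should receive shape (N,) rather than (N, 1).

```python
import numpy as np

N = 4
c = np.linspace(0.0, 1.0, N)  # hypothetical per-observation covariate, shape (N,)


def linear_predictor(z):
    # Hypothetical stand-in for downstream code such as beta_y + beta_cy * c + beta_zy * z
    return 0.5 + 1.0 * c + 2.0 * z


ok = linear_predictor(np.zeros(N))        # shape (N,) matches the site: result is (4,)
scalar = linear_predictor(0.0)            # a scalar also broadcasts to (4,) in this arithmetic
bad = linear_predictor(np.zeros((N, 1)))  # (N, 1) against (N,) broadcasts to (4, 4)

print(ok.shape, scalar.shape, bad.shape)  # (4,) (4,) (4, 4)
```

The (N, 1) case produces an N-by-N array rather than an error, so checking the shape of an intervention value before predicting is a cheap safeguard.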

Setting in_sample=True stores draws under posterior_predictive, while in_sample=False stores them under predictions; baseline and intervention scenarios must use the same group when computing effects. Deterministic sites downstream of an intervened site automatically reflect the intervened values.

import jax.numpy as jnp
import numpyro

from aimz import ImpactModel


# Minimal sketch of a model exposing a stochastic site 'z'
def model(X, y=None):
    ...
    # site we may choose to override at prediction time
    z = numpyro.sample("z", ...)
    ...

# Fit (details elided)
im = ImpactModel(model, ...).fit_on_batch(...)

# Baseline scenario: set 'z' to its observed/factual value Z
# (in_sample=False stores draws under the "predictions" group)
baseline = im.predict_on_batch(X, intervention={"z": Z}, in_sample=False)

# Modified scenario: counterfactual where we overwrite 'z' with zeros
modified = im.predict_on_batch(
    X,
    intervention={"z": jnp.zeros_like(Z)},
    in_sample=False,
)

Effect Estimation

The estimate_effect() method computes an elementwise difference between two predictive scenarios (intervention - baseline) and returns a single-group DataTree that preserves sampling dimensions.

One baseline and one intervention scenario must be provided, either eagerly (output_baseline / output_intervention) or lazily through argument dictionaries (args_baseline / args_intervention). Mixing is allowed; for example, a precomputed baseline can be supplied with output_baseline while the intervention is generated lazily with args_intervention (or the reverse). Both scenarios must come from the same predictive group (both posterior_predictive or both predictions) with matching variable sets and shapes.

The result contains a single group with that shared name, in which each variable is the elementwise difference

\[\text{intervention} - \text{baseline}\]

with the leading chain and draw dimensions retained.
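The contrast itself is plain array arithmetic. As a sketch, with NumPy arrays in a dictionary standing in for the variables of one predictive group (the dictionary layout, the variable name y, and the (chain, draw, obs) shape are assumptions for illustration), the two scenarios must expose the same variables with the same shapes, and subtracting them keeps every dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
chains, draws, n_obs = 2, 100, 3

# Hypothetical stand-ins for one predictive group: variable name -> (chain, draw, obs) array
baseline = {"y": rng.binomial(1, 0.7, size=(chains, draws, n_obs))}
intervention = {"y": rng.binomial(1, 0.4, size=(chains, draws, n_obs))}


def contrast(intervention, baseline):
    """Elementwise intervention - baseline for every shared variable."""
    if intervention.keys() != baseline.keys():
        raise ValueError("scenarios must contain the same variables")
    out = {}
    for name, value in intervention.items():
        if value.shape != baseline[name].shape:
            raise ValueError(f"shape mismatch for {name!r}")
        out[name] = value - baseline[name]
    return out


effect = contrast(intervention, baseline)
print(effect["y"].shape)  # (2, 100, 3): chain and draw dimensions are retained
```

For a binary outcome each effect draw is -1, 0, or 1; the effect distribution emerges only after aggregating over the sampling dimensions.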

Eager (precomputed scenarios):

effect = im.estimate_effect(
    output_baseline=baseline,
    output_intervention=modified,
)

Lazy (defer prediction):

effect = im.estimate_effect(
    args_baseline={
        "X": X,
        "intervention": {"z": Z},
        "in_sample": False,
    },
    args_intervention={
        "X": X,
        "intervention": {"z": jnp.zeros_like(Z)},
        "in_sample": False,
    },
)

Mixed (precomputed baseline, lazy intervention):

effect = im.estimate_effect(
    output_baseline=baseline,
    args_intervention={
        "X": X,
        "intervention": {"z": jnp.zeros_like(Z)},
        "in_sample": False,
    },
)

Note

A lazily generated scenario (args_baseline / args_intervention) runs the disk-backed predict() internally, writing its artifacts to disk (under the model’s temporary directory unless an output_dir entry is included in the argument dictionary). Because the intermediate trees are not returned, the effect tree’s artifact_path_baseline / artifact_path_intervention attributes are the only handle to those artifacts; see Output Directory Cleanup for managing them. Pass on_batch=True to compute both scenarios in memory without writing to disk.

The returned DataTree contains the elementwise difference for every variable in the predictive group. Subsequent summaries (e.g., means or intervals) can be computed with Xarray, ArviZ, or standard NumPy / JAX utilities.
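For instance, per-observation posterior means and intervals can be obtained by reducing over the chain and draw axes. The sketch below uses a synthetic NumPy array and assumes a (chain, draw, obs) layout; with a DataTree you would reduce over the named dimensions instead. The 94% width mirrors ArviZ's default credible-interval width, although this sketch uses an equal-tailed quantile interval rather than an HDI.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic effect draws with a (chain, draw, obs) layout
effect_y = rng.normal(loc=-0.2, scale=0.05, size=(4, 500, 3))

# Collapse the sampling dimensions: chain (axis 0) and draw (axis 1)
mean_effect = effect_y.mean(axis=(0, 1))                         # shape (3,)
lower, upper = np.quantile(effect_y, [0.03, 0.97], axis=(0, 1))  # 94% equal-tailed interval

# Average over observations first, keeping one value per posterior draw
average_effect = effect_y.mean(axis=2)                           # shape (4, 500)
```

Averaging over observations before summarizing yields a full posterior for the average effect, rather than only a point estimate.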

Note

estimate_effect() computes the posterior predictive contrast between two scenarios under structural interventions, propagating full posterior uncertainty through the difference. Whether this contrast admits a causal interpretation depends on the structural assumptions encoded in the model (the kernel): causal identification is a property of the model specification, not of the estimation procedure. When the user-defined model encodes appropriate causal assumptions, such as conditioning on confounders and specifying correct functional relationships, this contrast corresponds to a causal effect estimate.
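As an example of what such assumptions provide: in the standard setting where observed covariates C block every back-door path from Z to Y, and every value of Z of interest has positive probability at each value of C, the interventional distribution is identified by the back-door adjustment formula

\[
p\big(y \mid \mathrm{do}(Z = z)\big) = \int p(y \mid z, c)\, p(c)\, \mathrm{d}c .
\]

If a model's conditional p(y | z, c) is correctly specified, averaging its predictions under do(Z = z) over the observed values of C is a plug-in estimate of this integral.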

Example: Causal Network with Confounder

This example illustrates a simple causal network. The variable Z has a direct causal effect on the outcome Y, while both are influenced by a shared confounder, C. An additional variable, X, is an observed exogenous factor that influences Z but has no direct effect on Y.

Our objective is to estimate the causal effect of Z (or alternatively X) on Y, while properly accounting for the confounding influence of C. We assume the following generative model for the observed data:
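In equation form, the code below implements the following model for observations i = 1, …, N (dist.Exponential() and dist.Normal() called without arguments are NumPyro's unit-rate Exponential and standard Normal distributions):

\[
\begin{aligned}
c_i &\sim \operatorname{Exponential}(1), \\
\beta_{cz}, \beta_{cy}, \beta_{xz}, \beta_{zy}, \beta_z, \beta_y &\sim \mathcal{N}(0, 1), \qquad \sigma \sim \operatorname{Exponential}(1), \\
z_i &\sim \operatorname{LogNormal}\big(\beta_z + \beta_{cz} c_i + \beta_{xz} x_i,\ \sigma\big), \\
y_i &\sim \operatorname{Bernoulli}\big(\operatorname{logit}^{-1}(\beta_y + \beta_{cy} c_i + \beta_{zy} z_i)\big).
\end{aligned}
\]

Here x_i enters only through z_i, matching the assumption that X has no direct effect on Y.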

Model

import logging

import jax.numpy as jnp
import numpyro.distributions as dist
from jax import nn, random
from jax.typing import ArrayLike
from numpyro import optim, plate, sample
from numpyro.infer import SVI, Trace_ELBO, init_to_feasible
from numpyro.infer.autoguide import AutoNormal

from aimz import ImpactModel

logging.basicConfig(level=logging.INFO, force=True)


def model(X: ArrayLike, C: ArrayLike, y: ArrayLike | None = None) -> None:
    # Observed confounder
    c = sample("c", dist.Exponential(), obs=C)

    # Priors for coefficients in the structural model
    # C -> Z and C -> Y
    beta_cz = sample("beta_cz", dist.Normal())
    beta_cy = sample("beta_cy", dist.Normal())

    # X -> Z and Z -> Y
    beta_xz = sample("beta_xz", dist.Normal())
    beta_zy = sample("beta_zy", dist.Normal())

    # Intercepts
    beta_z = sample("beta_z", dist.Normal())
    beta_y = sample("beta_y", dist.Normal())

    # Observation noise for Z
    sigma = sample("sigma", dist.Exponential())

    # Plate over data
    with plate("data", X.shape[0]):
        mu_z = beta_z + beta_cz * c + beta_xz * X.squeeze(axis=1)
        z = sample("z", dist.LogNormal(mu_z, sigma))

        logits = beta_y + beta_cy * c + beta_zy * z
        sample("y", dist.Bernoulli(logits=logits), obs=y)

Simulating Data under a Known Structural Model

We generate synthetic data consistent with the assumed structure:

C is drawn from an exponential distribution.
X is a count variable from a Poisson distribution.
Z is generated as a noisy exponential function of C and X.
Y is a binary outcome influenced by both C and Z through a logistic model.

# Create a pseudo-random number generator key for JAX
rng_key = random.key(42)

# Sample C from an Exponential distribution
rng_key, rng_subkey = random.split(rng_key)
C = random.exponential(rng_subkey, shape=(100,))

# Sample X from a Poisson distribution
rng_key, rng_subkey = random.split(rng_key)
X = random.poisson(rng_subkey, lam=1, shape=(100, 1))

# Generate Z influenced by C and X
rng_key, rng_subkey = random.split(rng_key)
mu_z = -1.0 + 0.5 * C - 1.5 * X.squeeze()
sigma_z = 10.0  # Add substantial noise to reduce correlation between C and Z
Z = jnp.exp(random.normal(rng_subkey, shape=(100,)) * sigma_z + mu_z)

# Generate Y from a logistic regression on C and Z
rng_key, rng_subkey = random.split(rng_key)
logits = -2.0 + 5.0 * C + 0.1 * Z
p = nn.sigmoid(logits)
y = random.bernoulli(rng_subkey, p=p).astype(jnp.int32)
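Because the data are simulated, the true per-observation effect of setting Z to zero is known and can be compared with the fitted estimate. The helper below is a hypothetical addition (not part of aimz). It plugs the simulation's outcome coefficients (-2.0, 5.0, 0.1) into the logistic probability. Since an effect draw for y is a difference of Bernoulli draws, its expectation is this difference in probabilities.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def true_effect(C, Z, z_new=0.0):
    """True P(Y=1 | do(Z=z_new), C) - P(Y=1 | do(Z=Z), C) under the simulated outcome model."""
    p_intervention = sigmoid(-2.0 + 5.0 * C + 0.1 * z_new)
    p_baseline = sigmoid(-2.0 + 5.0 * C + 0.1 * Z)
    return p_intervention - p_baseline


# Small illustrative inputs; in the example you would pass the simulated C and Z arrays
C_demo = np.array([0.0, 0.0, 1.0])
Z_demo = np.array([0.0, 10.0, 10.0])
demo = true_effect(C_demo, Z_demo)
```

Because Z is nonnegative and its true coefficient is positive, every true effect of setting Z to zero is less than or equal to zero. Its magnitude shrinks when C already pushes the probability close to one.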

Fitting the Model and Estimating Effects

We fit the model using stochastic variational inference. Once trained, we perform a counterfactual analysis to isolate the effect of Z on Y.

dt_factual represents predictions under the factual setting (with observed Z).
dt_counterfactual represents predictions under a counterfactual intervention where Z is set to zero.

Note

This model contains a local latent variable (z, sampled once per observation inside the data plate), which is why predict_on_batch() is used here. Prefer predict() whenever it is compatible with the model; see model compatibility for details.

Comparing these two distributions allows us to estimate the effect of Z on Y, adjusted for the influence of C.
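To make the target concrete, write \(\beta_y, \beta_{cy}, \beta_{zy}\) for the model's beta_y, beta_cy, and beta_zy. For a single posterior draw of these coefficients, the expected contrast for observation i between the counterfactual scenario (z_i = 0) and the factual scenario (z_i = Z_i) is

\[
\mathbb{E}\big[y_i^{\text{cf}} - y_i^{\text{f}}\big] = \operatorname{logit}^{-1}\big(\beta_y + \beta_{cy} c_i\big) - \operatorname{logit}^{-1}\big(\beta_y + \beta_{cy} c_i + \beta_{zy} Z_i\big).
\]

Each observation keeps its own c_i, so the confounder is held fixed within the contrast. Averaging the effect draws for observation i approximates this quantity averaged over the posterior.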

im = ImpactModel(
    model,
    rng_key=rng_key,
    inference=SVI(
        model,
        guide=AutoNormal(model, init_loc_fn=init_to_feasible()),
        optim=optim.Adam(step_size=1e-3),
        loss=Trace_ELBO(),
    ),
)
im.fit_on_batch(X, y, C=C)

# Predict under factual (Z) and counterfactual (zeroed Z) scenarios
dt_factual = im.predict_on_batch(X, C=C, intervention={"z": Z})
dt_counterfactual = im.predict_on_batch(
    X,
    C=C,
    intervention={"z": jnp.zeros_like(Z)},
)

# Estimate effect of intervening on Z while conditioning on C
effect = im.estimate_effect(
    output_baseline=dt_factual,
    output_intervention=dt_counterfactual,
)
effect