BuildModelFromDAG

class pymc_marketing.mmm.causal.BuildModelFromDAG(*, dag, df, target, dims, coords, model_config=None)

Build a PyMC probabilistic model directly from a Causal DAG and a tabular dataset.

The class interprets a Directed Acyclic Graph (DAG) in which each node is a column of the provided df. For every edge A -> B it creates a slope prior for the contribution of A to the mean of B. Each node receives a likelihood prior. Dims and coords are used to align and index the observed data via pm.Data and xarray.
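To make the edge-to-slope mapping concrete, the sketch below computes the linear mean implied for each node from fixed numbers instead of priors. It is an illustration only, not the class's implementation: the function name node_means is hypothetical, and the exact form of the mean (a per-node intercept plus one slope term per incoming edge, using observed parent values) is an assumption based on the "intercept" and "slope" keys of model_config.

```python
def node_means(edges, values, slopes, intercepts):
    """Return an assumed linear mean for every node in the DAG.

    edges      : list of (parent, child) tuples, one per edge A -> B
    values     : dict mapping node name to its observed value (one row)
    slopes     : dict mapping (parent, child) to a slope coefficient
    intercepts : dict mapping node name to an intercept (0.0 if absent)
    """
    # Collect the parents of each node; root nodes get an empty list.
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, [])
        parents.setdefault(child, []).append(parent)

    means = {}
    for node, node_parents in parents.items():
        # Assumed form: mean(B) = intercept_B + sum over A of slope_AB * A
        mu = intercepts.get(node, 0.0)
        for parent in node_parents:
            mu += slopes[(parent, node)] * values[parent]
        means[node] = mu
    return means


example_means = node_means(
    edges=[("X", "Y"), ("Y", "Z")],
    values={"X": 2.0, "Y": 3.0, "Z": 0.0},
    slopes={("X", "Y"): 0.5, ("Y", "Z"): 2.0},
    intercepts={"Y": 1.0},
)
```

In the actual model each slope and intercept would be a random variable drawn from its prior, and the mean would feed the node's likelihood.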

Parameters:
dag : str

DAG in DOT format (e.g. digraph { A -> B; B -> C; }) or as a simple comma/newline separated list of edges (e.g. "A->B, B->C").

df : pandas.DataFrame

DataFrame that contains a column for every node present in the DAG and all columns named by the provided dims.

target : str

Name of the target node present in both the DAG and df. This is not used to restrict modeling but is validated to exist in the DAG.

dims : tuple[str, …]

Dims for the observed variables and likelihoods (e.g. ("date", "channel")).

coords : dict

Mapping from dim names to coordinate values. All coord keys must exist as columns in df and will be used to pivot the data to match dims.

model_config : dict, optional

Optional configuration with priors for keys "intercept", "slope" and "likelihood". Values should be pymc_extras.prior.Prior instances. Missing keys fall back to default_model_config.
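The input requirements above (two accepted DAG formats, a column for every node, a column for every coord key) can be checked before constructing the builder. The following is a minimal sketch using only the standard library. The helpers parse_edges and missing_columns are hypothetical names, and the regular expression is a simplified stand-in that may not match how the class actually parses DOT (for example, it ignores quoted node names and attributes).

```python
import re


def parse_edges(dag: str) -> list[tuple[str, str]]:
    """Extract (parent, child) pairs from a DOT string or an "A->B, B->C" list.

    The lookahead keeps the child unconsumed so that chains such as
    "A -> B -> C" yield both (A, B) and (B, C).
    """
    return re.findall(r"(\w+)\s*->\s*(?=(\w+))", dag)


def missing_columns(dag: str, columns, coords) -> list[str]:
    """Return DAG nodes and coord keys that are not present in `columns`."""
    nodes = {node for edge in parse_edges(dag) for node in edge}
    required = nodes | set(coords)
    return sorted(required - set(columns))


dot_edges = parse_edges("digraph { A -> B; B -> C; }")
list_edges = parse_edges("A->B, B->C")

# "Y" is a DAG node but not a column, so it is reported as missing.
problems = missing_columns("X->Y", columns=["date", "X"], coords={"date": [1, 2, 3]})
```

With a DataFrame, the same check would be called as missing_columns(dag, df.columns, coords); an empty result means every node and coord key has a matching column.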

Examples

Minimal example using DOT format:

import numpy as np
import pandas as pd

from pymc_marketing.mmm.causal import BuildModelFromDAG

dates = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {
        "date": dates,
        "X": np.random.normal(size=5),
        "Y": np.random.normal(size=5),
    }
)

dag = "digraph { X -> Y; }"
dims = ("date",)
coords = {"date": dates}

builder = BuildModelFromDAG(
    dag=dag, df=df, target="Y", dims=dims, coords=coords
)
model = builder.build()

Edge-list format and custom likelihood prior:

from pymc_extras.prior import Prior

dag = "X->Y"  # equivalent to the DOT example above
model_config = {
    "likelihood": Prior(
        "StudentT", nu=5, sigma=Prior("HalfNormal", sigma=1), dims=("date",)
    ),
}

builder = BuildModelFromDAG(
    dag=dag,
    df=df,
    target="Y",
    dims=("date",),
    coords={"date": dates},
    model_config=model_config,
)
model = builder.build()

Methods

BuildModelFromDAG.__init__(*[, dag, df, ...])

BuildModelFromDAG.build()

Construct and return the PyMC model implied by the DAG and data.

BuildModelFromDAG.dag_graph()

Return a copy of the parsed DAG as a NetworkX directed graph.

BuildModelFromDAG.model_graph()

Return a Graphviz visualization of the built PyMC model.

Attributes

default_model_config

Default priors for intercepts, slopes and likelihood using pymc_extras.prior.Prior.