TBFPC

class pymc_marketing.mmm.causal.TBFPC(target, *, target_edge_rule='any', bf_thresh=1.0, forbidden_edges=None)

Target-first Bayes Factor PC (TBF-PC) causal discovery algorithm.

This algorithm is a target-oriented variant of the Peter–Clark (PC) algorithm that uses Bayes factors (via a ΔBIC approximation) as its conditional independence test.

For each conditional independence test of the form

\[H_0 : Y \perp X \mid S \quad \text{vs.} \quad H_1 : Y \not\!\perp X \mid S\]

we compare two linear models:

\[\begin{split}M_0 : Y \sim S \\ M_1 : Y \sim S + X\end{split}\]

where \(S\) is a conditioning set of variables.

The Bayesian Information Criterion (BIC) is defined as

\[\mathrm{BIC}(M) = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + k \log(n),\]

with residual sum of squares \(\mathrm{RSS}\), sample size \(n\), and number of parameters \(k\).

The Bayes factor is approximated by

\[\log \mathrm{BF}_{10} \approx -\tfrac{1}{2} \left[ \mathrm{BIC}(M_1) - \mathrm{BIC}(M_0) \right].\]

Independence is declared if \(\mathrm{BF}_{10} < \tau\), where \(\tau\) is set via the bf_thresh parameter.
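
For intuition, the following is a minimal self-contained sketch of this test using ordinary least squares in NumPy. The helper log_bf10 is illustrative only, not the class's internal implementation; with the default τ = 1, independence corresponds to log BF₁₀ < 0.

import numpy as np

def log_bf10(y, x, S=()):
    """Approximate log BF_10 for H1: Y ~ S + X vs. H0: Y ~ S via Delta-BIC."""
    n = len(y)

    def bic(cols):
        # Design matrix with an intercept plus the given predictor columns.
        X = np.column_stack([np.ones(n), *cols])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        k = X.shape[1]  # number of fitted parameters
        return n * np.log(rss / n) + k * np.log(n)

    cols = list(S)
    return -0.5 * (bic(cols + [x]) - bic(cols))

# A and D share the common cause C: marginally dependent, independent given C.
rng = np.random.default_rng(0)
n = 2000
C = rng.normal(size=n)
A = 0.7 * C + rng.normal(size=n)
D = 0.5 * C + rng.normal(size=n)
print(log_bf10(A, D))         # large positive: evidence for dependence
print(log_bf10(A, D, S=[C]))  # below log(tau) = 0: independence declared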

References

  • Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search (2nd ed.). MIT Press. [PC algorithm]

  • Spirtes, P., & Glymour, C. (1991). "An Algorithm for Fast Recovery of Sparse Causal Graphs." Social Science Computer Review, 9(1), 62–72.

  • Kass, R. E., & Raftery, A. E. (1995). "Bayes Factors." Journal of the American Statistical Association, 90(430), 773–795.

Examples

1. Basic usage with full conditioning set

import numpy as np
import pandas as pd

from pymc_marketing.mmm.causal import TBFPC

# Simulate data from a known DAG: C -> A, C -> D, A -> B,
# and Y <- B, D, C (so the true drivers of Y are B, D, and C).
rng = np.random.default_rng(7)
n = 2000
C = rng.gamma(2, 1, n)
A = 0.7 * C + rng.gamma(2, 1, n)
D = 0.5 * C + rng.gamma(2, 1, n)
B = 0.8 * A + rng.gamma(2, 1, n)
Y = 0.9 * B + 0.6 * D + 0.7 * C + rng.gamma(2, 1, n)

df = pd.DataFrame({"A": A, "B": B, "C": C, "D": D, "Y": Y})
df = (df - df.mean()) / df.std()  # standardize the columns (recommended)

model = TBFPC(target="Y", target_edge_rule="fullS")
model.fit(df, drivers=["A", "B", "C", "D"])

print(model.get_directed_edges())
print(model.get_undirected_edges())
print(model.to_digraph())

2. Using forbidden edges

You can forbid specific edges from being tested or included, encoding prior knowledge about the domain.

model = TBFPC(
    target="Y",
    target_edge_rule="any",
    forbidden_edges=[("A", "C")],  # forbid the A -- C edge
)
model.fit(df, drivers=["A", "B", "C", "D"])
print(model.to_digraph())

3. Conservative rule

Keeps driver → target edges if any conditioning set shows dependence.

model = TBFPC(target="Y", target_edge_rule="conservative")
model.fit(df, drivers=["A","B","C","D"])
print(model.to_digraph())

Methods

TBFPC.__init__(target, *[, ...])

Create a new TBFPC causal discovery model.

TBFPC.fit(df, drivers)

Fit the TBFPC procedure to the supplied dataframe.

TBFPC.get_directed_edges()

Return directed edges learned by the algorithm.

TBFPC.get_test_results(x, y)

Return ΔBIC diagnostics for the unordered pair (x, y).

TBFPC.get_undirected_edges()

Return undirected edges remaining after orientation.

TBFPC.summary()

Render a text summary of the learned graph and test count.

TBFPC.to_digraph()

Return the learned graph encoded in DOT format.
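
Since to_digraph() returns the graph in DOT format, the result can be rendered with any Graphviz-compatible tool. A minimal sketch, assuming the optional graphviz Python package is installed and that the return value converts to a DOT source string:

import graphviz

dot = str(model.to_digraph())  # DOT source, as printed in the examples above
graphviz.Source(dot).render("tbfpc_graph", format="png", cleanup=True)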