# Stats Engine
Statistical metrics and confidence interval computations for Monte Carlo simulation results.
## Quick Start
```python
from mcframework.stats_engine import DEFAULT_ENGINE, StatsContext
import numpy as np

# Your simulation results
data = np.random.normal(100, 15, size=10_000)

# Configure and compute
ctx = StatsContext(n=len(data), confidence=0.95)
result = DEFAULT_ENGINE.compute(data, ctx)

print(f"Mean: {result.metrics['mean']:.2f}")
print(f"95% CI: [{result.metrics['ci_mean']['low']:.2f}, {result.metrics['ci_mean']['high']:.2f}]")
```
## Configuration

### StatsContext
The `StatsContext` dataclass configures all metric computations:

```python
ctx = StatsContext(
    n=10_000,                # Sample size (required)
    confidence=0.95,         # CI confidence level
    ci_method="auto",        # "auto", "z", "t", or "bootstrap"
    percentiles=(5, 50, 95),
    nan_policy="propagate",  # or "omit"
)
```
Shared, explicit configuration for statistic and CI computations.
See StatsContext for full attribute documentation with examples.
Quick Reference:
| Field | Default | Description |
|---|---|---|
| `n` | (required) | Declared sample size |
| `confidence` | `0.95` | Confidence level ∈ (0, 1) |
| `ci_method` | `"auto"` | CI method: `"auto"`, `"z"`, `"t"`, `"bootstrap"` |
| `percentiles` | `(5, 25, 50, 75, 95)` | Quantiles to compute |
| `nan_policy` | `"propagate"` | `"propagate"` or `"omit"` non-finite values |
| `ddof` | `1` | Degrees of freedom for std |
| `target` | `None` | Target value for bias/MSE metrics |
| `eps` | `None` | Error tolerance for Chebyshev/Markov |
|  | `10_000` | Bootstrap resamples |
| `bootstrap` | `"percentile"` | Bootstrap method: `"percentile"` or `"bca"` |
| `rng` | `None` | Seed or random generator |
Helper Properties:
```python
ctx.alpha                            # Tail probability: 1 - confidence
ctx.q_bound()                        # Percentile bounds: (2.5, 97.5) for 95% CI
ctx.eff_n(len(x))                    # Effective sample size
ctx.with_overrides(confidence=0.99)  # Create modified copy
```
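The arithmetic behind `alpha` and `q_bound` is simple enough to verify by hand. A standalone sketch in plain Python (illustrative function names, not mcframework's code), assuming `q_bound` returns symmetric two-sided percentile bounds:

```python
def alpha(confidence: float) -> float:
    # Total tail probability outside the interval
    return 1.0 - confidence

def q_bound(confidence: float) -> tuple[float, float]:
    # Symmetric lower/upper percentile bounds for a two-sided CI
    a = alpha(confidence)
    return (100 * a / 2, 100 * (1 - a / 2))

# q_bound(0.95) is approximately (2.5, 97.5)
```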
## Metrics

### Descriptive Statistics
| Function | Description |
|---|---|
| `mean` | Sample mean \(\bar X = \frac{1}{n}\sum_{i=1}^n x_i\) |
| `std` | Sample standard deviation with Bessel correction |
| `percentiles` | Empirical percentiles evaluated on the cleaned sample |
| `skew` | Unbiased sample skewness (Fisher–Pearson standardized third central moment) |
| `kurtosis` | Unbiased sample excess kurtosis (Fisher definition) |
Usage:
```python
from mcframework.stats_engine import mean, std, percentiles

m = mean(data, ctx)            # Sample mean
s = std(data, ctx)             # Sample std (ddof=1)
pcts = percentiles(data, ctx)  # {5: ..., 25: ..., 50: ..., 75: ..., 95: ...}
```
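For reference, the unbiased estimators behind `skew` and `kurtosis` can be written out directly. A standalone numpy sketch of the standard adjusted Fisher–Pearson \(G_1\) and Fisher \(G_2\) formulas (these are the textbook definitions, assumed here rather than taken from mcframework's source):

```python
import numpy as np

def sample_skewness(x):
    # Adjusted Fisher-Pearson standardized third moment:
    # G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^(3/2)
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    g1 = m3 / m2 ** 1.5
    return np.sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(x):
    # Unbiased excess kurtosis, Fisher definition:
    # G2 = (n-1) / ((n-2)(n-3)) * ((n+1) * g2 + 6), with g2 = m4/m2^2 - 3
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
```

On a large normal sample both estimators should be close to zero; a right-skewed sample (e.g. exponential draws) yields a clearly positive skewness.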
### Confidence Intervals
Three methods for computing CIs on the mean:
| Method | Function | Best For |
|---|---|---|
| Parametric | `ci_mean` | Large samples, normal-ish data |
| Bootstrap | `ci_mean_bootstrap` | Non-normal distributions |
| Chebyshev | `ci_mean_chebyshev` | Distribution-free guarantees |
| Function | Description |
|---|---|
| `ci_mean` | Parametric CI for \(\mathbb{E}[X]\) using z/t critical values |
| `ci_mean_bootstrap` | Bootstrap confidence interval for \(\mathbb{E}[X]\) via resampling |
| `ci_mean_chebyshev` | Distribution-free CI for \(\mathbb{E}[X]\) via Chebyshev's inequality |
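The Chebyshev method earns its "distribution-free" label from Chebyshev's inequality alone: \(P(|\bar X - \mu| \ge k\,\mathrm{se}) \le 1/k^2\), so setting \(1/k^2 = \alpha\) gives an interval valid for any finite-variance distribution. A standalone sketch of that construction (numpy only, using the sample std as a plug-in for \(\sigma\); this is an assumption, not mcframework's exact recipe):

```python
import numpy as np

def chebyshev_ci(x, confidence=0.95):
    # Chebyshev: P(|Xbar - mu| >= k * se) <= 1/k^2.  Setting 1/k^2 = alpha
    # gives k = 1/sqrt(alpha); the interval holds for ANY distribution with
    # finite variance, at the cost of being much wider than a z interval.
    x = np.asarray(x, dtype=float)
    n = x.size
    alpha = 1.0 - confidence
    se = np.std(x, ddof=1) / np.sqrt(n)
    k = 1.0 / np.sqrt(alpha)
    m = x.mean()
    return m - k * se, m + k * se
```

At 95% confidence, \(k = 1/\sqrt{0.05} \approx 4.47\), versus 1.96 for the z interval, which is the price of dropping all distributional assumptions.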
Parametric CI (z or t):
```python
from mcframework.stats_engine import ci_mean

result = ci_mean(data, ctx)
# {'confidence': 0.95, 'method': 'z', 'low': 99.5, 'high': 100.5, 'se': 0.15, 'crit': 1.96}
```
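The z branch is ordinary normal-theory arithmetic: mean ± critical value × standard error. A standalone sketch of the same computation using only the stdlib and numpy (mirroring the result keys above, but not mcframework's implementation):

```python
import numpy as np
from statistics import NormalDist

def z_ci(x, confidence=0.95):
    # Large-sample normal-theory CI: mean +/- z_{1-alpha/2} * s / sqrt(n)
    x = np.asarray(x, dtype=float)
    n = x.size
    se = np.std(x, ddof=1) / np.sqrt(n)
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    m = x.mean()
    return {"low": m - crit * se, "high": m + crit * se, "se": se, "crit": crit}
```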
Bootstrap CI:
```python
from mcframework.stats_engine import StatsContext, ci_mean_bootstrap

ctx = StatsContext(n=len(data), bootstrap="bca", rng=42)
result = ci_mean_bootstrap(data, ctx)
# {'confidence': 0.95, 'method': 'bootstrap-bca', 'low': 99.4, 'high': 100.6}
```
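Percentile bootstrapping itself is straightforward: resample with replacement, recompute the mean each time, and take symmetric percentiles of the resampled means. A minimal standalone sketch (numpy only; `n_resamples` and `seed` are illustrative names, not mcframework parameters):

```python
import numpy as np

def percentile_bootstrap_ci(x, confidence=0.95, n_resamples=2_000, seed=0):
    # Draw n_resamples bootstrap samples of the same size as x,
    # compute each sample's mean, then take percentile bounds.
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, x.size, size=(n_resamples, x.size))
    means = x[idx].mean(axis=1)
    alpha = 1.0 - confidence
    low, high = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(low), float(high)
```

The BCa variant additionally corrects these percentiles for bias and skewness of the bootstrap distribution, which is why it is preferred for asymmetric data.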
### Target-Based Metrics
Metrics that compare results to a known target value:
| Function | Description |
|---|---|
| `bias_to_target` | Bias of the sample mean relative to a target \(\theta\) |
| `mse_to_target` | Mean squared error of \(\bar X\) relative to a target \(\theta\) |
| `markov_error_prob` | Markov bound on error probability for target \(\theta\) |
| `chebyshev_required_n` | Required \(n\) to achieve Chebyshev CI half-width \(\le \varepsilon\) |
Usage (requires target and/or eps in context):
```python
from mcframework.stats_engine import (
    StatsContext, bias_to_target, mse_to_target,
    markov_error_prob, chebyshev_required_n,
)

ctx = StatsContext(n=len(data), target=100.0, eps=0.5)

bias = bias_to_target(data, ctx)         # Mean - target
mse = mse_to_target(data, ctx)           # Mean squared error
prob = markov_error_prob(data, ctx)      # P(|mean - target| >= eps)
req_n = chebyshev_required_n(data, ctx)  # Required n for precision
```
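The required-n computation follows directly from Chebyshev's inequality: with \(\mathrm{Var}(\bar X) = \sigma^2/n\), requiring \(\sigma^2/(n\varepsilon^2) \le \alpha\) gives \(n \ge \sigma^2/(\alpha\varepsilon^2)\). A standalone sketch using the sample variance as the plug-in estimate (an assumption; mcframework's exact estimator may differ):

```python
import math
import numpy as np

def chebyshev_required_n(x, eps, confidence=0.95):
    # Chebyshev: P(|Xbar - mu| >= eps) <= var / (n * eps^2).
    # Forcing that bound down to alpha = 1 - confidence gives
    # n >= var / (alpha * eps^2).
    var = float(np.var(x, ddof=1))
    alpha = 1.0 - confidence
    return math.ceil(var / (alpha * eps ** 2))
```

For data with variance around 225 (std 15), a 95% guarantee of half-width 0.5 needs roughly 225 / (0.05 × 0.25) = 18,000 samples, which shows how conservative the distribution-free bound is.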
## Engine
The `StatsEngine` orchestrates multiple metrics at once:

```python
from mcframework.stats_engine import StatsEngine, FnMetric, mean, std, ci_mean

engine = StatsEngine([
    FnMetric("mean", mean),
    FnMetric("std", std),
    FnMetric("ci_mean", ci_mean),
])

result = engine.compute(data, ctx)
print(result.metrics)  # {'mean': 100.1, 'std': 15.2, 'ci_mean': {...}}
print(result.skipped)  # Metrics skipped due to missing context
print(result.errors)   # Metrics that raised errors
```
| Class | Description |
|---|---|
| `StatsEngine` | Orchestrator that evaluates a set of metrics over an input array |
| `FnMetric` | Lightweight adapter that binds a human-readable name to a metric function |
|  | Result container returned by `StatsEngine.compute`, exposing `metrics`, `skipped`, and `errors` |
Default Engine:
A pre-configured engine with all standard metrics:
```python
from mcframework.stats_engine import DEFAULT_ENGINE

result = DEFAULT_ENGINE.compute(data, ctx)
```
Includes: mean, std, percentiles, skew, kurtosis, ci_mean, ci_mean_bootstrap, ci_mean_chebyshev, chebyshev_required_n, markov_error_prob, bias_to_target, mse_to_target.
## Enumerations
| Enum | Description |
|---|---|
| `CIMethod` | Parametric strategies for selecting confidence-interval critical values |
| `NanPolicy` | Strategies for handling the propagation of non-finite values |
| `BootstrapMethod` | Supported bootstrap confidence-interval flavors |
```python
from mcframework.stats_engine import CIMethod, NanPolicy, BootstrapMethod

ctx = StatsContext(
    n=100,
    ci_method=CIMethod.auto,       # or "auto"
    nan_policy=NanPolicy.omit,     # or "omit"
    bootstrap=BootstrapMethod.bca, # or "bca"
)
```
## Exceptions
| Exception | Description |
|---|---|
| `MissingContextError` | Raised when a required context field is missing |
|  | Raised when insufficient data is available for computation |
```python
from mcframework.stats_engine import MissingContextError, StatsContext, bias_to_target

try:
    bias_to_target(data, StatsContext(n=100))  # Missing 'target'
except MissingContextError as e:
    print(f"Missing field: {e}")
```
## Custom Metrics
Create custom metrics with `FnMetric`:

```python
import numpy as np
from mcframework.stats_engine import FnMetric, StatsEngine, StatsContext

def coefficient_of_variation(x, ctx):
    """CV = std / mean"""
    m, s = float(np.mean(x)), float(np.std(x, ddof=1))
    return s / m if m != 0 else float('nan')

def interquartile_range(x, ctx):
    """IQR = Q3 - Q1"""
    q1, q3 = np.percentile(x, [25, 75])
    return float(q3 - q1)

# Build custom engine
engine = StatsEngine([
    FnMetric("cv", coefficient_of_variation, doc="Coefficient of variation"),
    FnMetric("iqr", interquartile_range, doc="Interquartile range"),
])

result = engine.compute(data, StatsContext(n=len(data)))
```
## See Also
Core Module — Simulation classes that use this engine
Utilities Module — Critical value utilities (z_crit, t_crit)