Utilities Module

The utilities module provides critical value functions for constructing confidence intervals. These are the building blocks used by the Stats Engine.

Overview

When constructing a two-sided confidence interval for the mean:

\[\bar{X} \pm c \cdot \frac{s}{\sqrt{n}}\]

the critical value \(c\) determines the interval width. This module provides:

z_crit(): Normal distribution critical values
t_crit(): Student’s t-distribution critical values
autocrit(): Automatic selection based on sample size
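As a rough illustration of how a critical value enters the interval formula above, the sketch below computes the interval directly from a sample. The helper name `mean_ci` and the sample values are made up for this example and are not part of mcframework; the critical value 2.262 is the 95% t value for df = 9.

```python
import math
import statistics

def mean_ci(sample, crit):
    """Return (lower, upper) for mean ± crit * s / sqrt(n). Hypothetical helper."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    half_width = crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Illustrative data only: n = 10, so df = 9
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]
lower, upper = mean_ci(sample, crit=2.262)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

A larger critical value widens the interval proportionally, which is why the choice between z and t matters most for small samples.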

Critical Values

z Critical Value (Normal)

For large samples (\(n \ge 30\)), use the normal approximation:

\[z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\]

where \(\Phi^{-1}\) is the inverse standard normal CDF and \(\alpha = 1 - \text{confidence}\).

from mcframework.utils import z_crit

z_crit(0.95)   # 1.96 (95% CI)
z_crit(0.99)   # 2.576 (99% CI)
z_crit(0.90)   # 1.645 (90% CI)

Common Values:

Confidence	α	z-critical
90%	0.10	1.645
95%	0.05	1.960
99%	0.01	2.576
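The table can be reproduced from the inverse-CDF formula using only the Python standard library. This is a minimal sketch, not the mcframework implementation; `z_critical` is a hypothetical stand-in name chosen to avoid confusion with the library's `z_crit`.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided normal critical value for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence
    # Inverse standard normal CDF evaluated at 1 - alpha/2
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: {z_critical(confidence):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```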

t Critical Value (Student’s t)

For small samples or when population variance is unknown, use the t-distribution:

\[t_{\alpha/2, \text{df}} = T_{\text{df}}^{-1}\left(1 - \frac{\alpha}{2}\right)\]

where \(\text{df} = n - 1\) degrees of freedom.

from mcframework.utils import t_crit

t_crit(0.95, df=9)    # 2.262 (n=10)
t_crit(0.95, df=29)   # 2.045 (n=30)
t_crit(0.95, df=99)   # 1.984 (n=100, approaches z)

At a given confidence level, the t critical value is always larger than the z critical value for finite df, yielding wider (more conservative) intervals.
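To make the inverse-CDF definition concrete, the sketch below inverts the t CDF numerically with only the standard library: Simpson's rule for the CDF and bisection for the quantile. It is an illustration under those assumptions, not how mcframework computes `t_crit`; in practice a library routine such as `scipy.stats.t.ppf` is the usual choice. All function names here are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided t critical value, found by bisection on the CDF."""
    target = 1.0 - (1.0 - confidence) / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains the root
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (9, 29, 99):
    print(f"df={df}: {t_critical(0.95, df):.3f}")
```

The printed values should agree with the `t_crit` results listed above to three decimals, and each exceeds the corresponding z value of 1.960.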

Automatic Selection

The autocrit() function chooses between z and t based on sample size:

from mcframework.utils import autocrit

# Small sample → use t
crit, method = autocrit(0.95, n=15)
print(f"{method}: {crit:.3f}")  # t: 2.145

# Large sample → use z
crit, method = autocrit(0.95, n=100)
print(f"{method}: {crit:.3f}")  # z: 1.960

# Force specific method
crit, method = autocrit(0.95, n=100, method="t")
print(f"{method}: {crit:.3f}")  # t: 1.984

Selection Rules:

method="auto" (default): Use t if \(n < 30\), otherwise z
method="z": Always use normal critical value
method="t": Always use t with \(\text{df} = \max(1, n-1)\)
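The selection rules can be sketched as a small decision function. This is a hypothetical reconstruction from the rules listed above, not the actual `autocrit()` source: the name `choose_method`, the `threshold` parameter, and the error handling are assumptions, and it returns only the chosen method and degrees of freedom rather than the critical value itself.

```python
def choose_method(n, method="auto", threshold=30):
    """Return ("z", None) or ("t", df) following the documented selection rules.

    Hypothetical helper; the real autocrit() returns (critical value, method).
    """
    if method not in ("auto", "z", "t"):
        raise ValueError(f"unknown method: {method!r}")
    if method == "auto":
        method = "t" if n < threshold else "z"
    if method == "t":
        return "t", max(1, n - 1)  # df is clamped to at least 1
    return "z", None

print(choose_method(15))        # ('t', 14)
print(choose_method(100))       # ('z', None)
print(choose_method(100, "t"))  # ('t', 99)
```

Clamping df with `max(1, n - 1)` keeps the t-distribution defined even when a single observation is passed with `method="t"`.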

Usage with Stats Engine

The stats engine uses these utilities internally:

from mcframework.stats_engine import ci_mean

# `data` holds the sample values and is defined elsewhere (not shown here).
# ci_method controls which critical value is used
ci_mean(data, {"n": 25, "ci_method": "auto"})   # Uses t (n < 30)
ci_mean(data, {"n": 25, "ci_method": "z"})      # Forces z
ci_mean(data, {"n": 100, "ci_method": "auto"})  # Uses z (n ≥ 30)

Mathematical Background

Why the n < 30 threshold?

The threshold is a conventional rule of thumb, motivated by how quickly the t-distribution converges to the normal as df grows:

At df=29, the 97.5th percentile differs from z by only ~4%
Below df=10, the difference exceeds 10%
The t-distribution accounts for additional uncertainty in estimating variance
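The size of the gap can be checked with simple arithmetic on the 95% critical values quoted earlier on this page (2.262, 2.045 and 1.984 for df = 9, 29 and 99, against z = 1.960). The variable names below are illustrative only.

```python
Z_95 = 1.960
T_95 = {9: 2.262, 29: 2.045, 99: 1.984}  # 95% t critical values quoted on this page

# Relative excess of the t critical value over z at each df
rel_diff = {df: (t - Z_95) / Z_95 for df, t in T_95.items()}
for df, diff in rel_diff.items():
    print(f"df={df:>2}: t exceeds z by {diff:.1%}")
# df= 9: t exceeds z by 15.4%
# df=29: t exceeds z by 4.3%
# df=99: t exceeds z by 1.2%
```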

Coverage Probability:

A confidence interval has “coverage” \(1 - \alpha\) if:

\[\Pr\left(\mu \in \left[\bar{X} - c \cdot \text{SE}, \bar{X} + c \cdot \text{SE}\right]\right) = 1 - \alpha\]

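When the population variance is unknown and must be estimated from a small sample, using t critical values restores the nominal coverage. The result is exact for normally distributed data and approximate otherwise.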
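Coverage can be estimated empirically by simulation. The sketch below draws many samples of size 10 from a standard normal population and counts how often the interval contains the true mean, once with the t critical value for df = 9 and once with the z critical value. The function name, trial count, and seed are arbitrary choices for illustration and are not part of mcframework.

```python
import math
import random

def coverage(crit, n=10, trials=20_000, mu=0.0, sigma=1.0, seed=0):
    """Fraction of simulated intervals mean ± crit * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 denominator
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half_width = crit * s / math.sqrt(n)
        hits += abs(xbar - mu) <= half_width
    return hits / trials

cov_t = coverage(2.262)  # 95% t critical value, df = 9
cov_z = coverage(1.960)  # 95% z critical value
print(f"t-based coverage: {cov_t:.3f}")
print(f"z-based coverage: {cov_z:.3f}")
```

With normal data the t-based estimate should land close to 0.95, while the z-based intervals are too narrow at this sample size and cover the mean less often (around 0.92 in theory).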

Module Reference

mcframework.utils

Utility functions for critical values and CI selection.

This module provides z and t critical values and a small helper, autocrit(), that chooses between normal and t critical values in a reproducible way.

Functions

`z_crit`	Two-sided normal critical value \(z_{\alpha/2}\).
`t_crit`	Two-sided Student t critical value \(t_{\alpha/2,\;\mathrm{df}}\).
`autocrit`	Select a critical value (z or t) for two-sided CIs.
`_validate_confidence`	Validate the confidence level.