Utilities Module
The utilities module provides critical value functions for constructing confidence intervals. These are the building blocks used by the Stats Engine.
Overview
When constructing a two-sided confidence interval for the mean,
\[
\bar{x} \pm c \cdot \frac{s}{\sqrt{n}},
\]
where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, and \(n\) the sample size, the critical value \(c\) determines the interval width. This module provides:
z_crit(): normal distribution critical values
t_crit(): Student’s t-distribution critical values
autocrit(): automatic selection based on sample size
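To make the role of \(c\) concrete, here is a minimal standard-library sketch, assuming the interval has the standard form \(\bar{x} \pm c\, s/\sqrt{n}\) with \(s\) the sample standard deviation. The helper `mean_ci` is hypothetical and not part of mcframework; it takes the critical value as an argument, which is the slot that z_crit(), t_crit(), or autocrit() would fill.

```python
import math
import statistics


def mean_ci(data, crit):
    """Return (lower, upper) for mean ± crit * s / sqrt(n).

    Hypothetical helper for illustration; `crit` would come from a
    critical-value function such as z_crit() or t_crit().
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    center = statistics.fmean(data)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    half_width = crit * statistics.stdev(data) / math.sqrt(n)
    return center - half_width, center + half_width


# Same data, two critical values: the larger c gives the wider interval
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9, 5.2, 4.7]
print(mean_ci(sample, 1.960))  # z-based, 95%
print(mean_ci(sample, 2.262))  # t-based, 95%, df = 9
```

Passing the critical value in, rather than computing it inside, keeps the interval arithmetic independent of the z-versus-t decision.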
Critical Values
z Critical Value (Normal)
For large samples (\(n \ge 30\)), use the normal approximation:
\[
z_{\alpha/2} = \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right),
\]
where \(\Phi^{-1}\) is the inverse standard normal CDF and \(\alpha = 1 - \text{confidence}\).
```python
from mcframework.utils import z_crit

z_crit(0.95)  # 1.960 (95% CI)
z_crit(0.99)  # 2.576 (99% CI)
z_crit(0.90)  # 1.645 (90% CI)
```
Common Values:
| Confidence | α | z-critical |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
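The z formula can be checked with the Python standard library alone. The sketch below uses `statistics.NormalDist().inv_cdf` for \(\Phi^{-1}\); `z_crit_sketch` is a hypothetical stand-in for illustration, not the mcframework implementation, and it reproduces the table of common values.

```python
from statistics import NormalDist


def z_crit_sketch(confidence):
    """Two-sided normal critical value z_{alpha/2} = Phi^{-1}(1 - alpha/2)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    # inverse standard normal CDF evaluated at the upper alpha/2 tail
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Reproduce the "Common Values" table
for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} | alpha={1 - conf:.2f} | z={z_crit_sketch(conf):.3f}")
# z column: 1.645, 1.960, 2.576
```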
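t Critical Value (Student’s t)
For small samples, or whenever the population variance is unknown and must be estimated from the data, use the t-distribution:
\[
t_{\alpha/2,\;\text{df}} = F_{t,\,\text{df}}^{-1}\!\left(1 - \frac{\alpha}{2}\right),
\]
where \(F_{t,\,\text{df}}^{-1}\) is the inverse CDF of Student’s t-distribution with \(\text{df} = n - 1\) degrees of freedom.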
```python
from mcframework.utils import t_crit

t_crit(0.95, df=9)   # 2.262 (n=10)
t_crit(0.95, df=29)  # 2.045 (n=30)
t_crit(0.95, df=99)  # 1.984 (n=100, approaches z)
```
The t critical value is always larger than z for finite df, yielding wider (more conservative) intervals.
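Unlike \(\Phi^{-1}\), the Python standard library has no Student t quantile function, so the following sketch computes one numerically: it integrates the t density with Simpson's rule to obtain the CDF, then inverts the CDF by bisection. This technique was chosen for a dependency-free illustration and is not necessarily how t_crit() works; in practice a library routine such as `scipy.stats.t.ppf` is the usual choice. The names `t_crit_sketch`, `_t_pdf`, and `_t_cdf` are hypothetical.

```python
import math
from statistics import NormalDist


def _t_pdf(x, df):
    # Student t density; lgamma keeps the normalizing constant finite for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def _t_cdf(x, df, steps=2000):
    # CDF for x >= 0: the density is symmetric, so F(x) = 0.5 + integral over [0, x],
    # approximated with composite Simpson's rule (steps must be even)
    h = x / steps
    total = _t_pdf(0.0, df) + _t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_crit_sketch(confidence, df):
    """Two-sided t critical value: solve F_df(t) = 1 - alpha/2 by bisection."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if df < 1:
        raise ValueError("df must be at least 1")
    target = 1.0 - (1.0 - confidence) / 2.0
    lo, hi = 0.0, 1.0
    while _t_cdf(hi, df) < target:  # double the upper bound until it brackets the root
        lo, hi = hi, 2.0 * hi
    for _ in range(50):  # halve the bracket until it is far narrower than 0.001
        mid = 0.5 * (lo + hi)
        if _t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z = NormalDist().inv_cdf(0.975)  # z_{0.025} for a 95% interval
for df in (9, 29, 99):
    t = t_crit_sketch(0.95, df)
    print(f"df={df:3d}  t={t:.3f}  t/z={t / z:.3f}")
```

The t column should agree with the values quoted above for df = 9, 29, and 99 (2.262, 2.045, 1.984), and the ratio t/z shrinks toward 1 as df grows.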
Automatic Selection
The autocrit() function chooses between z and t based on sample size:
```python
from mcframework.utils import autocrit

# Small sample → use t
crit, method = autocrit(0.95, n=15)
print(f"{method}: {crit:.3f}")  # t: 2.145

# Large sample → use z
crit, method = autocrit(0.95, n=100)
print(f"{method}: {crit:.3f}")  # z: 1.960

# Force a specific method
crit, method = autocrit(0.95, n=100, method="t")
print(f"{method}: {crit:.3f}")  # t: 1.984
```
Selection Rules:
method="auto" (default): use t if \(n < 30\), otherwise z
method="z": always use the normal critical value
method="t": always use t with \(\text{df} = \max(1, n-1)\)
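The selection rules can be written out as plain branching logic. The sketch below, `choose_method`, is a hypothetical helper that only reproduces the documented decision (which method, and which df for t); it does not compute the critical value and is not autocrit() itself. Raising `ValueError` for an unrecognized method is an assumption, not documented behavior.

```python
SMALL_SAMPLE_THRESHOLD = 30  # the documented n < 30 cut-off for method="auto"


def choose_method(n, method="auto"):
    """Return (method, df) per the selection rules; df is None when z is used."""
    if method == "auto":
        method = "t" if n < SMALL_SAMPLE_THRESHOLD else "z"
    if method == "z":
        return "z", None
    if method == "t":
        # max(1, ...) guards against df = 0 when n = 1
        return "t", max(1, n - 1)
    raise ValueError(f"unknown method {method!r}; expected 'auto', 'z', or 't'")


print(choose_method(15))               # ('t', 14)
print(choose_method(100))              # ('z', None)
print(choose_method(100, method="t"))  # ('t', 99)
print(choose_method(1, method="t"))    # ('t', 1)
```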
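Usage with Stats Engine
The stats engine uses these utilities internally:
```python
import numpy as np
from mcframework.stats_engine import ci_mean

small = np.random.default_rng(0).normal(size=25)   # illustrative sample data (n = 25)
large = np.random.default_rng(1).normal(size=100)  # illustrative sample data (n = 100)
# ci_method controls which critical value is used
ci_mean(small, {"n": 25, "ci_method": "auto"})   # Uses t (n < 30)
ci_mean(small, {"n": 25, "ci_method": "z"})      # Forces z
ci_mean(large, {"n": 100, "ci_method": "auto"})  # Uses z (n ≥ 30)
```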
Mathematical Background
Why the n < 30 threshold?
The threshold is a rule of thumb based on how quickly the t-distribution converges to the normal as df grows:
At df=29, the 97.5th percentile is only about 4% larger than z (2.045 vs. 1.960)
For df ≤ 13, the difference exceeds 10%
The t-distribution accounts for additional uncertainty in estimating variance
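The percentages above follow directly from the 95% critical values quoted earlier on this page (\(z_{0.025} = 1.960\); \(t_{0.025,\,9} = 2.262\), \(t_{0.025,\,29} = 2.045\), \(t_{0.025,\,99} = 1.984\)). Because the interval half-width is proportional to \(c\), each ratio is also the ratio of a t-based interval width to the corresponding z-based width:

\[
\begin{aligned}
\frac{t_{0.025,\,9}}{z_{0.025}} &= \frac{2.262}{1.960} \approx 1.154 && (\text{about } 15\%\ \text{wider}),\\
\frac{t_{0.025,\,29}}{z_{0.025}} &= \frac{2.045}{1.960} \approx 1.043 && (\text{about } 4\%\ \text{wider}),\\
\frac{t_{0.025,\,99}}{z_{0.025}} &= \frac{1.984}{1.960} \approx 1.012 && (\text{about } 1\%\ \text{wider}).
\end{aligned}
\]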
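Coverage Probability:
A confidence interval \([L, U]\) has “coverage” \(1 - \alpha\) if
\[
\Pr(L \le \mu \le U) = 1 - \alpha,
\]
where \(\mu\) is the true population mean. When the data are normally distributed and the population variance is unknown, t critical values give exact coverage at every sample size, whereas z critical values make small-sample intervals too narrow, so they contain \(\mu\) less often than \(1 - \alpha\).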
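A small Monte Carlo experiment illustrates the coverage claim. The sketch below draws normal samples of size \(n = 5\) (so df = 4), builds \(\bar{x} \pm c\, s/\sqrt{n}\) intervals with either \(c = 1.960\) (z) or \(c = 2.776\) (the standard table value of \(t_{0.025,\,4}\)), and counts how often they contain the true mean. The function `coverage` and its defaults are hypothetical, chosen only for illustration.

```python
import math
import random


def coverage(crit, n=5, reps=20_000, mu=0.0, sigma=1.0, seed=12345):
    """Fraction of intervals mean ± crit * s / sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # sample standard deviation with the n - 1 denominator
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        if abs(mean - mu) <= crit * s / math.sqrt(n):
            hits += 1
    return hits / reps


# n = 5, so df = 4; 2.776 is the standard table value of t_{0.025, 4}
print(f"z (c = 1.960): {coverage(1.960):.3f}")
print(f"t (c = 2.776): {coverage(2.776):.3f}")
```

With normal data, the t-based intervals should land near 0.95 up to simulation noise, while the z-based intervals cover noticeably less often, roughly 0.88 at this sample size.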
Module Reference
mcframework.utils
Utility functions for critical values and CI selection.
This module provides z/t critical values and a tiny helper, autocrit(),
that chooses between normal and t criticals in a reproducible way.
Functions
| Function | Description |
|---|---|
| z_crit() | Two-sided normal critical value \(z_{\alpha/2}\). |
| t_crit() | Two-sided Student t critical value \(t_{\alpha/2,\;\mathrm{df}}\). |
| autocrit() | Select a critical value (z or t) for two-sided CIs. |
| | Validate the confidence level. |
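The function described as “Validate the confidence level.” is listed without its name, so the following is only a hypothetical sketch of what such validation might involve, assuming valid levels are fractions strictly between 0 and 1 (as in every example on this page). `check_confidence` is an illustrative name, not the mcframework function.

```python
import math


def check_confidence(confidence):
    """Hypothetical validator: return confidence as a float if 0 < confidence < 1."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise TypeError(f"confidence must be a real number, got {type(confidence).__name__}")
    # a chained comparison is False for NaN, so NaN is rejected here too
    if not 0.0 < confidence < 1.0:
        raise ValueError(f"confidence must satisfy 0 < confidence < 1, got {confidence!r}")
    return float(confidence)


print(check_confidence(0.95))  # 0.95
for bad in (0.0, 1.0, 95, math.nan):
    try:
        check_confidence(bad)
    except ValueError as exc:
        print(exc)
```

Rejecting a value such as 95 catches the common mistake of passing a percentage instead of a fraction.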
See Also
Stats Engine: uses these utilities for confidence intervals
ci_mean(): parametric CI for the mean
ci_mean_chebyshev(): distribution-free alternative