Backends Module#
The backends module provides pluggable execution strategies for Monte Carlo
simulations, from single-threaded CPU to GPU-accelerated batch processing.
Overview#
| Backend | Class | Use Case |
|---|---|---|
| Sequential | `SequentialBackend` | Single-threaded, debugging, small jobs |
| Thread | `ThreadBackend` | NumPy-heavy code (releases GIL) |
| Process | `ProcessBackend` | Python-bound code, Windows |
| Torch CPU | `TorchCPUBackend` | Vectorized CPU batch execution |
| Torch MPS | `TorchMPSBackend` | Apple Silicon GPU (M1/M2/M3/M4) |
| Torch CUDA | `TorchCUDABackend` | NVIDIA GPU acceleration |
Quick Start#
CPU Backends:

```python
from mcframework import PiEstimationSimulation

sim = PiEstimationSimulation()
sim.set_seed(42)

# Sequential (single-threaded)
result = sim.run(10_000, backend="sequential")

# Thread-based parallelism (default on POSIX)
result = sim.run(100_000, backend="thread", n_workers=8)

# Process-based parallelism (default on Windows)
result = sim.run(100_000, backend="process", n_workers=4)

# Auto-selection based on platform and job size
result = sim.run(100_000, backend="auto")
```

GPU Backends (requires PyTorch):

```python
# Torch CPU (vectorized, no GPU required)
result = sim.run(1_000_000, backend="torch", torch_device="cpu")

# Apple Silicon GPU (M1/M2/M3/M4 Macs)
result = sim.run(1_000_000, backend="torch", torch_device="mps")

# NVIDIA CUDA GPU
result = sim.run(1_000_000, backend="torch", torch_device="cuda")
```
CPU Backends#
SequentialBackend#
Single-threaded execution for debugging and small jobs.
When to use:

- Debugging and testing
- Jobs with < 20,000 simulations
- When reproducibility debugging is needed

```python
result = sim.run(1000, backend="sequential")
```
ThreadBackend#
Thread-based parallelism using ThreadPoolExecutor.
When to use:

- NumPy-heavy code that releases the GIL
- POSIX systems (macOS, Linux)
- When process spawn overhead is significant

```python
result = sim.run(100_000, backend="thread", n_workers=8)
```
ProcessBackend#
Process-based parallelism using ProcessPoolExecutor.
When to use:

- Python-bound code that doesn't release the GIL
- Windows (threads serialize under the GIL)
- CPU-intensive pure Python calculations

```python
result = sim.run(100_000, backend="process", n_workers=4)
```
Torch GPU Backends#
The Torch backends enable GPU-accelerated batch execution for simulations that
implement the torch_batch() method.
Note
Installation: GPU backends require PyTorch. Install with:

```shell
pip install mcframework[gpu]
```
TorchBackend (Unified)#
Factory class that auto-selects the appropriate device-specific backend.
Usage:

```python
from mcframework.backends import TorchBackend

# Auto-creates TorchCPUBackend
backend = TorchBackend(device="cpu")

# Auto-creates TorchMPSBackend (Apple Silicon)
backend = TorchBackend(device="mps")

# Auto-creates TorchCUDABackend (NVIDIA)
backend = TorchBackend(device="cuda")

# Run simulation
results = backend.run(sim, n_simulations=1_000_000, seed_seq=sim.seed_seq)
```
TorchCPUBackend#
Vectorized batch execution on CPU using PyTorch tensors.
When to use:

- Baseline testing before GPU deployment
- Systems without GPU acceleration
- Debugging vectorized code
- Small to medium simulation sizes

```python
from mcframework.backends import TorchCPUBackend

backend = TorchCPUBackend()
results = backend.run(sim, 100_000, sim.seed_seq, progress_callback=None)
```
TorchMPSBackend#
Apple Silicon GPU acceleration via Metal Performance Shaders (MPS).
Requirements:

- macOS 12.3+ with Apple Silicon (M1/M2/M3/M4)
- PyTorch with MPS support

Dtype Policy:

Metal Performance Shaders supports at most float32 on the GPU (Apple's `MPSDataType` documentation confirms the lack of float64 support). The framework therefore samples in float32 on device and promotes results to float64 on the CPU for stats engine precision.

Warning

MPS Determinism Caveat

Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility, due to Metal backend scheduling and float32 arithmetic; other projects have reported similar behavior. Statistical properties (mean, variance, CI coverage) remain correct. (See `TestMPSDeterminism` in `tests/test_torch_backend.py` for the corresponding tests.)
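The caveat is easiest to see on CPU with plain NumPy, no MPS required: reordering a float32 reduction can flip low-order bits without moving the statistics. A minimal, purely illustrative sketch (not framework code):

```python
import numpy as np

# CPU-only analogy for the MPS caveat: summing the same float32 samples in
# a different order can change the result bitwise, while the statistics
# stay intact to float32 precision.
rng = np.random.default_rng(42)
samples = rng.standard_normal(100_000).astype(np.float32)

mean_fwd = float(samples.sum()) / len(samples)
mean_rev = float(samples[::-1].sum()) / len(samples)

# Bitwise equality is NOT guaranteed across summation orders, but both
# means agree closely, so downstream statistics (mean, variance, CI
# coverage) are unaffected.
assert abs(mean_fwd - mean_rev) < 1e-4
```

The same reasoning applies on the GPU: scheduling changes the reduction order between runs, which perturbs bits but not distributions.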
```python
from mcframework.backends import TorchMPSBackend, is_mps_available

if is_mps_available():
    backend = TorchMPSBackend()
    results = backend.run(sim, 1_000_000, sim.seed_seq, None)
```
TorchCUDABackend#
NVIDIA GPU acceleration with adaptive batching and CUDA streams.
Features:

- Adaptive batch sizing based on GPU memory
- CUDA stream support for async execution
- Native float64 support (no precision loss)
- Optional cuRAND integration for maximum performance

Configuration Options:

| Parameter | Default | Description |
|---|---|---|
| `device_id` | 0 | CUDA device index for multi-GPU systems |
| `use_curand` | False | Use cuRAND instead of torch.Generator |
| `batch_size` | None | Fixed batch size (None = adaptive) |
| `use_streams` | True | Enable CUDA streams for async execution |
```python
from mcframework.backends import TorchCUDABackend, is_cuda_available

if is_cuda_available():
    # Basic usage
    backend = TorchCUDABackend()

    # Advanced configuration
    backend = TorchCUDABackend(
        device_id=0,
        use_curand=False,
        batch_size=None,  # Adaptive
        use_streams=True,
    )
    results = backend.run(sim, 10_000_000, sim.seed_seq, progress_callback)
```
Implementing Torch Support#
To enable GPU acceleration for your simulation, implement torch_batch():
```python
from mcframework import MonteCarloSimulation


class MySimulation(MonteCarloSimulation):
    supports_batch = True  # Required flag

    def single_simulation(self, _rng=None, **kwargs):
        rng = self._rng(_rng, self.rng)
        return float(rng.normal())

    def torch_batch(self, n, *, device, generator):
        """Vectorized Torch implementation."""
        import torch

        # Use explicit generator for reproducibility
        samples = torch.randn(n, device=device, generator=generator)

        # Return float32 for MPS compatibility
        # Framework promotes to float64 on CPU
        return samples.float()
```
Key Requirements:

- Set `supports_batch = True` as a class attribute
- All random sampling must use the provided `generator`
- Never use the global RNG (`torch.manual_seed()`)
- Return float32 for MPS compatibility
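Per the Raises sections below, backends reject simulations that miss these requirements with `ValueError` or `NotImplementedError`. A torch-free sketch of that validation logic (illustrative only, not mcframework's actual code; `check_batch_support` is a hypothetical name):

```python
# Illustrative sketch of the checks the Torch backends are documented to
# perform before batch execution.
def check_batch_support(sim) -> None:
    if not getattr(sim, "supports_batch", False):
        raise ValueError("simulation does not support batch execution")
    if not callable(getattr(sim, "torch_batch", None)):
        raise NotImplementedError("simulation does not implement torch_batch()")


class NoBatch:
    supports_batch = False


class WithBatch:
    supports_batch = True

    def torch_batch(self, n, *, device, generator):
        return None  # placeholder body for the sketch


check_batch_support(WithBatch())  # passes silently
try:
    check_batch_support(NoBatch())
except ValueError:
    pass  # raised as documented
```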
RNG Architecture#
The framework uses explicit PyTorch Generator objects seeded from NumPy’s
SeedSequence to maintain reproducible parallel streams:
```python
import numpy as np
import torch

from mcframework.backends import make_torch_generator

# Create seed sequence
seed_seq = np.random.SeedSequence(42)

# Create explicit generator (spawns child seed)
generator = make_torch_generator(torch.device("cpu"), seed_seq)

# Use in sampling
samples = torch.rand(1000, generator=generator)
```
Why explicit generators?

- `torch.manual_seed()` is global state that breaks parallel composition
- Explicit generators enable deterministic multi-stream MC
- Mirrors NumPy's `SeedSequence.spawn()` semantics
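The spawn semantics the framework relies on can be checked with NumPy alone:

```python
import numpy as np

# Freshly constructed SeedSequence(42) objects spawn identical children,
# so every backend derives the same per-stream seeds from one user seed.
children_a = np.random.SeedSequence(42).spawn(3)
children_b = np.random.SeedSequence(42).spawn(3)

states_a = [int(c.generate_state(1, dtype=np.uint64)[0]) for c in children_a]
states_b = [int(c.generate_state(1, dtype=np.uint64)[0]) for c in children_b]

assert states_a == states_b     # deterministic given the same root seed
assert len(set(states_a)) == 3  # each child yields a distinct stream seed
```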
Utility Functions#
| Function | Description |
|---|---|
| `make_blocks()` | Partition an integer range \([0, n)\) into half-open blocks \((i, j)\) |
| `worker_run_chunk()` | Execute a small batch of single simulations in a separate worker |
| `is_windows_platform()` | Return True when running on a Windows platform |
| `validate_torch_device()` | Validate that the requested Torch device is available |
| `make_torch_generator()` | Create an explicit Torch generator seeded from a SeedSequence |
| `is_mps_available()` | Check if MPS (Metal Performance Shaders) is available |
| `is_cuda_available()` | Check if CUDA is available |
Availability Checks:

```python
from mcframework.backends import is_mps_available, is_cuda_available

print(f"MPS available: {is_mps_available()}")
print(f"CUDA available: {is_cuda_available()}")
```
Device Validation:

```python
from mcframework.backends import validate_torch_device

validate_torch_device("cpu")   # Always passes
validate_torch_device("mps")   # Raises RuntimeError if unavailable
validate_torch_device("cuda")  # Raises RuntimeError if unavailable
```
Backend Protocol#
All backends implement the ExecutionBackend protocol:
Protocol Definition:

```python
from typing import Protocol, Callable

import numpy as np


class ExecutionBackend(Protocol):
    def run(
        self,
        sim: "MonteCarloSimulation",
        n_simulations: int,
        seed_seq: np.random.SeedSequence | None,
        progress_callback: Callable[[int, int], None] | None = None,
        **simulation_kwargs,
    ) -> np.ndarray:
        """Execute simulations and return results array."""
        ...
```
Note
Torch backends achieve massive speedups through vectorization, not just parallelization. The entire batch executes as tensor operations.
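The note can be made concrete with plain NumPy standing in for Torch tensors (an illustrative sketch, not framework code): the per-draw loop and the single tensor expression count the same hits, but the latter runs as one vectorized operation.

```python
import numpy as np

# Loop vs vectorized estimate of pi via the classic quarter-circle hit rate.
rng = np.random.default_rng(0)
pts = rng.random((10_000, 2))

# Per-draw loop (what sequential/thread/process backends effectively do)
hits_loop = sum(1 for x, y in pts if x * x + y * y <= 1.0)

# Single array expression (the shape of what torch_batch() does on CPU/GPU)
inside = pts[:, 0] * pts[:, 0] + pts[:, 1] * pts[:, 1] <= 1.0
hits_vec = int(np.count_nonzero(inside))

assert hits_loop == hits_vec  # identical arithmetic, identical counts
pi_est = 4.0 * hits_vec / len(pts)
assert abs(pi_est - 3.14159) < 0.1  # Monte Carlo estimate is in the ballpark
```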
Module Reference#
Base Classes and Utilities#
Base classes and utilities for execution backends.
This module provides:

- Protocol
  - `ExecutionBackend` — Interface for simulation execution strategies
- Functions
  - `make_blocks()` — Chunking helper for parallel work distribution
  - `worker_run_chunk()` — Top-level worker for process-based parallelism
- Helpers
  - `is_windows_platform()` — Platform detection for backend selection
- class mcframework.backends.base.ExecutionBackend[source]#
Bases: `Protocol`

Protocol defining the interface for execution backends.
Backends are responsible for executing simulation draws and returning results. They handle the details of sequential vs parallel execution, thread vs process pools, and progress reporting.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#
Run simulation draws and return results.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run.
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for reproducible random streams.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**simulation_kwargs` (Any): Additional keyword arguments passed to `single_simulation`.
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations,)`.
- __init__(*args, **kwargs)#
- mcframework.backends.base.make_blocks(n: int, block_size: int = 10000) → list[tuple[int, int]][source]#

Partition an integer range \([0, n)\) into half-open blocks \((i, j)\).

- Parameters:
  - n (int): Size of the range to partition.
  - block_size (int, default 10000): Maximum size of each block.
- Returns:
  - list[tuple[int, int]]: Half-open (start, stop) index pairs covering \([0, n)\).

Examples

>>> make_blocks(5, block_size=2)
[(0, 2), (2, 4), (4, 5)]
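The contract is simple enough to restate as a sketch (an equivalent reimplementation under the documented semantics, not the library source):

```python
def make_blocks_sketch(n: int, block_size: int = 10_000) -> list[tuple[int, int]]:
    """Partition [0, n) into half-open (i, j) blocks of at most block_size."""
    return [(i, min(i + block_size, n)) for i in range(0, n, block_size)]


# Matches the documented doctest for make_blocks
assert make_blocks_sketch(5, block_size=2) == [(0, 2), (2, 4), (4, 5)]
assert make_blocks_sketch(0) == []  # empty range yields no blocks
```

The final block is allowed to be short, which is why the blocks are half-open: concatenating the ranges reproduces `[0, n)` exactly.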
- mcframework.backends.base.worker_run_chunk(sim: MonteCarloSimulation, chunk_size: int, seed_seq: np.random.SeedSequence, simulation_kwargs: dict[str, Any]) list[float][source]#
Execute a small batch of single simulations in a separate worker.
- Parameters:
  - sim (`MonteCarloSimulation`): Simulation instance to call (`MonteCarloSimulation.single_simulation()`). Must be pickleable when used with a process backend.
  - chunk_size (int): Number of draws to compute in this worker.
  - seed_seq (`numpy.random.SeedSequence`): Seed sequence for creating an independent RNG stream in the worker.
  - simulation_kwargs (dict): Keyword arguments forwarded to `MonteCarloSimulation.single_simulation()`.
- Returns:
  - list[float]: Simulation results for this chunk.

Notes

Uses `numpy.random.Philox` to spawn a deterministic, independent stream per worker chunk.
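The per-chunk RNG discipline can be sketched with NumPy alone (`chunk_draws` is a hypothetical stand-in for the worker, not mcframework's code): each chunk gets its own `SeedSequence` child, which seeds an independent Philox stream.

```python
import numpy as np


def chunk_draws(seed_seq: np.random.SeedSequence, chunk_size: int) -> list[float]:
    # One independent Philox stream per chunk, seeded from the child sequence
    rng = np.random.Generator(np.random.Philox(seed_seq))
    return [float(rng.normal()) for _ in range(chunk_size)]


parent = np.random.SeedSequence(42)
chunks = parent.spawn(2)

# Deterministic: the same child seed reproduces the same draws...
assert chunk_draws(chunks[0], 4) == chunk_draws(chunks[0], 4)
# ...while sibling chunks produce distinct, independent streams.
assert chunk_draws(chunks[0], 4) != chunk_draws(chunks[1], 4)
```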
Sequential Backend#
Sequential execution backend for Monte Carlo simulations.
This module provides a single-threaded execution strategy that runs simulations sequentially with optional progress reporting.
- class mcframework.backends.sequential.SequentialBackend[source]#
Bases: `object`

Sequential (single-threaded) execution backend.

Executes simulation draws one at a time on the main thread. Suitable for small simulations or debugging.

Examples

>>> backend = SequentialBackend()
>>> results = backend.run(sim, n_simulations=1000, seed_seq=None, progress_callback=None)
- run(sim: MonteCarloSimulation, n_simulations: int, _seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#
Run simulations sequentially on a single thread.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run.
  - n_simulations (int): Number of simulation draws to perform.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**simulation_kwargs` (Any): Additional keyword arguments passed to `single_simulation`.
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations,)`.
Parallel Backends#
Parallel execution backends for Monte Carlo simulations.
This module provides:

- Classes
  - `ThreadBackend` — Thread-based parallelism using ThreadPoolExecutor
  - `ProcessBackend` — Process-based parallelism using ProcessPoolExecutor
- class mcframework.backends.parallel.ThreadBackend[source]#
Bases: `object`

Thread-based parallel execution backend.

Uses `concurrent.futures.ThreadPoolExecutor` for parallel execution. Effective when NumPy releases the GIL (most numerical operations).

- Parameters:
  - n_workers (int or None): Number of worker threads.

Examples

>>> backend = ThreadBackend(n_workers=4)
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq, progress_callback=None)
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#
Run simulations in parallel using threads.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run.
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for spawning independent RNG streams per chunk.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**simulation_kwargs` (Any): Additional keyword arguments passed to `single_simulation`.
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations,)`.
- class mcframework.backends.parallel.ProcessBackend[source]#
Bases: `object`

Process-based parallel execution backend.

Uses `concurrent.futures.ProcessPoolExecutor` with spawn context for parallel execution. Required on Windows or when thread-safety is a concern.

- Parameters:
  - n_workers (int or None): Number of worker processes.

Notes

The simulation instance must be pickleable for process-based execution.

Examples

>>> backend = ProcessBackend(n_workers=4)
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq, progress_callback=None)
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#
Run simulations in parallel using processes.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run. Must be pickleable.
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for spawning independent RNG streams per chunk.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**simulation_kwargs` (Any): Additional keyword arguments passed to `single_simulation`.
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations,)`.
Torch Backend (Unified)#
Torch execution backend for GPU-accelerated Monte Carlo simulations.
This module provides a unified interface for Torch-based backends:

- Classes
  - `TorchBackend` — Factory that selects appropriate device backend
- Device-Specific Backends
  - `TorchCPUBackend` — CPU execution (torch_cpu.py)
  - `TorchMPSBackend` — Apple Silicon GPU (torch_mps.py)
  - `TorchCUDABackend` — NVIDIA GPU (torch_cuda.py, stub)
- Utilities
  - `validate_torch_device()` — Validate device availability
  - `make_torch_generator()` — Create explicit RNG generators
  - `VALID_TORCH_DEVICES` — Supported device types
- Device Support
  - `cpu` — Safe default, works everywhere
  - `mps` — Apple Metal Performance Shaders (M1/M2/M3/M4 Macs)
  - `cuda` — NVIDIA Compute Unified Device Architecture (CUDA 12.x with CuPy for cuRAND)
Notes#
Use TorchBackend as the main entry point—it automatically
selects the appropriate device-specific backend based on the device
parameter.
Example#
>>> from mcframework.backends import TorchBackend
>>> backend = TorchBackend(device="mps") # Auto-selects TorchMPSBackend
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
- class mcframework.backends.torch.TorchBackend[source]#
Bases: `object`

Factory class that creates and wraps the appropriate device-specific backend (`TorchCPUBackend`, `TorchMPSBackend`, or `TorchCUDABackend`) based on the `device` parameter.

- Parameters:
  - device ({"cpu", "mps", "cuda"}, default "cpu"): Torch device for computation:
    - `"cpu"` — Uses `TorchCPUBackend`
    - `"mps"` — Uses `TorchMPSBackend` (Apple Silicon)
    - `"cuda"` — Uses `TorchCUDABackend` (NVIDIA, stub)

See also

- `TorchCPUBackend` — Direct CPU backend access.
- `TorchMPSBackend` — Direct MPS backend access.
- `TorchCUDABackend` — Direct CUDA backend access.

Notes

Delegation model. This class delegates all execution to the device-specific backend. It exists to provide a unified interface and for backward compatibility.

Device selection. The backend is selected at construction time based on the `device` parameter. Device availability is validated during construction.

Examples

>>> # CPU execution
>>> backend = TorchBackend(device="cpu")
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)

>>> # Apple Silicon GPU
>>> backend = TorchBackend(device="mps")
>>> results = backend.run(sim, n_simulations=1000000, seed_seq=seed_seq)

>>> # NVIDIA GPU (CUDA 12.x with CuPy for cuRAND)
>>> backend = TorchBackend(device="cuda")
- __init__(device: str = 'cpu', **device_kwargs: Any)[source]#
Initialize Torch backend with specified device.
- Parameters:
  - device ({"cpu", "mps", "cuda"}, default "cpu"): Torch device for computation.
  - `**device_kwargs` (Any): Device-specific configuration options. CUDA options (ignored for cpu/mps):
    - `device_id` : int, default 0 — CUDA device index
    - `use_curand` : bool, default False — Use cuRAND via CuPy
    - `batch_size` : int or None — Fixed batch size (None = adaptive)
    - `use_streams` : bool, default True — Enable CUDA streams
- Raises:
  - ImportError: If PyTorch is not installed.
  - ValueError: If the device type is not recognized.
  - RuntimeError: If the requested device is not available.

Examples

>>> # CPU (no kwargs needed)
>>> backend = TorchBackend(device="cpu")

>>> # MPS (no kwargs needed)
>>> backend = TorchBackend(device="mps")

>>> # CUDA with default settings
>>> backend = TorchBackend(device="cuda")

>>> # CUDA with custom settings
>>> backend = TorchBackend(
...     device="cuda",
...     device_id=0,
...     use_curand=True,
...     batch_size=100_000,
...     use_streams=True,
... )
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **simulation_kwargs: Any) np.ndarray[source]#
Run simulations using the device-specific Torch backend.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run. Must have `supports_batch = True` and implement `torch_batch()`.
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for reproducible random streams.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**simulation_kwargs` (Any): Ignored for Torch backend (batch method handles all parameters).
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations, ...)`.
- Raises:
  - ValueError: If the simulation does not support batch execution.
  - NotImplementedError: If the simulation does not implement `torch_batch()`.
- class mcframework.backends.torch.TorchCPUBackend[source]#
Bases: `object`

Torch CPU batch execution backend.

Uses PyTorch for vectorized execution on CPU. Requires simulations to implement `torch_batch()` and set `supports_batch` to `True`.

Notes

RNG architecture. Uses explicit `torch.Generator` objects seeded from `numpy.random.SeedSequence.spawn()`. This preserves:

- Deterministic parallel streams
- Counter-based RNG (Philox) semantics
- Identical statistical structure across backends

Never uses `torch.manual_seed()` (global state).

Examples

>>> backend = TorchCPUBackend()
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
- __init__()[source]#
Initialize Torch CPU backend.
- Raises:
  - ImportError: If PyTorch is not installed.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#
Run simulations using Torch CPU batch execution.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run. Must have `supports_batch=True` and implement `torch_batch()`.
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for reproducible random streams.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**_simulation_kwargs` (Any): Ignored for Torch backend (batch method handles all parameters).
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations, ...)`.
- Raises:
  - ValueError: If the simulation does not support batch execution.
  - NotImplementedError: If the simulation does not implement `torch_batch()`.
- class mcframework.backends.torch.TorchMPSBackend[source]#
Bases: `object`

Torch MPS batch execution backend for Apple Silicon GPUs.

Uses PyTorch's MPS (Metal Performance Shaders) backend for GPU-accelerated execution on Apple Silicon Macs, leveraging the unified memory architecture. Requires simulations to implement `torch_batch()` and set `supports_batch` to `True`.

See also

- `is_mps_available()` — Check MPS availability before instantiation.
- `TorchCPUBackend` — Fallback for non-Apple systems.

Notes

RNG architecture. Uses explicit `Generator` objects seeded from `SeedSequence` via `spawn()`. This preserves:

- Deterministic parallel streams (best-effort on MPS)
- Counter-based RNG (Philox) semantics
- Correct statistical structure

Never uses `manual_seed()` (global state).

Dtype policy. MPS performs best with float32:

- Sampling uses float32 on device
- Results are moved to CPU and promoted to float64
- The framework converts the results to a `numpy.ndarray` of float64 for stats engine compatibility

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to:

- Metal backend scheduling variations
- float32 arithmetic rounding
- GPU kernel execution order

Statistical properties (mean, variance, CI coverage) remain correct despite potential bitwise differences between runs. (See `TestMPSDeterminism` in `tests/test_torch_backend.py` for the corresponding tests.)

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
- __init__()[source]#
Initialize Torch MPS backend.
- Raises:
  - ImportError: If PyTorch is not installed.
  - RuntimeError: If MPS is not available on this system.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#
Run simulations using Torch MPS batch execution.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run. Must have `supports_batch=True` and implement `torch_batch()`.
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for reproducible random streams.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**_simulation_kwargs` (Any): Ignored for Torch backend (batch method handles all parameters).
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations,)`. Results are float64 despite MPS using float32 internally.
- Raises:
  - ValueError: If the simulation does not support batch execution.
  - NotImplementedError: If the simulation does not implement `torch_batch()`.

Notes

The dtype conversion flow: `torch_batch()` returns float32 on the MPS device; results are then moved to CPU and promoted to float64. This ensures stats engine precision while maximizing MPS performance.
- class mcframework.backends.torch.TorchCUDABackend[source]#
Bases: `object`

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement `torch_batch()` (or `cupy_batch()` for cuRAND mode) and set `supports_batch = True`.

- Parameters:
  - device_id (int, default 0): CUDA device index to use. Use `torch.cuda.device_count()` to check available devices.
  - use_curand (bool, default False): Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement `cupy_batch()`.
  - batch_size (int or None, default None): Fixed batch size for simulation execution. If None, automatically estimates the optimal batch size based on available GPU memory.
  - use_streams (bool, default True): Use CUDA streams for overlapped execution. Recommended for performance.

See also

- `is_cuda_available()` — Check CUDA availability before instantiation.
- `TorchMPSBackend` — Apple Silicon alternative.
- `TorchCPUBackend` — CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from `numpy.random.SeedSequence` via `spawn()`. Never uses global RNG state (`torch.manual_seed()` or `cupy.random.RandomState.seed()`).

Adaptive batching: When `batch_size=None`, performs a probe run with 1000 samples to estimate memory requirements, then calculates the optimal batch size to use ~75% of available GPU memory.

Native float64: CUDA fully supports float64 tensors. If the simulation's `torch_batch()` or `cupy_batch()` returns float64, the backend uses it directly with zero conversion overhead. If float32, it converts to float64 on GPU before moving to CPU for stats engine compatibility.

CUDA streams: When `use_streams=True`, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)

>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True,
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
- __init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#
Initialize Torch CUDA backend with specified configuration.
- Parameters:
- Raises:
  - ImportError: If PyTorch is not installed, or if CuPy is required but not installed.
  - RuntimeError: If CUDA is not available or the device index is invalid.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#
Run simulations using Torch CUDA batch execution with adaptive batching.
- Parameters:
  - sim (`MonteCarloSimulation`): The simulation instance to run. Must have `supports_batch = True` and implement `torch_batch()` (or `curand_batch()` for cuRAND mode).
  - n_simulations (int): Number of simulation draws to perform.
  - seed_seq (`SeedSequence` or None): Seed sequence for reproducible random streams.
  - progress_callback (callable or None): Optional callback `f(completed, total)` for progress reporting.
  - `**_simulation_kwargs` (Any): Ignored for Torch backend (batch method handles all parameters).
- Returns:
  - np.ndarray: Array of simulation results with shape `(n_simulations,)`. Results are float64 regardless of internal tensor dtype.
- Raises:
  - AttributeError: If the simulation class is missing the `supports_batch` attribute.
  - ValueError: If the simulation does not support batch execution.
  - NotImplementedError: If the simulation does not implement the required batch method.
  - RuntimeError: If a CUDA out-of-memory error occurs during execution.
Notes

Adaptive batching: When `batch_size=None` (default), automatically estimates the optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch's caching allocator for efficient memory reuse.

Determinism: With the same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).
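The adaptive-batching heuristic described above reduces to simple arithmetic. A back-of-envelope sketch (the name `estimate_batch_size` and the exact formula are illustrative assumptions, not the backend's actual estimator):

```python
# Probe a small run, measure bytes per sample, then target ~75% of free
# GPU memory, as the adaptive-batching notes describe.
def estimate_batch_size(free_bytes: int, probe_bytes: int, probe_n: int = 1000) -> int:
    bytes_per_sample = max(probe_bytes / probe_n, 1.0)
    return max(int(0.75 * free_bytes / bytes_per_sample), 1)


# e.g. a 1000-sample probe used 8 MB, and 4 GB of GPU memory is free
batch = estimate_batch_size(free_bytes=4 * 1024**3, probe_bytes=8 * 1024**2)
assert 100_000 < batch < 1_000_000  # on the order of a few hundred thousand
```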
- mcframework.backends.torch.validate_torch_device(device_type: str) None[source]#
Validate that the requested Torch device is available.
- Parameters:
  - device_type (str): Device type to validate (`"cpu"`, `"mps"`, `"cuda"`).
- Raises:
  - ValueError: If the device type is not recognized.
  - RuntimeError: If the device is not available on this system.

Examples

>>> validate_torch_device("cpu")  # Always succeeds
>>> validate_torch_device("mps")  # Succeeds on Apple Silicon
- mcframework.backends.torch.is_mps_available() bool[source]#
Check if MPS (Metal Performance Shaders) is available.
- Returns:
  - bool: True if MPS is available and PyTorch was built with MPS support.

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
- mcframework.backends.torch.is_cuda_available() bool[source]#
Check if CUDA is available.
- Returns:
  - bool: True if CUDA is available and PyTorch was built with CUDA support.

Examples

>>> if is_cuda_available():
...     backend = TorchCUDABackend()
- mcframework.backends.torch.validate_mps_device() None[source]#
Validate that MPS device is available and usable.
- Raises:
  - ImportError: If PyTorch is not installed.
  - RuntimeError: If MPS is not available or not built into PyTorch.
Examples
>>> validate_mps_device()
- mcframework.backends.torch.validate_cuda_device(device_id: int = 0) None[source]#
Validate that CUDA device is available and usable.
- Parameters:
  - device_id (int, default 0): CUDA device index to validate.
- Raises:
  - ImportError: If PyTorch is not installed.
  - RuntimeError: If CUDA is not available or the device index is invalid.

Examples

>>> validate_cuda_device()
>>> validate_cuda_device(device_id=1)  # Check second GPU
- mcframework.backends.torch.make_torch_generator(device: torch.device, seed_seq: np.random.SeedSequence | None) torch.Generator[source]#
Create an explicit Torch generator seeded from a SeedSequence.
This function spawns a child seed from the provided SeedSequence and uses it to initialize a Torch Generator. This preserves the hierarchical spawning model used by the NumPy backend.
- Parameters:
  - device (`torch.device`): Device for the generator (`"cpu"`, `"mps"`, or `"cuda"`).
  - seed_seq (`SeedSequence` or None): NumPy seed sequence to derive the Torch seed from.
- Returns:
  - `torch.Generator`: Explicitly seeded generator for reproducible sampling.

Notes

Why explicit generators?

- `torch.manual_seed()` is global state that breaks parallel composition
- Explicit generators enable deterministic multi-stream MC
- This mirrors NumPy's `SeedSequence.spawn()` semantics

Seed derivation:

```python
child_seed = seed_seq.spawn(1)[0]
seed_int = child_seed.generate_state(1, dtype="uint64")[0]
generator.manual_seed(seed_int)
```

This ensures each call with the same `seed_seq` produces identical results.

Examples

>>> import numpy as np
>>> import torch
>>> seed_seq = np.random.SeedSequence(42)
>>> gen = make_torch_generator(torch.device("cpu"), seed_seq)
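The documented seed derivation can be exercised without PyTorch, since everything up to `manual_seed()` is pure NumPy. A sketch (`derive_torch_seed` is a hypothetical helper name, not part of mcframework):

```python
import numpy as np


def derive_torch_seed(seed_seq: np.random.SeedSequence) -> int:
    # The documented derivation, minus the torch.Generator step:
    # spawn one child, generate a uint64 seed word from it.
    child_seed = seed_seq.spawn(1)[0]
    return int(child_seed.generate_state(1, dtype=np.uint64)[0])


# Fresh SeedSequence objects with the same entropy derive the same seed,
# which is what makes make_torch_generator reproducible across calls.
assert derive_torch_seed(np.random.SeedSequence(42)) == derive_torch_seed(np.random.SeedSequence(42))
assert derive_torch_seed(np.random.SeedSequence(42)) != derive_torch_seed(np.random.SeedSequence(43))
```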
Torch CPU Backend#
Torch CPU execution backend for Monte Carlo simulations.
This module provides:
- Classes
  - `TorchCPUBackend` — Torch-based batch execution on CPU
The CPU backend enables vectorized execution using PyTorch on CPU, providing a good balance of speed and compatibility.
Notes#
When to use CPU backend:
Baseline testing before GPU deployment
Systems without GPU acceleration
Debugging and validation
Small to medium simulation sizes
RNG discipline. Uses explicit torch.Generator objects seeded from
numpy.random.SeedSequence. Fully deterministic with same seed.
- class mcframework.backends.torch_cpu.TorchCPUBackend[source]#
Bases:
objectTorch CPU batch execution backend.
Uses PyTorch for vectorized execution on CPU. Requires simulations to implement
torch_batch()and setsupports_batchtoTrue.Notes
RNG architecture. Uses explicit
torch.Generatorobjects seeded fromnumpy.random.SeedSequence.spawn(). This preserves:Deterministic parallel streams
Counter-based RNG (Philox) semantics
Identical statistical structure across backends
Never uses
torch.manual_seed()(global state).Examples
>>> backend = TorchCPUBackend()
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
- __init__()[source]#
Initialize Torch CPU backend.
- Raises:
ImportError
If PyTorch is not installed.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#
Run simulations using Torch CPU batch execution.
- Parameters:
- sim
MonteCarloSimulation The simulation instance to run. Must have
supports_batch=True and implement torch_batch().
- n_simulations
int Number of simulation draws to perform.
- seed_seq
SeedSequence or None Seed sequence for reproducible random streams.
- progress_callback
callable or None Optional callback
f(completed, total) for progress reporting.
- **_simulation_kwargs
Any Ignored for Torch backend (batch method handles all parameters).
- Returns:
np.ndarray
Array of simulation results with shape
(n_simulations, ...).
- Raises:
ValueError
If the simulation does not support batch execution.
NotImplementedError
If the simulation does not implement
torch_batch().
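The supports_batch / torch_batch() contract enforced by run() can be sketched in plain Python. This is a simplified, hypothetical rendering of the documented checks, not the actual mcframework source:

```python
# Hypothetical sketch of the validation run() performs before dispatching.
def validate_batch_support(sim) -> None:
    if not getattr(sim, "supports_batch", False):
        raise ValueError(f"{type(sim).__name__} does not support batch execution")
    if not callable(getattr(sim, "torch_batch", None)):
        raise NotImplementedError(f"{type(sim).__name__} must implement torch_batch()")

class ScalarOnlySim:          # opted out of batch execution
    supports_batch = False

class BatchSim:               # satisfies the documented contract
    supports_batch = True
    def torch_batch(self, n_simulations, generator):
        return [0.0] * n_simulations

validate_batch_support(BatchSim())   # passes silently

try:
    validate_batch_support(ScalarOnlySim())
except ValueError as exc:
    print(exc)                       # run() raises before touching any device
```

Failing fast here is cheap insurance: the error surfaces before any tensors are allocated.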
Torch MPS Backend (Apple Silicon)#
Torch MPS (Metal Performance Shaders) backend for Apple Silicon.
This module provides:
- Classes
TorchMPSBackend — GPU-accelerated batch execution on Apple Silicon
- Functions
is_mps_available() — Check MPS availability
validate_mps_device() — Validate MPS is usable
The MPS backend enables GPU-accelerated Monte Carlo simulations on Apple Silicon Macs (M1/M2/M3/M4) using Metal Performance Shaders.
Notes#
MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to Metal backend scheduling and float32 arithmetic. Statistical properties (mean, variance, CI coverage) remain correct.
Dtype policy. MPS performs best with float32. Sampling uses float32, but results are promoted to float64 on CPU before returning to ensure stats engine precision.
System requirements:
- macOS 12.3 (Monterey) or later
- Apple Silicon (M1, M2, M3, M4 series)
- PyTorch built with MPS support
- class mcframework.backends.torch_mps.TorchMPSBackend[source]#
Bases:
object
Torch MPS batch execution backend for Apple Silicon GPUs.
Uses the PyTorch MPS (Metal Performance Shaders) backend for GPU-accelerated execution on Apple Silicon Macs, leveraging the unified memory architecture. Requires simulations to implement
torch_batch() and set supports_batch to True.
See also
is_mps_available() Check MPS availability before instantiation.
TorchCPUBackend Fallback for non-Apple systems.
Notes
RNG architecture. Uses explicit
Generator objects seeded from SeedSequence via spawn(). This preserves:
Deterministic parallel streams (best-effort on MPS)
Counter-based RNG (Philox) semantics
Correct statistical structure
Never uses manual_seed() (global state).
Dtype policy. MPS performs best with float32:
Sampling uses float32 on device
Results are moved to CPU and promoted to float64
The framework converts the results to numpy.ndarray of numpy.double (float64) for stats engine compatibility.
MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to:
Metal backend scheduling variations
float32 arithmetic rounding
GPU kernel execution order
Statistical properties (mean, variance, CI coverage) remain correct despite potential bitwise differences between runs. (see
TestMPSDeterminism in tests/test_torch_backend.py for the actual tests.)
Examples
>>> if is_mps_available():
...     backend = TorchMPSBackend()
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
- __init__()[source]#
Initialize Torch MPS backend.
- Raises:
ImportError
If PyTorch is not installed.
RuntimeError
If MPS is not available on this system.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#
Run simulations using Torch MPS batch execution.
- Parameters:
- sim
MonteCarloSimulation The simulation instance to run. Must have
supports_batch=True and implement torch_batch().
- n_simulations
int Number of simulation draws to perform.
- seed_seq
SeedSequence or None Seed sequence for reproducible random streams.
- progress_callback
callable or None Optional callback
f(completed, total) for progress reporting.
- **_simulation_kwargs
Any Ignored for Torch backend (batch method handles all parameters).
- Returns:
np.ndarray
Array of simulation results with shape
(n_simulations,). Results are float64 despite MPS using float32 internally.
- Raises:
ValueError
If the simulation does not support batch execution.
NotImplementedError
If the simulation does not implement
torch_batch().
Notes
The dtype conversion flow is:
torch_batch() returns float32 tensors on the MPS device.
Results are moved to the CPU and promoted to float64.
The framework converts the results to a numpy.ndarray of float64.
This ensures stats engine precision while maximizing MPS performance.
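Why the promotion matters can be shown without any GPU: float32's 24-bit significand silently drops small increments beyond 2**24, exactly the kind of drift the float64 promotion protects summary statistics from. A minimal NumPy-only sketch:

```python
import numpy as np

# float32 carries a 24-bit significand: beyond 2**24, consecutive integers
# are no longer representable, so small increments vanish from running sums.
x32 = np.float32(2**24)
assert x32 + np.float32(1.0) == x32   # the +1 is silently lost

# float64 (53-bit significand) absorbs the same increment exactly, which is
# why results are promoted before reaching the stats engine.
x64 = np.float64(2**24)
assert x64 + 1.0 != x64
```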
- mcframework.backends.torch_mps.is_mps_available() bool[source]#
Check if MPS (Metal Performance Shaders) is available.
- Returns:
bool
True if MPS is available and PyTorch was built with MPS support.
Examples
>>> if is_mps_available():
...     backend = TorchMPSBackend()
- mcframework.backends.torch_mps.validate_mps_device() None[source]#
Validate that MPS device is available and usable.
- Raises:
ImportError
If PyTorch is not installed.
RuntimeError
If MPS is not available or not built into PyTorch.
Examples
>>> validate_mps_device()
Torch CUDA Backend (NVIDIA)#
Torch CUDA backend for NVIDIA GPU acceleration.
This module provides:
- Classes
TorchCUDABackend — GPU-accelerated batch execution on NVIDIA GPUs
- Functions
is_cuda_available() — Check CUDA availability
validate_cuda_device() — Validate CUDA is usable
Features#
Adaptive Batch Sizing: Automatically estimates optimal batch size based on available GPU memory to prevent OOM errors while maximizing throughput.
Dual RNG Modes:
- torch.Generator (default) — PyTorch’s Philox RNG, fully deterministic
- cuRAND (optional) — Native GPU RNG via CuPy, maximum performance
CUDA Optimizations:
- CUDA streams for overlapped execution
- Native float64 support (zero conversion overhead vs MPS)
- Efficient memory management via PyTorch’s caching allocator
Defensive Validation: Comprehensive checks for supports_batch attribute
and required batch methods before execution.
Notes#
Native float64 support: Unlike MPS (Apple Silicon), CUDA fully supports float64 tensors. The backend intelligently handles both float32 and float64, promoting to float64 only when necessary.
Batch size estimation: Uses a probe run to estimate per-sample memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
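The estimation described above reduces to simple arithmetic. The helper below is a hypothetical sketch: the real backend reads free memory from the CUDA runtime, while this version takes the numbers as plain arguments so it can run anywhere:

```python
def estimate_batch_size(free_bytes: int, probe_samples: int,
                        probe_bytes: int, budget: float = 0.75) -> int:
    """Target ~75% of free GPU memory given a probe run's measured footprint."""
    bytes_per_sample = max(1, probe_bytes // probe_samples)  # from the probe run
    return max(1, int(free_bytes * budget) // bytes_per_sample)

# Example: 8 GiB free; a 1000-sample probe used ~8.192 MB (8192 bytes/sample).
batch = estimate_batch_size(free_bytes=8 * 1024**3,
                            probe_samples=1000,
                            probe_bytes=8_192_000)
```

Clamping to at least 1 keeps the loop alive even under extreme memory pressure; the 75% budget leaves headroom for PyTorch's caching allocator and transient temporaries.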
Examples#
>>> # Simple usage with defaults
>>> if is_cuda_available():
... sim.run(1_000_000, backend="torch", torch_device="cuda")
>>> # Advanced: Direct backend construction with custom settings
>>> if is_cuda_available():
... from mcframework.backends import TorchCUDABackend
... backend = TorchCUDABackend(device_id=0, batch_size=100_000, use_streams=True)
... results = backend.run(sim, n_simulations=10_000_000, seed_seq=sim.seed_seq)
...
- class mcframework.backends.torch_cuda.TorchCUDABackend[source]#
Bases:
object
Torch CUDA batch execution backend for NVIDIA GPUs.
CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement
torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.
- Parameters:
- device_id
int, default 0 CUDA device index to use. Use
torch.cuda.device_count() to check available devices.
- use_curand
bool, default False Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and the simulation to implement
cupy_batch().
- batch_size
int or None, default None Fixed batch size for simulation execution. If None, automatically estimates the optimal batch size based on available GPU memory.
- use_streams
bool, default True Use CUDA streams for overlapped execution. Recommended for performance.
See also
is_cuda_available() Check CUDA availability before instantiation.
TorchMPSBackend Apple Silicon alternative.
TorchCPUBackend CPU fallback.
Notes
RNG architecture: Uses explicit generators seeded from
numpy.random.SeedSequence via spawn(). Never uses global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()).
Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates the optimal batch size to use ~75% of available GPU memory.
Native float64: CUDA fully supports float64 tensors. If the simulation’s torch_batch() or cupy_batch() returns float64, the backend uses it directly with zero conversion overhead. If float32, it converts to float64 on GPU before moving to CPU for stats engine compatibility.
CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.
Examples
>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...
- __init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#
Initialize Torch CUDA backend with specified configuration.
- Raises:
ImportError
If PyTorch is not installed, or if CuPy is required but not installed.
RuntimeError
If CUDA is not available or the device index is invalid.
- run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#
Run simulations using Torch CUDA batch execution with adaptive batching.
- Parameters:
- sim
MonteCarloSimulation The simulation instance to run. Must have
supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).
- n_simulations
int Number of simulation draws to perform.
- seed_seq
SeedSequence or None Seed sequence for reproducible random streams.
- progress_callback
callable or None Optional callback
f(completed, total) for progress reporting.
- **_simulation_kwargs
Any Ignored for Torch backend (batch method handles all parameters).
- Returns:
np.ndarray
Array of simulation results with shape
(n_simulations,). Results are float64 regardless of internal tensor dtype.
- Raises:
AttributeError
If the simulation class is missing the supports_batch attribute.
ValueError
If the simulation does not support batch execution.
NotImplementedError
If the simulation does not implement the required batch method.
RuntimeError
If a CUDA out-of-memory error occurs during execution.
Notes
Adaptive batching: When
batch_size=None (default), automatically estimates the optimal batch size. Large workloads are split across multiple batches with progress tracking.
Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.
Determinism: With the same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).
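The batching and progress-reporting behaviour described in these notes can be sketched as a plain loop; `run_batch` below is a hypothetical stand-in for the per-batch GPU call, not the actual backend code:

```python
# Hypothetical sketch: a large job split into fixed-size chunks, with the
# progress callback f(completed, total) firing after each chunk completes.
def run_in_batches(n_simulations, batch_size, run_batch, progress_callback=None):
    results = []
    completed = 0
    while completed < n_simulations:
        n = min(batch_size, n_simulations - completed)  # last chunk may be short
        results.extend(run_batch(n))
        completed += n
        if progress_callback is not None:
            progress_callback(completed, n_simulations)
    return results

progress = []
out = run_in_batches(
    n_simulations=2_500,
    batch_size=1_000,
    run_batch=lambda n: [0.0] * n,                        # stand-in for torch_batch()
    progress_callback=lambda done, total: progress.append((done, total)),
)
assert len(out) == 2_500
assert progress == [(1_000, 2_500), (2_000, 2_500), (2_500, 2_500)]
```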
- mcframework.backends.torch_cuda.is_cuda_available() bool[source]#
Check if CUDA is available.
- Returns:
bool
True if CUDA is available and PyTorch was built with CUDA support.
Examples
>>> if is_cuda_available():
...     backend = TorchCUDABackend()
- mcframework.backends.torch_cuda.validate_cuda_device(device_id: int = 0) None[source]#
Validate that CUDA device is available and usable.
- Parameters:
- device_id
int, default 0 CUDA device index to validate.
- Raises:
ImportError
If PyTorch is not installed.
RuntimeError
If CUDA is not available or the device index is invalid.
Examples
>>> validate_cuda_device()
>>> validate_cuda_device(device_id=1)  # Check second GPU
See Also#
Core Module — Base simulation class and framework
Stats Engine — Statistical analysis of results
demos/demo_apple_silicon_benchmark.py — Benchmark script for Apple Silicon