Backends Module#

The backends module provides pluggable execution strategies for Monte Carlo simulations, from single-threaded CPU to GPU-accelerated batch processing.


Overview#

Backend      Class               Use Case
-----------  ------------------  --------------------------------------
Sequential   SequentialBackend   Single-threaded, debugging, small jobs
Thread       ThreadBackend       NumPy-heavy code (releases GIL)
Process      ProcessBackend      Python-bound code, Windows
Torch CPU    TorchCPUBackend     Vectorized CPU batch execution
Torch MPS    TorchMPSBackend     Apple Silicon GPU (M1/M2/M3/M4)
Torch CUDA   TorchCUDABackend    NVIDIA GPU acceleration


Quick Start#

CPU Backends:

from mcframework import PiEstimationSimulation

sim = PiEstimationSimulation()
sim.set_seed(42)

# Sequential (single-threaded)
result = sim.run(10_000, backend="sequential")

# Thread-based parallelism (default on POSIX)
result = sim.run(100_000, backend="thread", n_workers=8)

# Process-based parallelism (default on Windows)
result = sim.run(100_000, backend="process", n_workers=4)

# Auto-selection based on platform and job size
result = sim.run(100_000, backend="auto")

GPU Backends (requires PyTorch):

# Torch CPU (vectorized, no GPU required)
result = sim.run(1_000_000, backend="torch", torch_device="cpu")

# Apple Silicon GPU (M1/M2/M3/M4 Macs)
result = sim.run(1_000_000, backend="torch", torch_device="mps")

# NVIDIA CUDA GPU
result = sim.run(1_000_000, backend="torch", torch_device="cuda")

CPU Backends#

SequentialBackend#

Single-threaded execution for debugging and small jobs.

SequentialBackend

Sequential (single-threaded) execution backend.

When to use:

  • Debugging and testing

  • Jobs with < 20,000 simulations

  • When reproducibility debugging is needed

result = sim.run(1000, backend="sequential")

ThreadBackend#

Thread-based parallelism using ThreadPoolExecutor.

ThreadBackend

Thread-based parallel execution backend.

When to use:

  • NumPy-heavy code that releases the GIL

  • POSIX systems (macOS, Linux)

  • When process spawn overhead is significant

result = sim.run(100_000, backend="thread", n_workers=8)

ProcessBackend#

Process-based parallelism using ProcessPoolExecutor.

ProcessBackend

Process-based parallel execution backend.

When to use:

  • Python-bound code that doesn’t release the GIL

  • Windows (threads serialize under GIL)

  • CPU-intensive pure Python calculations

result = sim.run(100_000, backend="process", n_workers=4)

Torch GPU Backends#

The Torch backends enable GPU-accelerated batch execution for simulations that implement the torch_batch() method.

Note

Installation: GPU backends require PyTorch. Install with:

pip install mcframework[gpu]

TorchBackend (Unified)#

Factory class that auto-selects the appropriate device-specific backend.

TorchBackend

Factory class that creates and wraps the appropriate device-specific backend.

Usage:

from mcframework.backends import TorchBackend

# Auto-creates TorchCPUBackend
backend = TorchBackend(device="cpu")

# Auto-creates TorchMPSBackend (Apple Silicon)
backend = TorchBackend(device="mps")

# Auto-creates TorchCUDABackend (NVIDIA)
backend = TorchBackend(device="cuda")

# Run simulation
results = backend.run(sim, n_simulations=1_000_000, seed_seq=sim.seed_seq)

TorchCPUBackend#

Vectorized batch execution on CPU using PyTorch Tensor.

TorchCPUBackend

Torch CPU batch execution backend.

When to use:

  • Baseline testing before GPU deployment

  • Systems without GPU acceleration

  • Debugging vectorized code

  • Small to medium simulation sizes

from mcframework.backends import TorchCPUBackend

backend = TorchCPUBackend()
results = backend.run(sim, 100_000, sim.seed_seq, progress_callback=None)

TorchMPSBackend#

Apple Silicon GPU acceleration via Metal Performance Shaders (MPS).

TorchMPSBackend

Torch MPS batch execution backend for Apple Silicon GPUs.

Requirements:

  • macOS 12.3+ with Apple Silicon (M1/M2/M3/M4)

  • PyTorch with MPS support

Dtype Policy:

Metal Performance Shaders supports at most float32 on the GPU. The framework therefore computes batches in float32 on the device and promotes the results to float64 on CPU (see to()) to preserve stats engine precision.
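The promotion described above can be sketched with NumPy standing in for the Torch tensors (illustrative only; the real path uses tensor to() on the host):

```python
import numpy as np

# Simulate an MPS-style batch: computed in float32 on the device...
batch_f32 = np.random.default_rng(0).random(1_000, dtype=np.float32)

# ...then promoted to float64 on the host before reaching the stats engine.
batch_f64 = batch_f32.astype(np.float64)

assert batch_f64.dtype == np.float64
# Promotion is lossless: every float32 value is exactly representable in float64.
assert np.array_equal(batch_f64.astype(np.float32), batch_f32)
```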

Warning

MPS Determinism Caveat

Apple’s MPSDataType documentation confirms the lack of float64 support, and other projects have reported similar reproducibility issues on MPS.

Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility, due to Metal backend scheduling and float32 arithmetic. Statistical properties (mean, variance, CI coverage) remain correct. (See TestMPSDeterminism in tests/test_torch_backend.py for the corresponding tests.)

from mcframework.backends import TorchMPSBackend, is_mps_available

if is_mps_available():
    backend = TorchMPSBackend()
    results = backend.run(sim, 1_000_000, sim.seed_seq, None)

TorchCUDABackend#

NVIDIA GPU acceleration with adaptive batching and CUDA streams.

TorchCUDABackend

Torch CUDA batch execution backend for NVIDIA GPUs.

Features:

  • Adaptive batch sizing based on GPU memory

  • CUDA stream support for async execution

  • Native float64 support (no precision loss)

  • Optional cuRAND integration for maximum performance

Configuration Options:

Parameter     Default  Description
------------  -------  ----------------------------------------
device_id     0        CUDA device index for multi-GPU systems
use_curand    False    Use cuRAND instead of torch.Generator
batch_size    None     Fixed batch size (None = adaptive)
use_streams   True     Enable CUDA streams for async execution

from mcframework.backends import TorchCUDABackend, is_cuda_available

if is_cuda_available():
    # Basic usage
    backend = TorchCUDABackend()

    # Advanced configuration
    backend = TorchCUDABackend(
        device_id=0,
        use_curand=False,
        batch_size=None,  # Adaptive
        use_streams=True,
    )

    results = backend.run(sim, 10_000_000, sim.seed_seq, progress_callback)

Implementing Torch Support#

To enable GPU acceleration for your simulation, implement torch_batch():

from mcframework import MonteCarloSimulation

class MySimulation(MonteCarloSimulation):
    supports_batch = True  # Required flag

    def single_simulation(self, _rng=None, **kwargs):
        rng = self._rng(_rng, self.rng)
        return float(rng.normal())

    def torch_batch(self, n, *, device, generator):
        """Vectorized Torch implementation."""
        import torch

        # Use explicit generator for reproducibility
        samples = torch.randn(n, device=device, generator=generator)

        # Return float32 for MPS compatibility
        # Framework promotes to float64 on CPU
        return samples.float()

Key Requirements:

  1. Set supports_batch = True as a class attribute

  2. All random sampling must use the provided generator

  3. Never use global RNG (torch.manual_seed())

  4. Return float32 for MPS compatibility


RNG Architecture#

The framework uses explicit PyTorch Generator objects seeded from NumPy’s SeedSequence to maintain reproducible parallel streams:

from mcframework.backends import make_torch_generator
import numpy as np
import torch

# Create seed sequence
seed_seq = np.random.SeedSequence(42)

# Create explicit generator (spawns child seed)
generator = make_torch_generator(torch.device("cpu"), seed_seq)

# Use in sampling
samples = torch.rand(1000, generator=generator)

Why explicit generators?

  • manual_seed() is global state that breaks parallel composition

  • Explicit generators enable deterministic multi-stream MC

  • Mirrors NumPy’s spawn() semantics
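The seed derivation itself can be reproduced with NumPy alone; a sketch of the hierarchical spawning model described above (not the framework's actual code):

```python
import numpy as np

def derive_seed(seed_seq: np.random.SeedSequence) -> int:
    # Spawn a child sequence and draw one 64-bit word from it,
    # mirroring how an explicit generator seed is derived.
    child = seed_seq.spawn(1)[0]
    return int(child.generate_state(1, dtype=np.uint64)[0])

# Same root seed -> same derived seed; different roots diverge.
assert derive_seed(np.random.SeedSequence(42)) == derive_seed(np.random.SeedSequence(42))
assert derive_seed(np.random.SeedSequence(42)) != derive_seed(np.random.SeedSequence(43))
```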


Utility Functions#

make_blocks

Partition an integer range [0, n) into half-open blocks (i, j).

worker_run_chunk

Execute a small batch of single simulations in a separate worker.

is_windows_platform

Return True when running on a Windows platform.

validate_torch_device

Validate that the requested Torch device is available.

make_torch_generator

Create an explicit Torch generator seeded from a SeedSequence.

is_mps_available

Check if MPS (Metal Performance Shaders) is available.

is_cuda_available

Check if CUDA is available.

Availability Checks:

from mcframework.backends import is_mps_available, is_cuda_available

print(f"MPS available: {is_mps_available()}")
print(f"CUDA available: {is_cuda_available()}")

Device Validation:

from mcframework.backends import validate_torch_device

validate_torch_device("cpu")   # Always passes
validate_torch_device("mps")   # Raises RuntimeError if unavailable
validate_torch_device("cuda")  # Raises RuntimeError if unavailable

Backend Protocol#

All backends implement the ExecutionBackend protocol:

ExecutionBackend

Protocol defining the interface for execution backends.

Protocol Definition:

from typing import Protocol, Callable
import numpy as np

class ExecutionBackend(Protocol):
    def run(
        self,
        sim: "MonteCarloSimulation",
        n_simulations: int,
        seed_seq: np.random.SeedSequence | None,
        progress_callback: Callable[[int, int], None] | None = None,
        **simulation_kwargs,
    ) -> np.ndarray:
        """Execute simulations and return results array."""
        ...
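Because this is a structural protocol, any object with a matching run() method satisfies it. A toy loop-based backend, for illustration (the dummy simulation and its _rng handling are assumptions, not framework code):

```python
import numpy as np

class LoopBackend:
    """Minimal structural implementation of the ExecutionBackend protocol."""
    def run(self, sim, n_simulations, seed_seq, progress_callback=None, **kwargs):
        # One Philox-based RNG for the whole run (real backends spawn per chunk).
        rng = np.random.Generator(np.random.Philox(seed_seq))
        results = np.array(
            [sim.single_simulation(_rng=rng, **kwargs) for _ in range(n_simulations)]
        )
        if progress_callback is not None:
            progress_callback(n_simulations, n_simulations)
        return results

class DummySim:
    def single_simulation(self, _rng=None, **kwargs):
        return float(_rng.normal())

out = LoopBackend().run(DummySim(), 100, np.random.SeedSequence(7))
assert out.shape == (100,)
```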

Note

Torch backends achieve massive speedups through vectorization, not just parallelization. The entire batch executes as tensor operations.
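The distinction is easy to see with a NumPy analogue: per-draw Python calls versus one array operation over the same underlying stream (a sketch, not framework code):

```python
import numpy as np

# One Python-level call per draw (how loop-based backends consume the stream).
rng_loop = np.random.default_rng(0)
loop = np.array([rng_loop.random() for _ in range(1_000)])

# One vectorized call for the whole batch (how Torch backends execute).
rng_vec = np.random.default_rng(0)
vec = rng_vec.random(1_000)

# Same stream, same values -- only the execution strategy differs.
assert np.array_equal(loop, vec)
```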


Module Reference#

Base Classes and Utilities#

Base classes and utilities for execution backends.

This module provides:

Protocol

ExecutionBackend — Interface for simulation execution strategies

Functions

make_blocks() — Chunking helper for parallel work distribution

worker_run_chunk() — Top-level worker for process-based parallelism

Helpers

is_windows_platform() — Platform detection for backend selection

class mcframework.backends.base.ExecutionBackend[source]#

Bases: Protocol

Protocol defining the interface for execution backends.

Backends are responsible for executing simulation draws and returning results. They handle the details of sequential vs parallel execution, thread vs process pools, and progress reporting.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulation draws and return results.

Parameters:
simMonteCarloSimulation

The simulation instance to run.

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

__init__(*args, **kwargs)#
mcframework.backends.base.make_blocks(n: int, block_size: int = 10000) list[tuple[int, int]][source]#

Partition an integer range [0, n) into half-open blocks (i, j).

Parameters:
nint

Total number of items.

block_sizeint, default: 10_000

Target block length.

Returns:
list of tuple[int, int]

List of (i, j) index pairs covering [0, n).

Examples

>>> make_blocks(5, block_size=2)
[(0, 2), (2, 4), (4, 5)]
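A pure-Python equivalent of the partitioning, for illustration:

```python
def make_blocks_sketch(n: int, block_size: int = 10_000) -> list[tuple[int, int]]:
    # Half-open (i, j) pairs covering [0, n); the last block may be short.
    return [(i, min(i + block_size, n)) for i in range(0, n, block_size)]

assert make_blocks_sketch(5, block_size=2) == [(0, 2), (2, 4), (4, 5)]
assert make_blocks_sketch(0) == []
```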
mcframework.backends.base.worker_run_chunk(sim: MonteCarloSimulation, chunk_size: int, seed_seq: np.random.SeedSequence, simulation_kwargs: dict[str, Any]) list[float][source]#

Execute a small batch of single simulations in a separate worker.

Parameters:
sim

Simulation instance to call (MonteCarloSimulation.single_simulation()). Must be pickleable when used with a process backend.

chunk_sizeint

Number of draws to compute in this worker.

seed_seqnumpy.random.SeedSequence

Seed sequence for creating an independent RNG stream in the worker.

simulation_kwargsdict

Keyword arguments forwarded to MonteCarloSimulation.single_simulation().

Returns:
list[float]

The simulated values.

Notes

Uses numpy.random.Philox to spawn a deterministic, independent stream per worker chunk.
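The per-chunk stream spawning can be sketched directly with NumPy (this mirrors the note above; the chunk count is illustrative):

```python
import numpy as np

root = np.random.SeedSequence(123)
# One independent child SeedSequence per worker chunk.
children = root.spawn(4)
rngs = [np.random.Generator(np.random.Philox(child)) for child in children]

draws = [rng.normal(size=8) for rng in rngs]
# Streams are independent...
assert not np.allclose(draws[0], draws[1])

# ...and deterministic given the same root seed.
replay = np.random.Generator(np.random.Philox(np.random.SeedSequence(123).spawn(4)[0]))
assert np.allclose(draws[0], replay.normal(size=8))
```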

mcframework.backends.base.is_windows_platform() bool[source]#

Return True when running on a Windows platform.

Sequential Backend#

Sequential execution backend for Monte Carlo simulations.

This module provides a single-threaded execution strategy that runs simulations sequentially with optional progress reporting.

class mcframework.backends.sequential.SequentialBackend[source]#

Bases: object

Sequential (single-threaded) execution backend.

Executes simulation draws one at a time on the main thread. Suitable for small simulations or debugging.

Examples

>>> backend = SequentialBackend()
>>> results = backend.run(sim, n_simulations=1000, seed_seq=None, progress_callback=None)
run(sim: MonteCarloSimulation, n_simulations: int, _seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations sequentially on a single thread.

Parameters:
simMonteCarloSimulation

The simulation instance to run.

n_simulationsint

Number of simulation draws to perform.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

Parallel Backends#

Parallel execution backends for Monte Carlo simulations.

This module provides:

Classes

ThreadBackend — Thread-based parallelism using ThreadPoolExecutor

ProcessBackend — Process-based parallelism using ProcessPoolExecutor

class mcframework.backends.parallel.ThreadBackend[source]#

Bases: object

Thread-based parallel execution backend.

Uses concurrent.futures.ThreadPoolExecutor for parallel execution. Effective when NumPy releases the GIL (most numerical operations).

Parameters:
n_workersint

Number of worker threads to use.

chunks_per_workerint, default 8

Number of work chunks per worker for load balancing.

Examples

>>> backend = ThreadBackend(n_workers=4)
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq, progress_callback=None)
__init__(n_workers: int, chunks_per_worker: int = 8)[source]#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations in parallel using threads.

Parameters:
simMonteCarloSimulation

The simulation instance to run.

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for spawning independent RNG streams per chunk.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

class mcframework.backends.parallel.ProcessBackend[source]#

Bases: object

Process-based parallel execution backend.

Uses concurrent.futures.ProcessPoolExecutor with spawn context for parallel execution. Required on Windows or when thread-safety is a concern.

Parameters:
n_workersint

Number of worker processes to use.

chunks_per_workerint, default 8

Number of work chunks per worker for load balancing.

Notes

The simulation instance must be pickleable for process-based execution.
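Because spawn-context workers reimport and unpickle the simulation, it must round-trip through pickle. A quick self-check (DummySim is a hypothetical example, not framework code):

```python
import pickle

class DummySim:
    # Module-level class with plain attributes: pickles cleanly.
    # Lambdas, open file handles, or locks as attributes would not.
    def __init__(self, mu: float = 0.0):
        self.mu = mu

sim = DummySim(mu=1.5)
clone = pickle.loads(pickle.dumps(sim))
assert clone.mu == 1.5
```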

Examples

>>> backend = ProcessBackend(n_workers=4)
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq, progress_callback=None)
__init__(n_workers: int, chunks_per_worker: int = 8)[source]#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations in parallel using processes.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must be pickleable.

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for spawning independent RNG streams per chunk.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

Torch Backend (Unified)#

Torch execution backend for GPU-accelerated Monte Carlo simulations.

This module provides a unified interface for Torch-based backends:

Classes

TorchBackend — Factory that selects appropriate device backend

Device-Specific Backends

TorchCPUBackend — CPU execution (torch_cpu.py)

TorchMPSBackend — Apple Silicon GPU (torch_mps.py)

TorchCUDABackend — NVIDIA GPU (torch_cuda.py)

Utilities

validate_torch_device() — Validate device availability

make_torch_generator() — Create explicit RNG generators

VALID_TORCH_DEVICES — Supported device types

Device Support
  • cpu — Safe default, works everywhere

  • mps — Apple Metal Performance Shaders (M1/M2/M3/M4 Macs)

  • cuda — NVIDIA Compute Unified Device Architecture (CUDA 12.x with CuPy for CuRAND)

Notes#

Use TorchBackend as the main entry point—it automatically selects the appropriate device-specific backend based on the device parameter.

Example#

>>> from mcframework.backends import TorchBackend
>>> backend = TorchBackend(device="mps")  # Auto-selects TorchMPSBackend
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
class mcframework.backends.torch.TorchBackend[source]#

Bases: object

Factory class that creates and wraps the appropriate device-specific backend.

It creates and wraps the appropriate device-specific backend (TorchCPUBackend, TorchMPSBackend, or TorchCUDABackend) based on the device parameter.

Parameters:
device{“cpu”, “mps”, “cuda”}, default "cpu"

Torch device for computation.

See also

TorchCPUBackend

Direct CPU backend access.

TorchMPSBackend

Direct MPS backend access.

TorchCUDABackend

Direct CUDA backend access.

Notes

Delegation model. This class delegates all execution to the device-specific backend. It exists to provide a unified interface and for backward compatibility.

Device selection. The backend is selected at construction time based on the device parameter. Device availability is validated during construction.

Examples

>>> # CPU execution
>>> backend = TorchBackend(device="cpu")
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
>>> # Apple Silicon GPU
>>> backend = TorchBackend(device="mps")
>>> results = backend.run(sim, n_simulations=1000000, seed_seq=seed_seq)
>>> # NVIDIA GPU (CUDA 12.x with CuPy for CuRAND)
>>> backend = TorchBackend(device="cuda")
__init__(device: str = 'cpu', **device_kwargs: Any)[source]#

Initialize Torch backend with specified device.

Parameters:
device{“cpu”, “mps”, “cuda”}, default "cpu"

Torch device for computation.

**device_kwargsAny

Device-specific configuration options:

CUDA options (ignored for cpu/mps):

  • device_id : int, default 0 — CUDA device index

  • use_curand : bool, default False — Use cuRAND via CuPy

  • batch_size : int or None — Fixed batch size (None = adaptive)

  • use_streams : bool, default True — Enable CUDA streams

Raises:
ImportError

If PyTorch is not installed.

ValueError

If the device type is not recognized.

RuntimeError

If the requested device is not available.

Examples

>>> # CPU (no kwargs needed)
>>> backend = TorchBackend(device="cpu")
>>> # MPS (no kwargs needed)
>>> backend = TorchBackend(device="mps")
>>> # CUDA with default settings
>>> backend = TorchBackend(device="cuda")
>>> # CUDA with custom settings
>>> backend = TorchBackend(
...     device="cuda",
...     device_id=0,
...     use_curand=True,
...     batch_size=100_000,
...     use_streams=True,
... )
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations using the device-specific Torch backend.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations, ...).

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

class mcframework.backends.torch.TorchCPUBackend[source]#

Bases: object

Torch CPU batch execution backend.

Uses PyTorch for vectorized execution on CPU. Requires simulations to implement torch_batch() and set supports_batch to True.

Notes

RNG architecture. Uses explicit torch.Generator objects seeded from numpy.random.SeedSequence.spawn(). This preserves:

  • Deterministic parallel streams

  • Counter-based RNG (Philox) semantics

  • Identical statistical structure across backends

Never uses torch.manual_seed() (global state).

Examples

>>> backend = TorchCPUBackend()
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
__init__()[source]#

Initialize Torch CPU backend.

Raises:
ImportError

If PyTorch is not installed.

device_type: str = 'cpu'#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#

Run simulations using Torch CPU batch execution.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations, ...).

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

class mcframework.backends.torch.TorchMPSBackend[source]#

Bases: object

Torch MPS batch execution backend for Apple Silicon GPUs.

Uses PyTorch's MPS (Metal Performance Shaders) backend for GPU-accelerated execution on Apple Silicon Macs, leveraging the unified memory architecture. Requires simulations to implement torch_batch() and set supports_batch to True.

See also

is_mps_available()

Check MPS availability before instantiation.

TorchCPUBackend

Fallback for non-Apple systems.

Notes

RNG architecture. Uses explicit Generator objects seeded from SeedSequence via spawn(). This preserves:

  • Deterministic parallel streams (best-effort on MPS)

  • Counter-based RNG (Philox) semantics

  • Correct statistical structure

Never uses manual_seed() (global state).

Dtype policy. MPS performs best with float32; the backend computes batches in float32 on the GPU and promotes results to float64 on the CPU for stats engine compatibility.

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to:

  • Metal backend scheduling variations

  • float32 arithmetic rounding

  • GPU kernel execution order

Statistical properties (mean, variance, CI coverage) remain correct despite potential bitwise differences between runs. (see TestMPSDeterminism in tests/test_torch_backend.py for actual tests)

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
__init__()[source]#

Initialize Torch MPS backend.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available on this system.

device_type: str = 'mps'#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#

Run simulations using Torch MPS batch execution.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 despite MPS using float32 internally.

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

Notes

The dtype conversion flow is:

  1. torch_batch() returns a float32 tensor on the MPS device.

  2. The tensor is detached and moved to CPU via detach() and cpu().

  3. It is promoted to float64 (double) via to().

  4. It is converted to a float64 NumPy array via numpy().

This ensures stats engine precision while maximizing MPS performance.

class mcframework.backends.torch.TorchCUDABackend[source]#

Bases: object

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.

Parameters:
device_idint, default 0

CUDA device index to use. Use torch.cuda.device_count() to check available devices.

use_curandbool, default False

Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement cupy_batch().

batch_sizeint or None, default None

Fixed batch size for simulation execution. If None, automatically estimates optimal batch size based on available GPU memory.

use_streamsbool, default True

Use CUDA streams for overlapped execution. Recommended for performance.

Attributes:
device_typestr

Always "cuda".

devicetorch.device

CUDA device object for this backend.

device_idint

CUDA device index.

use_curandbool

Whether cuRAND mode is enabled.

batch_sizeint or None

Fixed batch size, or None for adaptive.

use_streamsbool

Whether CUDA streams are enabled.

See also

is_cuda_available()

Check CUDA availability before instantiation.

TorchMPSBackend

Apple Silicon alternative.

TorchCPUBackend

CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from numpy.random.SeedSequence via spawn(). Never uses global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()).

Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
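A sketch of that estimate, with made-up numbers (the ~75% utilization target comes from the note above; the probe mechanics here are illustrative, not the framework's actual code):

```python
def estimate_batch_size(free_bytes: int, probe_n: int, probe_bytes: int,
                        utilization: float = 0.75) -> int:
    # Scale the probe run's per-sample memory footprint so a full batch
    # consumes roughly 75% of the free GPU memory.
    bytes_per_sample = probe_bytes / probe_n
    return int(free_bytes * utilization / bytes_per_sample)

# Probe: 1000 samples used 8 kB (8 bytes each, float64); 1 GiB free.
assert estimate_batch_size(1024**3, 1000, 8_000) == 100_663_296
```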

Native float64: CUDA fully supports float64 tensors. If simulation’s torch_batch() or cupy_batch() returns float64, the backend uses it directly with zero conversion overhead. If float32, it converts to float64 on GPU before moving to CPU for stats engine compatibility.

CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...
__init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#

Initialize Torch CUDA backend with specified configuration.

Parameters:
device_idint, default 0

CUDA device index to use.

use_curandbool, default False

Use cuRAND via CuPy instead of torch.Generator.

batch_sizeint or None, default None

Fixed batch size (None = adaptive).

use_streamsbool, default True

Enable CUDA streams for overlapped execution.

Raises:
ImportError

If PyTorch is not installed, or if CuPy is required but not installed.

RuntimeError

If CUDA is not available or device index is invalid.

device_type: str = 'cuda'#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#

Run simulations using Torch CUDA batch execution with adaptive batching.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 regardless of internal tensor dtype.

Raises:
AttributeError

If simulation class is missing ‘supports_batch’ attribute.

ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement required batch method.

RuntimeError

If CUDA out-of-memory error occurs during execution.

Notes

Adaptive batching: When batch_size=None (default), automatically estimates optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.

Determinism: With same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).

mcframework.backends.torch.validate_torch_device(device_type: str) None[source]#

Validate that the requested Torch device is available.

Parameters:
device_typestr

Device type to validate ("cpu", "mps", "cuda").

Raises:
ValueError

If the device type is not recognized.

RuntimeError

If the device is not available on this system.

Examples

>>> validate_torch_device("cpu")  # Always succeeds
>>> validate_torch_device("mps")  # Succeeds on Apple Silicon
mcframework.backends.torch.is_mps_available() bool[source]#

Check if MPS (Metal Performance Shaders) is available.

Returns:
bool

True if MPS is available and PyTorch was built with MPS support.

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
mcframework.backends.torch.is_cuda_available() bool[source]#

Check if CUDA is available.

Returns:
bool

True if CUDA is available and PyTorch was built with CUDA support.

Examples

>>> if is_cuda_available():
...     backend = TorchCUDABackend()
mcframework.backends.torch.validate_mps_device() None[source]#

Validate that MPS device is available and usable.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available or not built into PyTorch.

Examples

>>> validate_mps_device()
mcframework.backends.torch.validate_cuda_device(device_id: int = 0) None[source]#

Validate that CUDA device is available and usable.

Parameters:
device_idint, default 0

CUDA device index to validate.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If CUDA is not available or device index is invalid.

Examples

>>> validate_cuda_device()
>>> validate_cuda_device(device_id=1)  # Check second GPU
mcframework.backends.torch.make_torch_generator(device: torch.device, seed_seq: np.random.SeedSequence | None) torch.Generator[source]#

Create an explicit Torch generator seeded from a SeedSequence.

This function spawns a child seed from the provided SeedSequence and uses it to initialize a Torch Generator. This preserves the hierarchical spawning model used by the NumPy backend.

Parameters:
device : torch.device

Device for the generator ("cpu", "mps", or "cuda").

seed_seq : SeedSequence or None

NumPy seed sequence to derive the Torch seed from.

Returns:
torch.Generator

Explicitly seeded generator for reproducible sampling.

Notes

Why explicit generators?

  • torch.manual_seed() is global state that breaks parallel composition

  • Explicit generators enable deterministic multi-stream MC

  • This mirrors NumPy’s SeedSequence.spawn() semantics

Seed derivation:

child_seed = seed_seq.spawn(1)[0]
seed_int = child_seed.generate_state(1, dtype="uint64")[0]
generator.manual_seed(seed_int)

This ensures each call with the same seed_seq produces identical results.

Examples

>>> import numpy as np
>>> import torch
>>> seed_seq = np.random.SeedSequence(42)
>>> gen = make_torch_generator(torch.device("cpu"), seed_seq)
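The seed derivation above can be checked with NumPy alone. In the sketch below, derive_torch_seed is a hypothetical stand-in for the spawn-and-seed step inside make_torch_generator, not the framework's actual function:

```python
import numpy as np

def derive_torch_seed(seed_seq: np.random.SeedSequence) -> int:
    # Mirror the documented derivation: spawn one child sequence, then
    # take a single uint64 state word to use as the Torch seed.
    child = seed_seq.spawn(1)[0]
    return int(child.generate_state(1, dtype="uint64")[0])

# Spawning is deterministic, so the same parent always yields the same seed.
s1 = derive_torch_seed(np.random.SeedSequence(42))
s2 = derive_torch_seed(np.random.SeedSequence(42))
```

Because the derivation is pure, repeated calls with an equal SeedSequence produce the same Torch seed, which is what makes cross-run reproducibility possible.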

Torch CPU Backend#

Torch CPU execution backend for Monte Carlo simulations.

This module provides:

Classes

TorchCPUBackend — Torch-based batch execution on CPU

The CPU backend enables vectorized execution using PyTorch on CPU, providing a good balance of speed and compatibility.

Notes#

When to use CPU backend:

  • Baseline testing before GPU deployment

  • Systems without GPU acceleration

  • Debugging and validation

  • Small to medium simulation sizes

RNG discipline. Uses explicit torch.Generator objects seeded from numpy.random.SeedSequence. Fully deterministic with same seed.

class mcframework.backends.torch_cpu.TorchCPUBackend[source]#

Bases: object

Torch CPU batch execution backend.

Uses PyTorch for vectorized execution on CPU. Requires simulations to implement torch_batch() and set supports_batch to True.

Notes

RNG architecture. Uses explicit torch.Generator objects seeded from numpy.random.SeedSequence.spawn(). This preserves:

  • Deterministic parallel streams

  • Counter-based RNG (Philox) semantics

  • Identical statistical structure across backends

Never uses torch.manual_seed() (global state).
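For concreteness, a minimal simulation satisfying these requirements might look like the sketch below. PiTorchSim is hypothetical, and the torch_batch() signature (n, device, generator) is an assumption inferred from this page, not the framework's definitive hook:

```python
import torch

class PiTorchSim:
    """Hypothetical minimal simulation exposing the batch hook the
    Torch backends require (signature assumed, base class omitted)."""
    supports_batch = True

    def torch_batch(self, n, device, generator):
        # Sample n points in the unit square; each draw contributes
        # 4.0 if it lands inside the quarter circle, else 0.0.
        pts = torch.rand((n, 2), device=device, generator=generator)
        inside = (pts * pts).sum(dim=1) <= 1.0
        return inside.to(torch.float64) * 4.0

# Note: this seeds an explicit per-call Generator instance, which is
# allowed -- only the global torch.manual_seed() is off-limits.
gen = torch.Generator(device="cpu").manual_seed(42)
draws = PiTorchSim().torch_batch(100_000, torch.device("cpu"), gen)
pi_hat = draws.mean().item()
```

Averaging the per-draw results recovers the Monte Carlo estimate of π, and the float64 return dtype matches what the stats engine expects.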

Examples

>>> backend = TorchCPUBackend()
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
device_type: str = 'cpu'#
__init__()[source]#

Initialize Torch CPU backend.

Raises:
ImportError

If PyTorch is not installed.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch CPU batch execution.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations, ...).

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

Torch MPS Backend (Apple Silicon)#

Torch MPS (Metal Performance Shaders) backend for Apple Silicon.

This module provides:

Classes

TorchMPSBackend — GPU-accelerated batch execution on Apple Silicon

Functions

is_mps_available() — Check MPS availability

validate_mps_device() — Validate MPS is usable

The MPS backend enables GPU-accelerated Monte Carlo simulations on Apple Silicon Macs (M1/M2/M3/M4) using Metal Performance Shaders.

Notes#

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to Metal backend scheduling and float32 arithmetic. Statistical properties (mean, variance, CI coverage) remain correct.

Dtype policy. MPS performs best with float32. Sampling uses float32, but results are promoted to float64 on CPU before returning to ensure stats engine precision.

System requirements:

  • macOS 12.3 (Monterey) or later

  • Apple Silicon (M1, M2, M3, M4 series)

  • PyTorch built with MPS support

class mcframework.backends.torch_mps.TorchMPSBackend[source]#

Bases: object

Torch MPS batch execution backend for Apple Silicon GPUs.

Uses PyTorch's MPS (Metal Performance Shaders) backend for GPU-accelerated batch execution on Apple Silicon Macs, taking advantage of their unified memory architecture. Requires simulations to implement torch_batch() and set supports_batch to True.

See also

is_mps_available()

Check MPS availability before instantiation.

TorchCPUBackend

Fallback for non-Apple systems.

Notes

RNG architecture. Uses explicit Generator objects seeded from SeedSequence via spawn(). This preserves:

  • Deterministic parallel streams (best-effort on MPS)

  • Counter-based RNG (Philox) semantics

  • Correct statistical structure

Never uses manual_seed() (global state).

Dtype policy. MPS performs best with float32. Sampling uses float32; results are promoted to float64 on CPU before returning, for stats engine compatibility.

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to:

  • Metal backend scheduling variations

  • float32 arithmetic rounding

  • GPU kernel execution order

Statistical properties (mean, variance, CI coverage) remain correct despite potential bitwise differences between runs (see TestMPSDeterminism in tests/test_torch_backend.py).

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
device_type: str = 'mps'#
__init__()[source]#

Initialize Torch MPS backend.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available on this system.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch MPS batch execution.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 despite MPS using float32 internally.

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

Notes

The dtype conversion flow is:

  1. torch_batch() returns float32 tensors on the MPS device.

  2. The tensor is detached and moved to CPU via detach() and cpu().

  3. Promoted to double() (float64) via to().

  4. Converted to a float64 ndarray via numpy().

This ensures stats engine precision while maximizing MPS performance.
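Under the assumption that a plain CPU tensor can stand in for the MPS result (so the sketch runs anywhere), the four steps collapse to a single chain:

```python
import torch

# Stand-in for a float32 result tensor produced on the MPS device.
t = torch.rand(5, dtype=torch.float32)

# Steps 2-4: detach from the graph, move to CPU, promote to float64,
# then hand a float64 ndarray to the stats engine.
arr = t.detach().cpu().to(torch.float64).numpy()
```

The promotion happens after the expensive GPU work, so float32 speed on-device and float64 precision downstream are not in conflict.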

mcframework.backends.torch_mps.is_mps_available() → bool[source]#

Check if MPS (Metal Performance Shaders) is available.

Returns:
bool

True if MPS is available and PyTorch was built with MPS support.

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
mcframework.backends.torch_mps.validate_mps_device() → None[source]#

Validate that MPS device is available and usable.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available or not built into PyTorch.

Examples

>>> validate_mps_device()

Torch CUDA Backend (NVIDIA)#

Torch CUDA backend for NVIDIA GPU acceleration.

This module provides:

Classes

TorchCUDABackend — GPU-accelerated batch execution on NVIDIA GPUs

Functions

is_cuda_available() — Check CUDA availability

validate_cuda_device() — Validate CUDA is usable

Features#

Adaptive Batch Sizing: Automatically estimates optimal batch size based on available GPU memory to prevent OOM errors while maximizing throughput.

Dual RNG Modes:

  • torch.Generator (default) — PyTorch’s Philox RNG, fully deterministic

  • cuRAND (optional) — Native GPU RNG via CuPy, maximum performance

CUDA Optimizations:

  • CUDA streams for overlapped execution

  • Native float64 support (zero conversion overhead vs MPS)

  • Efficient memory management via PyTorch’s caching allocator

Defensive Validation: Comprehensive checks for supports_batch attribute and required batch methods before execution.
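These checks mirror the AttributeError / ValueError / NotImplementedError contract documented for run(). A sketch of what they might look like follows; validate_sim is a hypothetical helper, not the backend's actual code:

```python
def validate_sim(sim):
    """Sketch of the documented defensive checks (hypothetical helper)."""
    # Missing attribute entirely -> AttributeError.
    if not hasattr(sim, "supports_batch"):
        raise AttributeError("simulation class is missing 'supports_batch'")
    # Attribute present but False -> ValueError.
    if not sim.supports_batch:
        raise ValueError("simulation does not support batch execution")
    # Opted in but no batch method -> NotImplementedError.
    if not callable(getattr(sim, "torch_batch", None)):
        raise NotImplementedError("simulation does not implement torch_batch()")
```

Failing fast here keeps misconfigured simulations from reaching the GPU with a confusing downstream error.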

Notes#

Native float64 support: Unlike MPS (Apple Silicon), CUDA fully supports float64 tensors. The backend intelligently handles both float32 and float64, promoting to float64 only when necessary.

Batch size estimation: Uses a probe run to estimate per-sample memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
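The arithmetic behind that strategy can be sketched in a few lines. estimate_batch_size is a hypothetical helper (the backend's real function name and signature may differ); only the probe-based extrapolation and the ~75% target come from the description above:

```python
def estimate_batch_size(free_bytes, probe_samples, probe_bytes, cap=None):
    """Pick a batch size targeting ~75% of free GPU memory (sketch)."""
    per_sample = probe_bytes / probe_samples   # bytes per simulation draw
    budget = 0.75 * free_bytes                 # leave ~25% headroom
    batch = max(1, int(budget / per_sample))
    return min(batch, cap) if cap is not None else batch

# e.g. 8 GiB free and a 1000-sample probe that allocated 4 MiB:
bs = estimate_batch_size(8 * 2**30, 1000, 4 * 2**20)
```

Large workloads are then split into ceil(n_simulations / batch) batches, each safely inside the memory budget.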

Examples#

>>> # Simple usage with defaults
>>> if is_cuda_available():
...     sim.run(1_000_000, backend="torch", torch_device="cuda")
>>> # Advanced: Direct backend construction with custom settings
>>> if is_cuda_available():
...     from mcframework.backends import TorchCUDABackend
...     backend = TorchCUDABackend(device_id=0, batch_size=100_000, use_streams=True)
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=sim.seed_seq)
...
class mcframework.backends.torch_cuda.TorchCUDABackend[source]#

Bases: object

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.

Parameters:
device_id : int, default 0

CUDA device index to use. Use torch.cuda.device_count() to check available devices.

use_curand : bool, default False

Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement cupy_batch().

batch_size : int or None, default None

Fixed batch size for simulation execution. If None, automatically estimates optimal batch size based on available GPU memory.

use_streams : bool, default True

Use CUDA streams for overlapped execution. Recommended for performance.

Attributes:
device_type : str

Always "cuda".

device : torch.device

CUDA device object for this backend.

device_id : int

CUDA device index.

use_curand : bool

Whether cuRAND mode is enabled.

batch_size : int or None

Fixed batch size, or None for adaptive.

use_streams : bool

Whether CUDA streams are enabled.

See also

is_cuda_available()

Check CUDA availability before instantiation.

TorchMPSBackend

Apple Silicon alternative.

TorchCPUBackend

CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from numpy.random.SeedSequence via spawn(). Never uses global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()).

Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.

Native float64: CUDA fully supports float64 tensors. If simulation’s torch_batch() or cupy_batch() returns float64, the backend uses it directly with zero conversion overhead. If float32, it converts to float64 on GPU before moving to CPU for stats engine compatibility.

CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...
device_type: str = 'cuda'#
__init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#

Initialize Torch CUDA backend with specified configuration.

Parameters:
device_id : int, default 0

CUDA device index to use.

use_curand : bool, default False

Use cuRAND via CuPy instead of torch.Generator.

batch_size : int or None, default None

Fixed batch size (None = adaptive).

use_streams : bool, default True

Enable CUDA streams for overlapped execution.

Raises:
ImportError

If PyTorch is not installed, or if CuPy is required but not installed.

RuntimeError

If CUDA is not available or device index is invalid.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch CUDA batch execution with adaptive batching.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 regardless of internal tensor dtype.

Raises:
AttributeError

If simulation class is missing ‘supports_batch’ attribute.

ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement required batch method.

RuntimeError

If CUDA out-of-memory error occurs during execution.

Notes

Adaptive batching: When batch_size=None (default), automatically estimates optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.

Determinism: With same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).

mcframework.backends.torch_cuda.is_cuda_available() → bool[source]#

Check if CUDA is available.

Returns:
bool

True if CUDA is available and PyTorch was built with CUDA support.

Examples

>>> if is_cuda_available():
...     backend = TorchCUDABackend()
mcframework.backends.torch_cuda.validate_cuda_device(device_id: int = 0) → None[source]#

Validate that CUDA device is available and usable.

Parameters:
device_id : int, default 0

CUDA device index to validate.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If CUDA is not available or device index is invalid.

Examples

>>> validate_cuda_device()
>>> validate_cuda_device(device_id=1)  # Check second GPU

See Also#

  • Core Module — Base simulation class and framework

  • Stats Engine — Statistical analysis of results

  • demos/demo_apple_silicon_benchmark.py — Benchmark script for Apple Silicon