Backends Module#

The backends module provides pluggable execution strategies for Monte Carlo simulations, from single-threaded CPU to GPU-accelerated batch processing.


Overview#

Backend      Class               Use Case
-----------  ------------------  --------------------------------------
Sequential   SequentialBackend   Single-threaded, debugging, small jobs
Thread       ThreadBackend       NumPy-heavy code (releases GIL)
Process      ProcessBackend      Python-bound code, Windows
Torch CPU    TorchCPUBackend     Vectorized CPU batch execution
Torch MPS    TorchMPSBackend     Apple Silicon GPU (M1/M2/M3/M4)
Torch CUDA   TorchCUDABackend    NVIDIA GPU acceleration


Quick Start#

CPU Backends:

from mcframework import PiEstimationSimulation

sim = PiEstimationSimulation()
sim.set_seed(42)

# Sequential (single-threaded)
result = sim.run(10_000, backend="sequential")

# Thread-based parallelism (default on POSIX)
result = sim.run(100_000, backend="thread", n_workers=8)

# Process-based parallelism (default on Windows)
result = sim.run(100_000, backend="process", n_workers=4)

# Auto-selection based on platform and job size
result = sim.run(100_000, backend="auto")

GPU Backends (requires PyTorch):

# Torch CPU (vectorized, no GPU required)
result = sim.run(1_000_000, backend="torch", torch_device="cpu")

# Apple Silicon GPU (M1/M2/M3/M4 Macs)
result = sim.run(1_000_000, backend="torch", torch_device="mps")

# NVIDIA CUDA GPU
result = sim.run(1_000_000, backend="torch", torch_device="cuda")

CPU Backends#

SequentialBackend#

Single-threaded execution for debugging and small jobs.

SequentialBackend

Sequential (single-threaded) execution backend.

When to use:

  • Debugging and testing

  • Jobs with < 20,000 simulations

  • When reproducibility debugging is needed

result = sim.run(1000, backend="sequential")

ThreadBackend#

Thread-based parallelism using ThreadPoolExecutor.

ThreadBackend

Thread-based parallel execution backend.

When to use:

  • NumPy-heavy code that releases the GIL

  • POSIX systems (macOS, Linux)

  • When process spawn overhead is significant

result = sim.run(100_000, backend="thread", n_workers=8)

ProcessBackend#

Process-based parallelism using ProcessPoolExecutor.

ProcessBackend

Process-based parallel execution backend.

When to use:

  • Python-bound code that doesn’t release the GIL

  • Windows (threads serialize under GIL)

  • CPU-intensive pure Python calculations

result = sim.run(100_000, backend="process", n_workers=4)

Torch GPU Backends#

The Torch backends enable GPU-accelerated batch execution for simulations that implement the torch_batch() method.

Note

Installation: GPU backends require PyTorch. Install with:

pip install mcframework[gpu]

TorchBackend (Unified)#

Factory class that auto-selects the appropriate device-specific backend.

TorchBackend

Factory class that creates and wraps the appropriate device-specific backend.

Usage:

from mcframework.backends import TorchBackend

# Auto-creates TorchCPUBackend
backend = TorchBackend(device="cpu")

# Auto-creates TorchMPSBackend (Apple Silicon)
backend = TorchBackend(device="mps")

# Auto-creates TorchCUDABackend (NVIDIA)
backend = TorchBackend(device="cuda")

# Run simulation
results = backend.run(sim, n_simulations=1_000_000, seed_seq=sim.seed_seq)

TorchCPUBackend#

Vectorized batch execution on CPU using PyTorch Tensor.

TorchCPUBackend

Torch CPU batch execution backend.

When to use:

  • Baseline testing before GPU deployment

  • Systems without GPU acceleration

  • Debugging vectorized code

  • Small to medium simulation sizes

from mcframework.backends import TorchCPUBackend

backend = TorchCPUBackend()
results = backend.run(sim, 100_000, sim.seed_seq, progress_callback=None)

TorchMPSBackend#

Apple Silicon GPU acceleration via Metal Performance Shaders (MPS).

TorchMPSBackend

Torch MPS batch execution backend for Apple Silicon GPUs.

Requirements:

  • macOS 12.3+ with Apple Silicon (M1/M2/M3/M4)

  • PyTorch with MPS support

Dtype Policy:

Metal Performance Shaders supports at most float32 on the GPU. The framework therefore computes batches in float32 on the device and promotes the results to float64 on CPU (see to()) to preserve stats engine precision.
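The promotion described above can be sketched with NumPy standing in for the Torch tensors (illustrative only; the real path uses tensor to() on the host):

```python
import numpy as np

# Simulate an MPS-style batch: computed in float32 on the device...
batch_f32 = np.random.default_rng(0).random(1_000, dtype=np.float32)

# ...then promoted to float64 on the host before reaching the stats engine.
batch_f64 = batch_f32.astype(np.float64)

assert batch_f64.dtype == np.float64
# Promotion is lossless: every float32 value is exactly representable in float64.
assert np.array_equal(batch_f64.astype(np.float32), batch_f32)
```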

Warning

MPS Determinism Caveat

Apple’s MPSDataType documentation confirms the lack of float64 support, and other projects have reported similar reproducibility issues on MPS.

Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility, due to Metal backend scheduling and float32 arithmetic. Statistical properties (mean, variance, CI coverage) remain correct. (See TestMPSDeterminism in tests/test_torch_backend.py for the corresponding tests.)

from mcframework.backends import TorchMPSBackend, is_mps_available

if is_mps_available():
    backend = TorchMPSBackend()
    results = backend.run(sim, 1_000_000, sim.seed_seq, None)

TorchCUDABackend#

NVIDIA GPU acceleration with adaptive batching and CUDA streams.

TorchCUDABackend

Torch CUDA batch execution backend for NVIDIA GPUs.

Features:

  • Adaptive batch sizing based on GPU memory

  • CUDA stream support for async execution

  • Native float64 support (no precision loss)

  • Optional cuRAND integration for maximum performance

Configuration Options:

Parameter     Default  Description
------------  -------  ----------------------------------------
device_id     0        CUDA device index for multi-GPU systems
use_curand    False    Use cuRAND instead of torch.Generator
batch_size    None     Fixed batch size (None = adaptive)
use_streams   True     Enable CUDA streams for async execution

from mcframework.backends import TorchCUDABackend, is_cuda_available

if is_cuda_available():
    # Basic usage
    backend = TorchCUDABackend()

    # Advanced configuration
    backend = TorchCUDABackend(
        device_id=0,
        use_curand=False,
        batch_size=None,  # Adaptive
        use_streams=True,
    )

    results = backend.run(sim, 10_000_000, sim.seed_seq, progress_callback)

Implementing Torch Support#

To enable GPU acceleration for your simulation, implement torch_batch():

from mcframework import MonteCarloSimulation

class MySimulation(MonteCarloSimulation):
    supports_batch = True  # Required flag

    def single_simulation(self, _rng=None, **kwargs):
        rng = self._rng(_rng, self.rng)
        return float(rng.normal())

    def torch_batch(self, n, *, device, generator):
        """Vectorized Torch implementation."""
        import torch

        # Use explicit generator for reproducibility
        samples = torch.randn(n, device=device, generator=generator)

        # Return float32 for MPS compatibility
        # Framework promotes to float64 on CPU
        return samples.float()

Key Requirements:

  1. Set supports_batch = True as a class attribute

  2. All random sampling must use the provided generator

  3. Never use global RNG (torch.manual_seed())

  4. Return float32 for MPS compatibility


RNG Architecture#

The framework uses explicit PyTorch Generator objects seeded from NumPy’s SeedSequence to maintain reproducible parallel streams:

from mcframework.backends import make_torch_generator
import numpy as np
import torch

# Create seed sequence
seed_seq = np.random.SeedSequence(42)

# Create explicit generator (spawns child seed)
generator = make_torch_generator(torch.device("cpu"), seed_seq)

# Use in sampling
samples = torch.rand(1000, generator=generator)

Why explicit generators?

  • manual_seed() is global state that breaks parallel composition

  • Explicit generators enable deterministic multi-stream MC

  • Mirrors NumPy’s spawn() semantics
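The seed derivation itself can be reproduced with NumPy alone; a sketch of the hierarchical spawning model described above (not the framework's actual code):

```python
import numpy as np

def derive_seed(seed_seq: np.random.SeedSequence) -> int:
    # Spawn a child sequence and draw one 64-bit word from it,
    # mirroring how an explicit generator seed is derived.
    child = seed_seq.spawn(1)[0]
    return int(child.generate_state(1, dtype=np.uint64)[0])

# Same root seed -> same derived seed; different roots diverge.
assert derive_seed(np.random.SeedSequence(42)) == derive_seed(np.random.SeedSequence(42))
assert derive_seed(np.random.SeedSequence(42)) != derive_seed(np.random.SeedSequence(43))
```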


Utility Functions#

make_blocks

Partition an integer range [0, n) into half-open blocks (i, j).

worker_run_chunk

Execute a small batch of single simulations in a separate worker.

is_windows_platform

Return True when running on a Windows platform.

validate_torch_device

Validate that the requested Torch device is available.

make_torch_generator

Create an explicit Torch generator seeded from a SeedSequence.

is_mps_available

Check if MPS (Metal Performance Shaders) is available.

is_cuda_available

Check if CUDA is available.

Availability Checks:

from mcframework.backends import is_mps_available, is_cuda_available

print(f"MPS available: {is_mps_available()}")
print(f"CUDA available: {is_cuda_available()}")

Device Validation:

from mcframework.backends import validate_torch_device

validate_torch_device("cpu")   # Always passes
validate_torch_device("mps")   # Raises RuntimeError if unavailable
validate_torch_device("cuda")  # Raises RuntimeError if unavailable

Backend Protocol#

All backends implement the ExecutionBackend protocol:

ExecutionBackend

Protocol defining the interface for execution backends.

Protocol Definition:

from typing import Protocol, Callable
import numpy as np

class ExecutionBackend(Protocol):
    def run(
        self,
        sim: "MonteCarloSimulation",
        n_simulations: int,
        seed_seq: np.random.SeedSequence | None,
        progress_callback: Callable[[int, int], None] | None = None,
        **simulation_kwargs,
    ) -> np.ndarray:
        """Execute simulations and return results array."""
        ...
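Because this is a structural protocol, any object with a matching run() method satisfies it. A toy loop-based backend, for illustration (the dummy simulation and its _rng handling are assumptions, not framework code):

```python
import numpy as np

class LoopBackend:
    """Minimal structural implementation of the ExecutionBackend protocol."""
    def run(self, sim, n_simulations, seed_seq, progress_callback=None, **kwargs):
        # One Philox-based RNG for the whole run (real backends spawn per chunk).
        rng = np.random.Generator(np.random.Philox(seed_seq))
        results = np.array(
            [sim.single_simulation(_rng=rng, **kwargs) for _ in range(n_simulations)]
        )
        if progress_callback is not None:
            progress_callback(n_simulations, n_simulations)
        return results

class DummySim:
    def single_simulation(self, _rng=None, **kwargs):
        return float(_rng.normal())

out = LoopBackend().run(DummySim(), 100, np.random.SeedSequence(7))
assert out.shape == (100,)
```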

Note

Torch backends achieve massive speedups through vectorization, not just parallelization. The entire batch executes as tensor operations.
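The distinction is easy to see with a NumPy analogue: per-draw Python calls versus one array operation over the same underlying stream (a sketch, not framework code):

```python
import numpy as np

# One Python-level call per draw (how loop-based backends consume the stream).
rng_loop = np.random.default_rng(0)
loop = np.array([rng_loop.random() for _ in range(1_000)])

# One vectorized call for the whole batch (how Torch backends execute).
rng_vec = np.random.default_rng(0)
vec = rng_vec.random(1_000)

# Same stream, same values -- only the execution strategy differs.
assert np.array_equal(loop, vec)
```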


Module Reference#

Base Classes and Utilities#

Base classes and utilities for execution backends.

This module provides:

Protocol

ExecutionBackend — Interface for simulation execution strategies

Functions

make_blocks() — Chunking helper for parallel work distribution

worker_run_chunk() — Top-level worker for process-based parallelism

Helpers

is_windows_platform() — Platform detection for backend selection

class mcframework.backends.base.ExecutionBackend[source]#

Bases: Protocol

Protocol defining the interface for execution backends.

Backends are responsible for executing simulation draws and returning results. They handle the details of sequential vs parallel execution, thread vs process pools, and progress reporting.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulation draws and return results.

Parameters:
simMonteCarloSimulation

The simulation instance to run.

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

__init__(*args, **kwargs)#
mcframework.backends.base.make_blocks(n: int, block_size: int = 10000) list[tuple[int, int]][source]#

Partition an integer range [0, n) into half-open blocks (i, j).

Parameters:
nint

Total number of items.

block_sizeint, default: 10_000

Target block length.

Returns:
list of tuple[int, int]

List of (i, j) index pairs covering [0, n).

Examples

>>> make_blocks(5, block_size=2)
[(0, 2), (2, 4), (4, 5)]
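A pure-Python equivalent of the partitioning, for illustration:

```python
def make_blocks_sketch(n: int, block_size: int = 10_000) -> list[tuple[int, int]]:
    # Half-open (i, j) pairs covering [0, n); the last block may be short.
    return [(i, min(i + block_size, n)) for i in range(0, n, block_size)]

assert make_blocks_sketch(5, block_size=2) == [(0, 2), (2, 4), (4, 5)]
assert make_blocks_sketch(0) == []
```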
mcframework.backends.base.worker_run_chunk(sim: MonteCarloSimulation, chunk_size: int, seed_seq: np.random.SeedSequence, simulation_kwargs: dict[str, Any]) list[float][source]#

Execute a small batch of single simulations in a separate worker.

Parameters:
sim

Simulation instance to call (MonteCarloSimulation.single_simulation()). Must be pickleable when used with a process backend.

chunk_sizeint

Number of draws to compute in this worker.

seed_seqnumpy.random.SeedSequence

Seed sequence for creating an independent RNG stream in the worker.

simulation_kwargsdict

Keyword arguments forwarded to MonteCarloSimulation.single_simulation().

Returns:
list[float]

The simulated values.

Notes

Uses numpy.random.Philox to spawn a deterministic, independent stream per worker chunk.
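The per-chunk stream spawning can be sketched directly with NumPy (this mirrors the note above; the chunk count is illustrative):

```python
import numpy as np

root = np.random.SeedSequence(123)
# One independent child SeedSequence per worker chunk.
children = root.spawn(4)
rngs = [np.random.Generator(np.random.Philox(child)) for child in children]

draws = [rng.normal(size=8) for rng in rngs]
# Streams are independent...
assert not np.allclose(draws[0], draws[1])

# ...and deterministic given the same root seed.
replay = np.random.Generator(np.random.Philox(np.random.SeedSequence(123).spawn(4)[0]))
assert np.allclose(draws[0], replay.normal(size=8))
```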

mcframework.backends.base.is_windows_platform() bool[source]#

Return True when running on a Windows platform.

Sequential Backend#

Sequential execution backend for Monte Carlo simulations.

This module provides a single-threaded execution strategy that runs simulations sequentially with optional progress reporting.

class mcframework.backends.sequential.SequentialBackend[source]#

Bases: object

Sequential (single-threaded) execution backend.

Executes simulation draws one at a time on the main thread. Suitable for small simulations or debugging.

Examples

>>> backend = SequentialBackend()
>>> results = backend.run(sim, n_simulations=1000, seed_seq=None, progress_callback=None)
run(sim: MonteCarloSimulation, n_simulations: int, _seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations sequentially on a single thread.

Parameters:
simMonteCarloSimulation

The simulation instance to run.

n_simulationsint

Number of simulation draws to perform.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

Parallel Backends#

Parallel execution backends for Monte Carlo simulations.

This module provides:

Classes

ThreadBackend — Thread-based parallelism using ThreadPoolExecutor

ProcessBackend — Process-based parallelism using ProcessPoolExecutor

class mcframework.backends.parallel.ThreadBackend[source]#

Bases: object

Thread-based parallel execution backend.

Uses concurrent.futures.ThreadPoolExecutor for parallel execution. Effective when NumPy releases the GIL (most numerical operations).

Parameters:
n_workersint

Number of worker threads to use.

chunks_per_workerint, default 8

Number of work chunks per worker for load balancing.

Examples

>>> backend = ThreadBackend(n_workers=4)
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq, progress_callback=None)
__init__(n_workers: int, chunks_per_worker: int = 8)[source]#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations in parallel using threads.

Parameters:
simMonteCarloSimulation

The simulation instance to run.

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for spawning independent RNG streams per chunk.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

class mcframework.backends.parallel.ProcessBackend[source]#

Bases: object

Process-based parallel execution backend.

Uses concurrent.futures.ProcessPoolExecutor with spawn context for parallel execution. Required on Windows or when thread-safety is a concern.

Parameters:
n_workersint

Number of worker processes to use.

chunks_per_workerint, default 8

Number of work chunks per worker for load balancing.

Notes

The simulation instance must be pickleable for process-based execution.
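Because spawn-context workers reimport and unpickle the simulation, it must round-trip through pickle. A quick self-check (DummySim is a hypothetical example, not framework code):

```python
import pickle

class DummySim:
    # Module-level class with plain attributes: pickles cleanly.
    # Lambdas, open file handles, or locks as attributes would not.
    def __init__(self, mu: float = 0.0):
        self.mu = mu

sim = DummySim(mu=1.5)
clone = pickle.loads(pickle.dumps(sim))
assert clone.mu == 1.5
```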

Examples

>>> backend = ProcessBackend(n_workers=4)
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq, progress_callback=None)
__init__(n_workers: int, chunks_per_worker: int = 8)[source]#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations in parallel using processes.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must be pickleable.

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for spawning independent RNG streams per chunk.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Additional keyword arguments passed to single_simulation.

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,).

Torch Backend (Unified)#

Torch execution backend for GPU-accelerated Monte Carlo simulations.

This module provides a unified interface for Torch-based backends:

Classes

TorchBackend — Factory that selects appropriate device backend

Device-Specific Backends

TorchCPUBackend — CPU execution (torch_cpu.py)

TorchMPSBackend — Apple Silicon GPU (torch_mps.py)

TorchCUDABackend — NVIDIA GPU (torch_cuda.py)

Utilities

validate_torch_device() — Validate device availability

make_torch_generator() — Create explicit RNG generators

VALID_TORCH_DEVICES — Supported device types

Device Support
  • cpu — Safe default, works everywhere

  • mps — Apple Metal Performance Shaders (M1/M2/M3/M4 Macs)

  • cuda — NVIDIA Compute Unified Device Architecture (CUDA 12.x with CuPy for CuRAND)

Notes#

Use TorchBackend as the main entry point—it automatically selects the appropriate device-specific backend based on the device parameter.

Example#

>>> from mcframework.backends import TorchBackend
>>> backend = TorchBackend(device="mps")  # Auto-selects TorchMPSBackend
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
class mcframework.backends.torch.TorchBackend[source]#

Bases: object

Factory class that creates and wraps the appropriate device-specific backend.

It creates and wraps the appropriate device-specific backend (TorchCPUBackend, TorchMPSBackend, or TorchCUDABackend) based on the device parameter.

Parameters:
device{“cpu”, “mps”, “cuda”}, default "cpu"

Torch device for computation.

See also

TorchCPUBackend

Direct CPU backend access.

TorchMPSBackend

Direct MPS backend access.

TorchCUDABackend

Direct CUDA backend access.

Notes

Delegation model. This class delegates all execution to the device-specific backend. It exists to provide a unified interface and for backward compatibility.

Device selection. The backend is selected at construction time based on the device parameter. Device availability is validated during construction.

Examples

>>> # CPU execution
>>> backend = TorchBackend(device="cpu")
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
>>> # Apple Silicon GPU
>>> backend = TorchBackend(device="mps")
>>> results = backend.run(sim, n_simulations=1000000, seed_seq=seed_seq)
>>> # NVIDIA GPU (CUDA 12.x with CuPy for CuRAND)
>>> backend = TorchBackend(device="cuda")
__init__(device: str = 'cpu', **device_kwargs: Any)[source]#

Initialize Torch backend with specified device.

Parameters:
device{“cpu”, “mps”, “cuda”}, default "cpu"

Torch device for computation.

**device_kwargsAny

Device-specific configuration options:

CUDA options (ignored for cpu/mps):

  • device_id : int, default 0 — CUDA device index

  • use_curand : bool, default False — Use cuRAND via CuPy

  • batch_size : int or None — Fixed batch size (None = adaptive)

  • use_streams : bool, default True — Enable CUDA streams

Raises:
ImportError

If PyTorch is not installed.

ValueError

If the device type is not recognized.

RuntimeError

If the requested device is not available.

Examples

>>> # CPU (no kwargs needed)
>>> backend = TorchBackend(device="cpu")
>>> # MPS (no kwargs needed)
>>> backend = TorchBackend(device="mps")
>>> # CUDA with default settings
>>> backend = TorchBackend(device="cuda")
>>> # CUDA with custom settings
>>> backend = TorchBackend(
...     device="cuda",
...     device_id=0,
...     use_curand=True,
...     batch_size=100_000,
...     use_streams=True,
... )
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **simulation_kwargs: Any) np.ndarray[source]#

Run simulations using the device-specific Torch backend.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations, ...).

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

class mcframework.backends.torch.TorchCPUBackend[source]#

Bases: object

Torch CPU batch execution backend.

Uses PyTorch for vectorized execution on CPU. Requires simulations to implement torch_batch() and set supports_batch to True.

Notes

RNG architecture. Uses explicit torch.Generator objects seeded from numpy.random.SeedSequence.spawn(). This preserves:

  • Deterministic parallel streams

  • Counter-based RNG (Philox) semantics

  • Identical statistical structure across backends

Never uses torch.manual_seed() (global state).

Examples

>>> backend = TorchCPUBackend()
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
__init__()[source]#

Initialize Torch CPU backend.

Raises:
ImportError

If PyTorch is not installed.

device_type: str = 'cpu'#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#

Run simulations using Torch CPU batch execution.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations, ...).

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

class mcframework.backends.torch.TorchMPSBackend[source]#

Bases: object

Torch MPS batch execution backend for Apple Silicon GPUs.

Uses PyTorch's MPS (Metal Performance Shaders) backend for GPU-accelerated execution on Apple Silicon Macs, leveraging the unified memory architecture. Requires simulations to implement torch_batch() and set supports_batch to True.

See also

is_mps_available()

Check MPS availability before instantiation.

TorchCPUBackend

Fallback for non-Apple systems.

Notes

RNG architecture. Uses explicit Generator objects seeded from SeedSequence via spawn(). This preserves:

  • Deterministic parallel streams (best-effort on MPS)

  • Counter-based RNG (Philox) semantics

  • Correct statistical structure

Never uses manual_seed() (global state).

Dtype policy. MPS performs best with float32; the backend computes batches in float32 on the GPU and promotes results to float64 on the CPU for stats engine compatibility.

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to:

  • Metal backend scheduling variations

  • float32 arithmetic rounding

  • GPU kernel execution order

Statistical properties (mean, variance, CI coverage) remain correct despite potential bitwise differences between runs. (see TestMPSDeterminism in tests/test_torch_backend.py for actual tests)

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
__init__()[source]#

Initialize Torch MPS backend.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available on this system.

device_type: str = 'mps'#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#

Run simulations using Torch MPS batch execution.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 despite MPS using float32 internally.

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

Notes

The dtype conversion flow is:

  1. torch_batch() returns a float32 tensor on the MPS device.

  2. The tensor is detached and moved to CPU via detach() and cpu().

  3. It is promoted to float64 (double) via to().

  4. It is converted to a float64 NumPy array via numpy().

This ensures stats engine precision while maximizing MPS performance.

class mcframework.backends.torch.TorchCUDABackend[source]#

Bases: object

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.

Parameters:
device_idint, default 0

CUDA device index to use. Use torch.cuda.device_count() to check available devices.

use_curandbool, default False

Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement cupy_batch().

batch_sizeint or None, default None

Fixed batch size for simulation execution. If None, automatically estimates optimal batch size based on available GPU memory.

use_streamsbool, default True

Use CUDA streams for overlapped execution. Recommended for performance.

Attributes:
device_typestr

Always "cuda".

devicetorch.device

CUDA device object for this backend.

device_idint

CUDA device index.

use_curandbool

Whether cuRAND mode is enabled.

batch_sizeint or None

Fixed batch size, or None for adaptive.

use_streamsbool

Whether CUDA streams are enabled.

See also

is_cuda_available()

Check CUDA availability before instantiation.

TorchMPSBackend

Apple Silicon alternative.

TorchCPUBackend

CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from numpy.random.SeedSequence via spawn(). Never uses global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()).

Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
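A sketch of that estimate, with made-up numbers (the ~75% utilization target comes from the note above; the probe mechanics here are illustrative, not the framework's actual code):

```python
def estimate_batch_size(free_bytes: int, probe_n: int, probe_bytes: int,
                        utilization: float = 0.75) -> int:
    # Scale the probe run's per-sample memory footprint so a full batch
    # consumes roughly 75% of the free GPU memory.
    bytes_per_sample = probe_bytes / probe_n
    return int(free_bytes * utilization / bytes_per_sample)

# Probe: 1000 samples used 8 kB (8 bytes each, float64); 1 GiB free.
assert estimate_batch_size(1024**3, 1000, 8_000) == 100_663_296
```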

Native float64: CUDA fully supports float64 tensors. If simulation’s torch_batch() or cupy_batch() returns float64, the backend uses it directly with zero conversion overhead. If float32, it converts to float64 on GPU before moving to CPU for stats engine compatibility.

CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...
__init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#

Initialize Torch CUDA backend with specified configuration.

Parameters:
device_idint, default 0

CUDA device index to use.

use_curandbool, default False

Use cuRAND via CuPy instead of torch.Generator.

batch_sizeint or None, default None

Fixed batch size (None = adaptive).

use_streamsbool, default True

Enable CUDA streams for overlapped execution.

Raises:
ImportError

If PyTorch is not installed, or if CuPy is required but not installed.

RuntimeError

If CUDA is not available or device index is invalid.

device_type: str = 'cuda'#
run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) np.ndarray[source]#

Run simulations using Torch CUDA batch execution with adaptive batching.

Parameters:
simMonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).

n_simulationsint

Number of simulation draws to perform.

seed_seqSeedSequence or None

Seed sequence for reproducible random streams.

progress_callbackcallable() or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargsAny

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 regardless of internal tensor dtype.

Raises:
AttributeError

If simulation class is missing ‘supports_batch’ attribute.

ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement required batch method.

RuntimeError

If CUDA out-of-memory error occurs during execution.

Notes

Adaptive batching: When batch_size=None (default), automatically estimates optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.

Determinism: With same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).

mcframework.backends.torch.validate_torch_device(device_type: str) None[source]#

Validate that the requested Torch device is available.

Parameters:
device_typestr

Device type to validate ("cpu", "mps", "cuda").

Raises:
ValueError

If the device type is not recognized.

RuntimeError

If the device is not available on this system.

Examples

>>> validate_torch_device("cpu")  # Always succeeds
>>> validate_torch_device("mps")  # Succeeds on Apple Silicon
mcframework.backends.torch.is_mps_available() bool[source]#

Check if MPS (Metal Performance Shaders) is available.

Returns:
bool

True if MPS is available and PyTorch was built with MPS support.

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
mcframework.backends.torch.is_cuda_available() bool[source]#

Check if CUDA is available.

Returns:
bool

True if CUDA is available and PyTorch was built with CUDA support.

Examples

>>> if is_cuda_available():
...     backend = TorchCUDABackend()
mcframework.backends.torch.validate_mps_device() None[source]#

Validate that MPS device is available and usable.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available or not built into PyTorch.

Examples

>>> validate_mps_device()
mcframework.backends.torch.validate_cuda_device(device_id: int = 0) None[source]#

Validate that CUDA device is available and usable.

Parameters:
device_idint, default 0

CUDA device index to validate.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If CUDA is not available or device index is invalid.

Examples

>>> validate_cuda_device()
>>> validate_cuda_device(device_id=1)  # Check second GPU
mcframework.backends.torch.make_torch_generator(device: torch.device, seed_seq: np.random.SeedSequence | None) torch.Generator[source]#

Create an explicit Torch generator seeded from a SeedSequence.

This function spawns a child seed from the provided SeedSequence and uses it to initialize a Torch Generator. This preserves the hierarchical spawning model used by the NumPy backend.

Parameters:
device : torch.device

Device for the generator ("cpu", "mps", or "cuda").

seed_seq : SeedSequence or None

NumPy seed sequence to derive the Torch seed from.

Returns:
torch.Generator

Explicitly seeded generator for reproducible sampling.

Notes

Why explicit generators?

  • torch.manual_seed() is global state that breaks parallel composition

  • Explicit generators enable deterministic multi-stream MC

  • This mirrors NumPy’s SeedSequence.spawn() semantics

Seed derivation:

child_seed = seed_seq.spawn(1)[0]
seed_int = child_seed.generate_state(1, dtype="uint64")[0]
generator.manual_seed(seed_int)

This ensures each call with the same seed_seq produces identical results.

Examples

>>> import numpy as np
>>> import torch
>>> seed_seq = np.random.SeedSequence(42)
>>> gen = make_torch_generator(torch.device("cpu"), seed_seq)
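The seed derivation above can be checked with NumPy alone. In the sketch below, derive_torch_seed is a hypothetical stand-in for the spawn-and-seed step inside make_torch_generator, not the framework's actual function:

```python
import numpy as np

def derive_torch_seed(seed_seq: np.random.SeedSequence) -> int:
    # Mirror the documented derivation: spawn one child sequence, then
    # take a single uint64 state word to use as the Torch seed.
    child = seed_seq.spawn(1)[0]
    return int(child.generate_state(1, dtype="uint64")[0])

# Spawning is deterministic, so the same parent always yields the same seed.
s1 = derive_torch_seed(np.random.SeedSequence(42))
s2 = derive_torch_seed(np.random.SeedSequence(42))
```

Because the derivation is pure, repeated calls with an equal SeedSequence produce the same Torch seed, which is what makes cross-run reproducibility possible.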

Torch CPU Backend#

Torch CPU execution backend for Monte Carlo simulations.

This module provides:

Classes

TorchCPUBackend — Torch-based batch execution on CPU

The CPU backend enables vectorized execution using PyTorch on CPU, providing a good balance of speed and compatibility.

Notes#

When to use CPU backend:

  • Baseline testing before GPU deployment

  • Systems without GPU acceleration

  • Debugging and validation

  • Small to medium simulation sizes

RNG discipline. Uses explicit torch.Generator objects seeded from numpy.random.SeedSequence. Fully deterministic with same seed.

class mcframework.backends.torch_cpu.TorchCPUBackend[source]#

Bases: object

Torch CPU batch execution backend.

Uses PyTorch for vectorized execution on CPU. Requires simulations to implement torch_batch() and set supports_batch to True.

Notes

RNG architecture. Uses explicit torch.Generator objects seeded from numpy.random.SeedSequence.spawn(). This preserves:

  • Deterministic parallel streams

  • Counter-based RNG (Philox) semantics

  • Identical statistical structure across backends

Never uses torch.manual_seed() (global state).
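For concreteness, a minimal simulation satisfying these requirements might look like the sketch below. PiTorchSim is hypothetical, and the torch_batch() signature (n, device, generator) is an assumption inferred from this page, not the framework's definitive hook:

```python
import torch

class PiTorchSim:
    """Hypothetical minimal simulation exposing the batch hook the
    Torch backends require (signature assumed, base class omitted)."""
    supports_batch = True

    def torch_batch(self, n, device, generator):
        # Sample n points in the unit square; each draw contributes
        # 4.0 if it lands inside the quarter circle, else 0.0.
        pts = torch.rand((n, 2), device=device, generator=generator)
        inside = (pts * pts).sum(dim=1) <= 1.0
        return inside.to(torch.float64) * 4.0

# Note: this seeds an explicit per-call Generator instance, which is
# allowed -- only the global torch.manual_seed() is off-limits.
gen = torch.Generator(device="cpu").manual_seed(42)
draws = PiTorchSim().torch_batch(100_000, torch.device("cpu"), gen)
pi_hat = draws.mean().item()
```

Averaging the per-draw results recovers the Monte Carlo estimate of π, and the float64 return dtype matches what the stats engine expects.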

Examples

>>> backend = TorchCPUBackend()
>>> results = backend.run(sim, n_simulations=100000, seed_seq=seed_seq)
device_type: str = 'cpu'#
__init__()[source]#

Initialize Torch CPU backend.

Raises:
ImportError

If PyTorch is not installed.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch CPU batch execution.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations, ...).

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

Torch MPS Backend (Apple Silicon)#

Torch MPS (Metal Performance Shaders) backend for Apple Silicon.

This module provides:

Classes

TorchMPSBackend — GPU-accelerated batch execution on Apple Silicon

Functions

is_mps_available() — Check MPS availability

validate_mps_device() — Validate MPS is usable

The MPS backend enables GPU-accelerated Monte Carlo simulations on Apple Silicon Macs (M1/M2/M3/M4) using Metal Performance Shaders.

Notes#

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to Metal backend scheduling and float32 arithmetic. Statistical properties (mean, variance, CI coverage) remain correct.

Dtype policy. MPS performs best with float32. Sampling uses float32, but results are promoted to float64 on CPU before returning to ensure stats engine precision.

System requirements:

  • macOS 12.3 (Monterey) or later

  • Apple Silicon (M1, M2, M3, M4 series)

  • PyTorch built with MPS support

class mcframework.backends.torch_mps.TorchMPSBackend[source]#

Bases: object

Torch MPS batch execution backend for Apple Silicon GPUs.

Uses PyTorch's MPS (Metal Performance Shaders) backend for GPU-accelerated batch execution on Apple Silicon Macs, taking advantage of their unified memory architecture. Requires simulations to implement torch_batch() and set supports_batch to True.

See also

is_mps_available()

Check MPS availability before instantiation.

TorchCPUBackend

Fallback for non-Apple systems.

Notes

RNG architecture. Uses explicit Generator objects seeded from SeedSequence via spawn(). This preserves:

  • Deterministic parallel streams (best-effort on MPS)

  • Counter-based RNG (Philox) semantics

  • Correct statistical structure

Never uses manual_seed() (global state).

Dtype policy. MPS performs best with float32. Sampling uses float32; results are promoted to float64 on CPU before returning, for stats engine compatibility.

MPS determinism caveat. Torch MPS preserves RNG stream structure but does not guarantee bitwise reproducibility due to:

  • Metal backend scheduling variations

  • float32 arithmetic rounding

  • GPU kernel execution order

Statistical properties (mean, variance, CI coverage) remain correct despite potential bitwise differences between runs (see TestMPSDeterminism in tests/test_torch_backend.py).

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
device_type: str = 'mps'#
__init__()[source]#

Initialize Torch MPS backend.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available on this system.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch MPS batch execution.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch().

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 despite MPS using float32 internally.

Raises:
ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement torch_batch().

Notes

The dtype conversion flow is:

  1. torch_batch() returns float32 tensors on the MPS device.

  2. The tensor is detached and moved to CPU via detach() and cpu().

  3. Promoted to double() (float64) via to().

  4. Converted to a float64 ndarray via numpy().

This ensures stats engine precision while maximizing MPS performance.
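Under the assumption that a plain CPU tensor can stand in for the MPS result (so the sketch runs anywhere), the four steps collapse to a single chain:

```python
import torch

# Stand-in for a float32 result tensor produced on the MPS device.
t = torch.rand(5, dtype=torch.float32)

# Steps 2-4: detach from the graph, move to CPU, promote to float64,
# then hand a float64 ndarray to the stats engine.
arr = t.detach().cpu().to(torch.float64).numpy()
```

The promotion happens after the expensive GPU work, so float32 speed on-device and float64 precision downstream are not in conflict.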

mcframework.backends.torch_mps.is_mps_available() → bool[source]#

Check if MPS (Metal Performance Shaders) is available.

Returns:
bool

True if MPS is available and PyTorch was built with MPS support.

Examples

>>> if is_mps_available():
...     backend = TorchMPSBackend()
mcframework.backends.torch_mps.validate_mps_device() → None[source]#

Validate that MPS device is available and usable.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If MPS is not available or not built into PyTorch.

Examples

>>> validate_mps_device()

Torch CUDA Backend (NVIDIA)#

Torch CUDA backend for NVIDIA GPU acceleration.

This module provides:

Classes

TorchCUDABackend — GPU-accelerated batch execution on NVIDIA GPUs

Functions

is_cuda_available() — Check CUDA availability

validate_cuda_device() — Validate CUDA is usable

Features#

Adaptive Batch Sizing: Automatically estimates optimal batch size based on available GPU memory to prevent OOM errors while maximizing throughput.

Dual RNG Modes:

  • torch.Generator (default) — PyTorch’s Philox RNG, fully deterministic

  • cuRAND (optional) — Native GPU RNG via CuPy, maximum performance

CUDA Optimizations:

  • CUDA streams for overlapped execution

  • Native float64 support (zero conversion overhead vs MPS)

  • Efficient memory management via PyTorch’s caching allocator

Defensive Validation: Comprehensive checks for supports_batch attribute and required batch methods before execution.
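These checks mirror the AttributeError / ValueError / NotImplementedError contract documented for run(). A sketch of what they might look like follows; validate_sim is a hypothetical helper, not the backend's actual code:

```python
def validate_sim(sim):
    """Sketch of the documented defensive checks (hypothetical helper)."""
    # Missing attribute entirely -> AttributeError.
    if not hasattr(sim, "supports_batch"):
        raise AttributeError("simulation class is missing 'supports_batch'")
    # Attribute present but False -> ValueError.
    if not sim.supports_batch:
        raise ValueError("simulation does not support batch execution")
    # Opted in but no batch method -> NotImplementedError.
    if not callable(getattr(sim, "torch_batch", None)):
        raise NotImplementedError("simulation does not implement torch_batch()")
```

Failing fast here keeps misconfigured simulations from reaching the GPU with a confusing downstream error.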

Notes#

Native float64 support: Unlike MPS (Apple Silicon), CUDA fully supports float64 tensors. The backend intelligently handles both float32 and float64, promoting to float64 only when necessary.

Batch size estimation: Uses a probe run to estimate per-sample memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
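The arithmetic behind that strategy can be sketched in a few lines. estimate_batch_size is a hypothetical helper (the backend's real function name and signature may differ); only the probe-based extrapolation and the ~75% target come from the description above:

```python
def estimate_batch_size(free_bytes, probe_samples, probe_bytes, cap=None):
    """Pick a batch size targeting ~75% of free GPU memory (sketch)."""
    per_sample = probe_bytes / probe_samples   # bytes per simulation draw
    budget = 0.75 * free_bytes                 # leave ~25% headroom
    batch = max(1, int(budget / per_sample))
    return min(batch, cap) if cap is not None else batch

# e.g. 8 GiB free and a 1000-sample probe that allocated 4 MiB:
bs = estimate_batch_size(8 * 2**30, 1000, 4 * 2**20)
```

Large workloads are then split into ceil(n_simulations / batch) batches, each safely inside the memory budget.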

Examples#

>>> # Simple usage with defaults
>>> if is_cuda_available():
...     sim.run(1_000_000, backend="torch", torch_device="cuda")
>>> # Advanced: Direct backend construction with custom settings
>>> if is_cuda_available():
...     from mcframework.backends import TorchCUDABackend
...     backend = TorchCUDABackend(device_id=0, batch_size=100_000, use_streams=True)
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=sim.seed_seq)
...
class mcframework.backends.torch_cuda.TorchCUDABackend[source]#

Bases: object

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.

Parameters:
device_id : int, default 0

CUDA device index to use. Use torch.cuda.device_count() to check available devices.

use_curand : bool, default False

Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement cupy_batch().

batch_size : int or None, default None

Fixed batch size for simulation execution. If None, automatically estimates optimal batch size based on available GPU memory.

use_streams : bool, default True

Use CUDA streams for overlapped execution. Recommended for performance.

Attributes:
device_type : str

Always "cuda".

device : torch.device

CUDA device object for this backend.

device_id : int

CUDA device index.

use_curand : bool

Whether cuRAND mode is enabled.

batch_size : int or None

Fixed batch size, or None for adaptive.

use_streams : bool

Whether CUDA streams are enabled.

See also

is_cuda_available()

Check CUDA availability before instantiation.

TorchMPSBackend

Apple Silicon alternative.

TorchCPUBackend

CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from numpy.random.SeedSequence via spawn(). Never uses global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()).

Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.

Native float64: CUDA fully supports float64 tensors. If simulation’s torch_batch() or cupy_batch() returns float64, the backend uses it directly with zero conversion overhead. If float32, it converts to float64 on GPU before moving to CPU for stats engine compatibility.

CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...
device_type: str = 'cuda'#
__init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#

Initialize Torch CUDA backend with specified configuration.

Parameters:
device_id : int, default 0

CUDA device index to use.

use_curand : bool, default False

Use cuRAND via CuPy instead of torch.Generator.

batch_size : int or None, default None

Fixed batch size (None = adaptive).

use_streams : bool, default True

Enable CUDA streams for overlapped execution.

Raises:
ImportError

If PyTorch is not installed, or if CuPy is required but not installed.

RuntimeError

If CUDA is not available or device index is invalid.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch CUDA batch execution with adaptive batching.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored for Torch backend (batch method handles all parameters).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 regardless of internal tensor dtype.

Raises:
AttributeError

If simulation class is missing ‘supports_batch’ attribute.

ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement required batch method.

RuntimeError

If CUDA out-of-memory error occurs during execution.

Notes

Adaptive batching: When batch_size=None (default), automatically estimates optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.

Determinism: With same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).

mcframework.backends.torch_cuda.is_cuda_available() → bool[source]#

Check if CUDA is available.

Returns:
bool

True if CUDA is available and PyTorch was built with CUDA support.

Examples

>>> if is_cuda_available():
...     backend = TorchCUDABackend()
mcframework.backends.torch_cuda.validate_cuda_device(device_id: int = 0) → None[source]#

Validate that CUDA device is available and usable.

Parameters:
device_id : int, default 0

CUDA device index to validate.

Raises:
ImportError

If PyTorch is not installed.

RuntimeError

If CUDA is not available or device index is invalid.

Examples

>>> validate_cuda_device()
>>> validate_cuda_device(device_id=1)  # Check second GPU

See Also#

  • Core Module — Base simulation class and framework

  • Stats Engine — Statistical analysis of results

  • demos/demo_apple_silicon_benchmark.py — Benchmark script for Apple Silicon