mcframework.backends.TorchCUDABackend#

class mcframework.backends.TorchCUDABackend[source]#

Bases: object

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.

Parameters:
device_id : int, default 0

CUDA device index to use. Use torch.cuda.device_count() to check available devices.

use_curand : bool, default False

Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement cupy_batch().

batch_size : int or None, default None

Fixed batch size for simulation execution. If None, automatically estimates optimal batch size based on available GPU memory.

use_streams : bool, default True

Use CUDA streams for overlapped execution. Recommended for performance.

Attributes:
device_type : str

Always "cuda".

device : torch.device

CUDA device object for this backend.

device_id : int

CUDA device index.

use_curand : bool

Whether cuRAND mode is enabled.

batch_size : int or None

Fixed batch size, or None for adaptive.

use_streams : bool

Whether CUDA streams are enabled.

See also

is_cuda_available()

Check CUDA availability before instantiation.

TorchMPSBackend

Apple Silicon alternative.

TorchCPUBackend

CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from numpy.random.SeedSequence via spawn(). The backend never touches global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()), so concurrent backends cannot interfere with one another's streams.
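The seeding pattern above can be sketched in plain NumPy. Note that spawn_batch_seeds is a hypothetical helper for illustration, not the backend's actual internal name:

```python
import numpy as np

def spawn_batch_seeds(seed_seq: np.random.SeedSequence, n_batches: int) -> list:
    """Derive one independent 64-bit seed per batch by spawning child
    sequences from the parent. Per-batch generators are then constructed
    explicitly from these seeds, so global RNG state is never touched."""
    children = seed_seq.spawn(n_batches)
    return [int(c.generate_state(1, dtype=np.uint64)[0]) for c in children]

# Same parent SeedSequence -> same per-batch seeds, hence reproducible runs.
seeds = spawn_batch_seeds(np.random.SeedSequence(42), 4)
```

Each child sequence is statistically independent of its siblings, which is what makes splitting a run across batches (or devices) safe.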

Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
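The estimation step reduces to simple arithmetic on the probe run's measured footprint. This is a sketch under the assumptions stated in the note (1000-sample probe, ~75% memory target); estimate_batch_size is a hypothetical helper, not the backend's API:

```python
def estimate_batch_size(probe_bytes: int, probe_samples: int,
                        free_bytes: int, memory_fraction: float = 0.75) -> int:
    """Scale the probe run's per-sample memory footprint up to the target
    fraction of free GPU memory. Guards against a zero-byte probe and
    never returns a batch smaller than the probe itself."""
    bytes_per_sample = max(probe_bytes / probe_samples, 1.0)
    return max(int(free_bytes * memory_fraction / bytes_per_sample),
               probe_samples)

# e.g. a 1000-sample probe that allocated 8 kB, with 8 GiB free:
batch = estimate_batch_size(8_000, 1_000, 8 * 2**30)
```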

Native float64: CUDA fully supports float64 tensors. If the simulation's torch_batch() or cupy_batch() returns float64, the backend uses it directly with no conversion overhead. If it returns float32, the backend upcasts to float64 on the GPU before moving results to the CPU for stats-engine compatibility.
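The dtype rule is easiest to see on host arrays. The sketch below mirrors the GPU-side logic with NumPy (the real backend applies the same branch to CUDA tensors); to_float64_host is a hypothetical name:

```python
import numpy as np

def to_float64_host(batch: np.ndarray) -> np.ndarray:
    # float64 results pass through untouched (no copy, no conversion);
    # float32 results are upcast once before the stats engine sees them.
    if batch.dtype == np.float64:
        return batch
    return batch.astype(np.float64)
```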

CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...

Methods

run

Run simulations using Torch CUDA batch execution with adaptive batching.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch CUDA batch execution with adaptive batching.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored by the Torch backend (the batch method handles all simulation parameters internally).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 regardless of internal tensor dtype.

Raises:
AttributeError

If the simulation class is missing the supports_batch attribute.

ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement the required batch method (torch_batch(), or cupy_batch() in cuRAND mode).

RuntimeError

If a CUDA out-of-memory error occurs during execution.

Notes

Adaptive batching: When batch_size=None (default), automatically estimates optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.

Determinism: With same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).
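The determinism guarantee can be demonstrated host-side with the same SeedSequence pattern the backend uses (run_reproducible is a hypothetical illustration, using NumPy in place of torch.Generator):

```python
import numpy as np

def run_reproducible(seed: int, n: int) -> np.ndarray:
    # The same parent SeedSequence spawns the same children, so draws are
    # bitwise identical across runs. This mirrors the torch.Generator path;
    # cuRAND results match statistically rather than bitwise.
    child = np.random.SeedSequence(seed).spawn(1)[0]
    return np.random.default_rng(child).standard_normal(n)

a = run_reproducible(123, 8)
b = run_reproducible(123, 8)
```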

__init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#

Initialize Torch CUDA backend with specified configuration.

Parameters:
device_id : int, default 0

CUDA device index to use.

use_curand : bool, default False

Use cuRAND via CuPy instead of torch.Generator.

batch_size : int or None, default None

Fixed batch size (None = adaptive).

use_streams : bool, default True

Enable CUDA streams for overlapped execution.

Raises:
ImportError

If PyTorch is not installed, or if CuPy is required but not installed.

RuntimeError

If CUDA is not available or device index is invalid.

classmethod __new__(*args, **kwargs)#