mcframework.backends.TorchCUDABackend#

class mcframework.backends.TorchCUDABackend[source]#

Bases: object

Torch CUDA batch execution backend for NVIDIA GPUs.

CUDA backend with adaptive batch sizing, dual RNG modes, and native float64 support. Requires simulations to implement torch_batch() (or cupy_batch() for cuRAND mode) and set supports_batch = True.

Parameters:
device_id : int, default 0

CUDA device index to use. Use torch.cuda.device_count() to check available devices.

use_curand : bool, default False

Use cuRAND (via CuPy) instead of torch.Generator for RNG. Requires CuPy installation and simulation to implement cupy_batch().

batch_size : int or None, default None

Fixed batch size for simulation execution. If None, automatically estimates optimal batch size based on available GPU memory.

use_streams : bool, default True

Use CUDA streams for overlapped execution. Recommended for performance.

Attributes:
device_type : str

Always "cuda".

device : torch.device

CUDA device object for this backend.

device_id : int

CUDA device index.

use_curand : bool

Whether cuRAND mode is enabled.

batch_size : int or None

Fixed batch size, or None for adaptive.

use_streams : bool

Whether CUDA streams are enabled.

See also

is_cuda_available()

Check CUDA availability before instantiation.

TorchMPSBackend

Apple Silicon alternative.

TorchCPUBackend

CPU fallback.

Notes

RNG architecture: Uses explicit generators seeded from numpy.random.SeedSequence via spawn(). The backend never touches global RNG state (torch.manual_seed() or cupy.random.RandomState.seed()), so concurrent backends cannot interfere with one another's streams.
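The seeding pattern above can be sketched in plain NumPy. Note that spawn_batch_seeds is a hypothetical helper for illustration, not the backend's actual internal name:

```python
import numpy as np

def spawn_batch_seeds(seed_seq: np.random.SeedSequence, n_batches: int) -> list:
    """Derive one independent 64-bit seed per batch by spawning child
    sequences from the parent. Per-batch generators are then constructed
    explicitly from these seeds, so global RNG state is never touched."""
    children = seed_seq.spawn(n_batches)
    return [int(c.generate_state(1, dtype=np.uint64)[0]) for c in children]

# Same parent SeedSequence -> same per-batch seeds, hence reproducible runs.
seeds = spawn_batch_seeds(np.random.SeedSequence(42), 4)
```

Each child sequence is statistically independent of its siblings, which is what makes splitting a run across batches (or devices) safe.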

Adaptive batching: When batch_size=None, performs a probe run with 1000 samples to estimate memory requirements, then calculates optimal batch size to use ~75% of available GPU memory.
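The estimation step reduces to simple arithmetic on the probe run's measured footprint. This is a sketch under the assumptions stated in the note (1000-sample probe, ~75% memory target); estimate_batch_size is a hypothetical helper, not the backend's API:

```python
def estimate_batch_size(probe_bytes: int, probe_samples: int,
                        free_bytes: int, memory_fraction: float = 0.75) -> int:
    """Scale the probe run's per-sample memory footprint up to the target
    fraction of free GPU memory. Guards against a zero-byte probe and
    never returns a batch smaller than the probe itself."""
    bytes_per_sample = max(probe_bytes / probe_samples, 1.0)
    return max(int(free_bytes * memory_fraction / bytes_per_sample),
               probe_samples)

# e.g. a 1000-sample probe that allocated 8 kB, with 8 GiB free:
batch = estimate_batch_size(8_000, 1_000, 8 * 2**30)
```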

Native float64: CUDA fully supports float64 tensors. If the simulation's torch_batch() or cupy_batch() returns float64, the backend uses it directly with no conversion overhead. If it returns float32, the backend upcasts to float64 on the GPU before moving results to the CPU for stats-engine compatibility.
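The dtype rule is easiest to see on host arrays. The sketch below mirrors the GPU-side logic with NumPy (the real backend applies the same branch to CUDA tensors); to_float64_host is a hypothetical name:

```python
import numpy as np

def to_float64_host(batch: np.ndarray) -> np.ndarray:
    # float64 results pass through untouched (no copy, no conversion);
    # float32 results are upcast once before the stats engine sees them.
    if batch.dtype == np.float64:
        return batch
    return batch.astype(np.float64)
```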

CUDA streams: When use_streams=True, executes each batch in a dedicated stream for better GPU utilization and overlapped execution.

Examples

>>> # Default configuration (adaptive batching, torch.Generator)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(device_id=0)
...     results = backend.run(sim, n_simulations=1_000_000, seed_seq=seed_seq)
...
>>> # High-performance configuration (fixed batching, CuPy)
>>> if is_cuda_available():
...     backend = TorchCUDABackend(
...         device_id=0,
...         use_curand=True,
...         batch_size=100_000,
...         use_streams=True
...     )
...     results = backend.run(sim, n_simulations=10_000_000, seed_seq=seed_seq)
...

Methods

run

Run simulations using Torch CUDA batch execution with adaptive batching.

run(sim: MonteCarloSimulation, n_simulations: int, seed_seq: np.random.SeedSequence | None, progress_callback: Callable[[int, int], None] | None = None, **_simulation_kwargs: Any) → np.ndarray[source]#

Run simulations using Torch CUDA batch execution with adaptive batching.

Parameters:
sim : MonteCarloSimulation

The simulation instance to run. Must have supports_batch = True and implement torch_batch() (or cupy_batch() for cuRAND mode).

n_simulations : int

Number of simulation draws to perform.

seed_seq : SeedSequence or None

Seed sequence for reproducible random streams.

progress_callback : callable or None

Optional callback f(completed, total) for progress reporting.

**_simulation_kwargs : Any

Ignored by the Torch backend (the batch method handles all simulation parameters internally).

Returns:
np.ndarray

Array of simulation results with shape (n_simulations,). Results are float64 regardless of internal tensor dtype.

Raises:
AttributeError

If the simulation class is missing the supports_batch attribute.

ValueError

If the simulation does not support batch execution.

NotImplementedError

If the simulation does not implement the required batch method (torch_batch(), or cupy_batch() in cuRAND mode).

RuntimeError

If a CUDA out-of-memory error occurs during execution.

Notes

Adaptive batching: When batch_size=None (default), automatically estimates optimal batch size. Large workloads are split across multiple batches with progress tracking.

Memory safety: Monitors GPU memory and adjusts batch size to prevent OOM errors. Uses PyTorch’s caching allocator for efficient memory reuse.

Determinism: With same seed, produces identical results (bitwise for torch.Generator, statistical for cuRAND).
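The determinism guarantee can be demonstrated host-side with the same SeedSequence pattern the backend uses (run_reproducible is a hypothetical illustration, using NumPy in place of torch.Generator):

```python
import numpy as np

def run_reproducible(seed: int, n: int) -> np.ndarray:
    # The same parent SeedSequence spawns the same children, so draws are
    # bitwise identical across runs. This mirrors the torch.Generator path;
    # cuRAND results match statistically rather than bitwise.
    child = np.random.SeedSequence(seed).spawn(1)[0]
    return np.random.default_rng(child).standard_normal(n)

a = run_reproducible(123, 8)
b = run_reproducible(123, 8)
```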

__init__(device_id: int = 0, use_curand: bool = False, batch_size: int | None = None, use_streams: bool = True)[source]#

Initialize Torch CUDA backend with specified configuration.

Parameters:
device_id : int, default 0

CUDA device index to use.

use_curand : bool, default False

Use cuRAND via CuPy instead of torch.Generator.

batch_size : int or None, default None

Fixed batch size (None = adaptive).

use_streams : bool, default True

Enable CUDA streams for overlapped execution.

Raises:
ImportError

If PyTorch is not installed, or if CuPy is required but not installed.

RuntimeError

If CUDA is not available or device index is invalid.

classmethod __new__(*args, **kwargs)#