Device Module¶

The homodyne.device module provides CPU device optimization and configuration for high-performance computing environments.

Overview¶

CPU-Only Architecture:

GPU support was removed to focus on reliable, HPC-optimized CPU execution. The device module provides:

Automatic CPU device detection and configuration
HPC optimization for multi-core CPUs (14-128 cores)
NUMA-aware thread allocation
Performance benchmarking
Optimal batch size estimation

Design Philosophy:

Simplify deployment (CPU-only, no GPU complications)
Optimize for HPC clusters with high core counts
Reliable performance on standard workstations
Automatic configuration with sensible defaults

Module Contents¶

Device Optimization Module for Homodyne¶

HPC CPU device optimization with intelligent configuration. Provides CPU-only device detection, configuration, and optimization for high-performance computing environments.

GPU support removed in v2.3.0 - CPU-only optimization focus.

Key Features:

Automatic CPU device detection and optimal configuration
HPC CPU optimization for 36/128-core nodes
Performance benchmarking and optimization
NUMA-aware configuration
Multi-core thread allocation strategies

Usage:

from homodyne.device import configure_optimal_device

config = configure_optimal_device()

homodyne.device.configure_optimal_device(cpu_threads=None)[source]

Automatically configure the optimal CPU device for homodyne analysis.

Configures optimized CPU settings for HPC environments.

Parameters: cpu_threads (int | None) – Number of CPU threads to use. If None, the optimal count is auto-detected.
Returns: Device configuration summary with performance hints
Return type: dict[str, Any]

homodyne.device.get_device_status()[source]

Get current device status and capabilities.

Returns: Comprehensive CPU device status information
Return type: dict[str, Any]

homodyne.device.benchmark_device_performance(test_size=5000)[source]

Benchmark CPU device performance for optimization planning.

Parameters: test_size (int) – Size of benchmark computation
Returns: Benchmark results with performance metrics
Return type: dict[str, Any]

Primary Functions¶

`configure_optimal_device`	Automatically configure the optimal CPU device for homodyne analysis.
`get_device_status`	Get current device status and capabilities.
`benchmark_device_performance`	Benchmark CPU device performance for optimization planning.

Device Configuration¶

Automatic optimal device configuration.

homodyne.device.configure_optimal_device(cpu_threads=None)[source]

Automatically configure the optimal CPU device for homodyne analysis.

Configures optimized CPU settings for HPC environments.

Parameters: cpu_threads (int | None) – Number of CPU threads to use. If None, the optimal count is auto-detected.
Returns: Device configuration summary with performance hints
Return type: dict[str, Any]

Usage Example¶

from homodyne.device import configure_optimal_device

# Auto-configure optimal CPU settings
config = configure_optimal_device()

print(f"Device: {config['device_type']}")
print(f"Threads: {config['device_info']['threads_configured']}")
print(f"Performance ready: {config['performance_ready']}")

# Manual thread count
config = configure_optimal_device(cpu_threads=32)

Configuration Result¶

The configuration dictionary contains:

device_type: Always “cpu”
configuration_successful: Boolean indicating success
performance_ready: Boolean indicating HPC optimization
device_info: Detailed CPU configuration
recommendations: Performance optimization suggestions
warnings: Any configuration issues
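
A caller will usually want to check the success flag and surface warnings before starting a long run. The following is a minimal sketch of that pattern. `summarize_config` is a hypothetical helper, not part of homodyne, and the sample dictionary is hand-built from the keys listed above; its values, and the assumption that `warnings` and `recommendations` are lists of strings, are illustrative only.

```python
from typing import Any


def summarize_config(config: dict[str, Any]) -> list[str]:
    """Turn a configuration dictionary into human-readable report lines.

    Hypothetical helper: it relies only on the documented keys and uses
    .get() so that a missing key does not raise.
    """
    lines = [f"device: {config.get('device_type', 'unknown')}"]
    if not config.get("configuration_successful", False):
        lines.append("configuration FAILED")
    if not config.get("performance_ready", False):
        lines.append("not HPC-optimized")
    lines.extend(f"warning: {w}" for w in config.get("warnings", []))
    lines.extend(f"hint: {r}" for r in config.get("recommendations", []))
    return lines


# Hand-built sample; in practice the dictionary would come from
# configure_optimal_device(). All values here are made up.
sample = {
    "device_type": "cpu",
    "configuration_successful": True,
    "performance_ready": False,
    "device_info": {"threads_configured": 8},
    "recommendations": ["use physical cores only"],
    "warnings": ["psutil not installed"],
}

for line in summarize_config(sample):
    print(line)
```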

Device Status¶

Query current device capabilities and status.

homodyne.device.get_device_status()[source]

Get current device status and capabilities.

Returns: Comprehensive CPU device status information
Return type: dict[str, Any]

Usage Example¶

from homodyne.device import get_device_status

status = get_device_status()

print(f"CPU cores: {status['cpu_info']['physical_cores']}")
print(f"Performance estimate: {status['performance_estimate']}")

for rec in status['recommendations']:
    print(f"- {rec}")

Status Information¶

The status dictionary provides:

timestamp: When status was queried
cpu_info: CPU hardware information
performance_estimate: “high”, “medium-high”, or “medium”
recommendations: Performance suggestions

Performance Estimates¶

High: 32+ physical cores (HPC nodes)
Medium-High: 16-31 physical cores (workstations)
Medium: < 16 physical cores (standard systems)
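
The tiers above amount to a simple threshold rule on the physical core count. A small sketch of that mapping follows. `estimate_performance` is a hypothetical name used for illustration; how the module computes the estimate internally is not shown on this page.

```python
def estimate_performance(physical_cores: int) -> str:
    """Map a physical core count to the documented performance tier."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if physical_cores >= 32:
        return "high"         # HPC nodes
    if physical_cores >= 16:
        return "medium-high"  # workstations
    return "medium"           # standard systems


for cores in (8, 16, 36, 128):
    print(cores, estimate_performance(cores))
```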

Performance Benchmarking¶

Benchmark device performance for optimization planning.

homodyne.device.benchmark_device_performance(test_size=5000)[source]

Benchmark CPU device performance for optimization planning.

Parameters: test_size (int) – Size of benchmark computation
Returns: Benchmark results with performance metrics
Return type: dict[str, Any]

Usage Example¶

from homodyne.device import benchmark_device_performance

# Run benchmark
results = benchmark_device_performance(test_size=5000)

print(f"Device: {results['device_type']}")
print(f"Results: {results['results']['cpu']}")

Benchmark Metrics¶

The benchmark measures:

Computation time for matrix operations
Memory bandwidth
Thread scaling efficiency
Optimal batch size recommendations
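
To make two of these metrics concrete, here is a standard-library sketch of how computation time and thread scaling efficiency are commonly measured. It is not the module's benchmark: `time_workload` and `scaling_efficiency` are hypothetical helpers, and efficiency is taken here to be speedup divided by thread count, which may differ from the definition homodyne uses.

```python
import time
from statistics import median
from typing import Callable


def time_workload(workload: Callable[[], object], num_iterations: int = 5) -> float:
    """Return the median wall-clock time in seconds over several runs."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    timings = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return median(timings)


def scaling_efficiency(t_single: float, t_multi: float, n_threads: int) -> float:
    """Parallel efficiency: speedup divided by thread count (1.0 is ideal)."""
    if t_single <= 0 or t_multi <= 0 or n_threads < 1:
        raise ValueError("timings must be positive and n_threads at least 1")
    return (t_single / t_multi) / n_threads


# Time a small pure-Python workload as a stand-in for a matrix operation.
elapsed = time_workload(lambda: sum(i * i for i in range(50_000)))
print(f"median time: {elapsed:.6f} s")

# Made-up timings: 8.0 s on one thread, 1.25 s on eight threads.
print(f"efficiency: {scaling_efficiency(8.0, 1.25, 8):.2f}")
```

Taking the median rather than the mean keeps a single slow iteration (for example, a cold cache on the first run) from skewing the result.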

CPU Module¶

HPC CPU optimization utilities.

HPC CPU Optimization for Homodyne¶

CPU-primary optimization strategies for high-performance computing environments. Optimized for 36/128-core HPC nodes with intelligent thread management and JAX CPU configuration.

Key Features:

CPU core detection and optimal thread allocation
JAX CPU-specific optimizations for HPC environments
Memory-efficient processing strategies
NUMA-aware configuration
Intel/AMD architecture detection and optimization

HPC Environment Support:

36-core HPC nodes (typical cluster setup)
128-core HPC nodes (high-end clusters)
Multi-socket NUMA systems
Intel Xeon and AMD EPYC processors

homodyne.device.cpu.detect_cpu_info()[source]

Detect CPU architecture and capabilities for optimization.

Returns: CPU information including cores, architecture, and optimization hints
Return type: dict[str, Any]

homodyne.device.cpu.configure_cpu_hpc(num_threads=None, enable_hyperthreading=False, numa_policy='auto', memory_optimization='standard', enable_onednn=False)[source]

Configure JAX and system for HPC CPU optimization.

Optimizes thread allocation, memory usage, and computational efficiency for HPC environments with 36/128-core nodes.

Parameters:

num_threads (int | None) – Number of threads to use. If None, auto-detects optimal count.
enable_hyperthreading (bool) – Whether to use hyperthreading. Usually disabled for HPC.
numa_policy (str) – NUMA memory policy (“auto”, “local”, “interleave”)
memory_optimization (str) – Memory optimization level (“minimal”, “standard”, “aggressive”)
enable_onednn (bool) – Enable Intel oneDNN optimizations for matrix operations. Only recommended for Intel CPUs with matrix-heavy workloads. XPCS analysis is element-wise dominated, so benefit is minimal. Set to True to benchmark potential improvements.

Returns:

Configuration summary and performance hints

Return type:

dict[str, Any]

homodyne.device.cpu.configure_cpu_threading(num_threads=None)[source]

Configure CPU threading for NLSQ optimization.

Performance Optimization (Spec 001 - FR-005, T024): Simplified threading configuration for NLSQ initialization. Calls configure_cpu_hpc() with sensible defaults for optimization workloads.

Parameters: num_threads (int | None) – Number of threads to use. If None, the optimal count is auto-detected from the number of physical cores.
Returns: Configuration summary including thread count and XLA settings.
Return type: dict[str, Any]

homodyne.device.cpu.get_optimal_batch_size(data_size, available_memory_gb=None, target_memory_usage=0.7)[source]

Calculate optimal batch size for CPU processing.

Parameters:

data_size (int) – Total size of data to process
available_memory_gb (float | None) – Available memory in GB. If None, auto-detects.
target_memory_usage (float) – Target fraction of memory to use

Returns:

Optimal batch size for processing

Return type:

int

homodyne.device.cpu.benchmark_cpu_performance(test_size=10000, num_iterations=5)[source]

Benchmark CPU performance for optimization planning.

Parameters:

test_size (int) – Size of test computation
num_iterations (int) – Number of benchmark iterations

Returns:

Benchmark results with timing information

Return type:

dict[str, float]

CPU-Specific Functions¶

`homodyne.device.cpu.configure_cpu_hpc`	Configure JAX and system for HPC CPU optimization.
`homodyne.device.cpu.detect_cpu_info`	Detect CPU architecture and capabilities for optimization.
`homodyne.device.cpu.benchmark_cpu_performance`	Benchmark CPU performance for optimization planning.
`homodyne.device.cpu.get_optimal_batch_size`	Calculate optimal batch size for CPU processing.

HPC Configuration¶

from homodyne.device import configure_cpu_hpc

# Configure for HPC environment
cpu_config = configure_cpu_hpc(
    num_threads=36,
    enable_hyperthreading=False,  # Better for HPC
    numa_policy="auto",
    memory_optimization="standard"
)

print(f"Threads configured: {cpu_config['threads_configured']}")
print(f"NUMA nodes: {cpu_config['numa_nodes']}")

CPU Information¶

from homodyne.device import detect_cpu_info

cpu_info = detect_cpu_info()

print(f"Physical cores: {cpu_info['physical_cores']}")
print(f"Logical cores: {cpu_info['logical_cores']}")
print(f"CPU frequency: {cpu_info['cpu_freq_mhz']} MHz")
print(f"L3 cache: {cpu_info['l3_cache_mb']} MB")

Optimal Batch Size¶

from homodyne.device import get_optimal_batch_size

# Estimate optimal batch size for memory
batch_size = get_optimal_batch_size(
    data_size=1024,  # keyword name per the documented signature
    available_memory_gb=64
)

print(f"Recommended batch size: {batch_size}")
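
The idea behind a memory-targeted batch size can be sketched as a simple budget calculation: take a fraction of the available memory and divide it by the footprint of one item. The function below is an illustration only, not homodyne's algorithm. `estimate_batch_size` and its `bytes_per_item` parameter are assumptions introduced for the example; the 0.7 default mirrors the documented `target_memory_usage` default.

```python
def estimate_batch_size(
    data_size: int,
    bytes_per_item: int,
    available_memory_gb: float,
    target_memory_usage: float = 0.7,
) -> int:
    """Largest batch that fits in a fraction of available memory.

    Hypothetical sketch: bytes_per_item is an assumed per-item footprint,
    which the real function does not take as an argument.
    """
    if data_size < 1 or bytes_per_item < 1:
        raise ValueError("data_size and bytes_per_item must be positive")
    if not 0.0 < target_memory_usage <= 1.0:
        raise ValueError("target_memory_usage must be in (0, 1]")
    budget_bytes = available_memory_gb * 1024**3 * target_memory_usage
    fits = max(1, int(budget_bytes // bytes_per_item))
    # Never return a batch larger than the data set itself.
    return min(data_size, fits)


# One billion items of 1 MiB each, 1 GiB available, half of it targeted.
print(estimate_batch_size(10**9, 1024**2, 1.0, 0.5))
```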

Configuration Module¶

Device configuration utilities.

Hardware detection and configuration helpers for CMC.¶

This module now only detects hardware characteristics to size shards and recommend the execution backend for Consensus Monte Carlo (CMC). Method selection is handled upstream and CMC is always used for MCMC paths.

Usage¶

from homodyne.device.config import detect_hardware

hw_config = detect_hardware()
print(f"Platform: {hw_config.platform}")
print(f"Recommended backend: {hw_config.recommended_backend}")

Integration¶

CMC coordinator reads HardwareConfig for backend selection and shard sizing.
No method-selection logic remains here; CMC is the only MCMC path.

class homodyne.device.config.HardwareConfig[source]

Bases: object

Hardware configuration for CMC optimization.

This dataclass encapsulates all detected hardware information needed for intelligent CMC decision-making and backend selection.

platform

Primary compute platform (CPU-only in v2.3.0+)

Type: {'cpu'}

num_devices

Number of available CPU devices

Type: int

memory_per_device_gb

Available system memory in GB

Type: float

num_nodes

Number of cluster nodes (1 for standalone)

Type: int

cores_per_node

Number of physical CPU cores per node

Type: int

total_memory_gb

Total system memory in GB

Type: float

cluster_type

Detected cluster scheduler type

Type: {'pbs', 'slurm', 'standalone', None}

recommended_backend

Recommended CMC backend based on hardware. Options: 'pjit', 'multiprocessing', 'pbs', 'slurm'

Type: str

max_parallel_shards

Maximum number of shards that can run in parallel:

Multi-node cluster: num_nodes * cores_per_node
CPU: cores_per_node

Type: int

Examples

>>> hw = detect_hardware()
>>> print(hw.platform)
'cpu'
>>> print(hw.max_parallel_shards)
4
>>> print(hw.recommended_backend)
'multiprocessing'

platform: Literal['cpu']

num_devices: int

memory_per_device_gb: float

num_nodes: int

cores_per_node: int

total_memory_gb: float

cluster_type: Literal['pbs', 'slurm', 'standalone'] | None

recommended_backend: str

max_parallel_shards: int

__init__(platform, num_devices, memory_per_device_gb, num_nodes, cores_per_node, total_memory_gb, cluster_type, recommended_backend, max_parallel_shards)
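
The relationship between the node and core fields and the two derived fields can be written out as plain functions. This is a sketch of the rules stated in the attribute descriptions and in the detection logic, not the library's code. Both function names are hypothetical, and the conditions under which 'pjit' would be recommended are not described on this page, so the sketch does not cover that option.

```python
from typing import Optional


def max_parallel_shards(num_nodes: int, cores_per_node: int) -> int:
    """Shards that can run at once, following the documented rule."""
    if num_nodes < 1 or cores_per_node < 1:
        raise ValueError("num_nodes and cores_per_node must be positive")
    if num_nodes > 1:
        return num_nodes * cores_per_node  # multi-node cluster
    return cores_per_node  # single-node CPU


def recommend_backend(cluster_type: Optional[str], num_nodes: int) -> str:
    """Multi-node PBS/Slurm jobs use the scheduler backend, else multiprocessing."""
    if num_nodes > 1 and cluster_type in ("pbs", "slurm"):
        return cluster_type
    return "multiprocessing"


print(max_parallel_shards(num_nodes=4, cores_per_node=36))
print(recommend_backend("slurm", num_nodes=4))
print(recommend_backend("standalone", num_nodes=1))
```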

homodyne.device.config.detect_hardware()[source]

Auto-detect hardware configuration for CMC optimization.

This function performs comprehensive hardware detection to inform intelligent CMC strategy selection and backend choice.

Detection Logic¶

JAX Devices: Query JAX for CPU devices (v2.3.0+ is CPU-only)
System Memory: Query total system memory via psutil
    Fallback: Assume 32 GB if psutil is unavailable
Cluster Environment: Check environment variables
    PBS: PBS_JOBID, PBS_NODEFILE
    Slurm: SLURM_JOB_NUM_NODES, SLURM_CPUS_ON_NODE
    Standalone: Neither PBS nor Slurm detected
CPU Resources: Count physical cores using psutil
Backend Recommendation: Select optimal backend based on:
    Multi-node cluster → PBS/Slurm backend
    CPU standalone → multiprocessing backend

Returns: Comprehensive hardware configuration for CMC
Return type: HardwareConfig

Examples

>>> hw = detect_hardware()
>>> print(hw.platform)
'cpu'
>>> print(hw.num_devices)
4
>>> print(hw.memory_per_device_gb)
64.0
>>> print(hw.cluster_type)
'pbs'
>>> print(hw.recommended_backend)
'pbs'

Notes

Detection is robust with multiple fallback mechanisms
Cluster detection requires environment variables set by scheduler
CPU core count excludes hyperthreading for accurate parallelism
v2.3.0+ is CPU-only; JAX will always report platform=’cpu’
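
The cluster-environment step of the detection logic can be sketched with nothing but the environment variables named above. `detect_cluster_type` is a hypothetical stand-in for illustration. Two details are assumptions rather than documented behavior: any one variable of a scheduler is treated as sufficient, and PBS is checked before Slurm.

```python
import os
from typing import Mapping, Optional


def detect_cluster_type(environ: Optional[Mapping[str, str]] = None) -> str:
    """Classify the scheduler environment from its environment variables."""
    env = os.environ if environ is None else environ
    if "PBS_JOBID" in env or "PBS_NODEFILE" in env:
        return "pbs"
    if "SLURM_JOB_NUM_NODES" in env or "SLURM_CPUS_ON_NODE" in env:
        return "slurm"
    return "standalone"  # neither scheduler detected


# Passing a mapping explicitly makes the function easy to test.
print(detect_cluster_type({"PBS_JOBID": "12345.server"}))
print(detect_cluster_type({"SLURM_JOB_NUM_NODES": "4"}))
print(detect_cluster_type({}))
```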

Environment Variables¶

The device module sets the following environment variables:

JAX Configuration¶

JAX_PLATFORM_NAME: Set to “cpu” (forces CPU execution)
OMP_NUM_THREADS: Set to optimal thread count
JAX_ENABLE_X64: Enable 64-bit precision when needed

HPC Optimization¶

KMP_AFFINITY: Thread affinity for Intel CPUs
GOMP_CPU_AFFINITY: Thread affinity for GNU OpenMP
OMP_PROC_BIND: Thread binding strategy
OMP_PLACES: Thread placement policy

Note: These are set automatically by configure_optimal_device(). Manual setting is not recommended.
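
Although setting these variables by hand is discouraged, reading them back is a harmless way to confirm what a configuration call actually did. A minimal sketch follows. `snapshot_tuning_env` is a hypothetical helper, and the tuple simply repeats the variable names listed above.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the two lists above.
TUNING_VARIABLES = (
    "JAX_PLATFORM_NAME",
    "OMP_NUM_THREADS",
    "JAX_ENABLE_X64",
    "KMP_AFFINITY",
    "GOMP_CPU_AFFINITY",
    "OMP_PROC_BIND",
    "OMP_PLACES",
)


def snapshot_tuning_env(
    environ: Optional[Mapping[str, str]] = None,
) -> dict[str, Optional[str]]:
    """Return the current value (or None if unset) of each tuning variable."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in TUNING_VARIABLES}


for name, value in snapshot_tuning_env().items():
    print(f"{name}={value if value is not None else '<unset>'}")
```

Taking a snapshot before and after the configuration call and comparing the two dictionaries shows exactly which variables changed.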

HPC Best Practices¶

For HPC Clusters:

Request physical cores only:

# PBS/Torque
#PBS -l nodes=1:ppn=36

# SLURM
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=1

Disable hyperthreading:

config = configure_optimal_device(cpu_threads=36)

NUMA awareness:

# Let system auto-detect NUMA topology
cpu_config = configure_cpu_hpc(numa_policy="auto")

Memory allocation:

# Request 4-5 GB per core for MCMC
#PBS -l mem=180gb  # 36 cores × 5 GB

For Workstations:

Leave headroom for OS:

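# Leave two cores free on interactive systems, but never go below one thread.
# Note: multiprocessing.cpu_count() counts logical cores, not physical ones.
import multiprocessing
from homodyne.device import configure_optimal_device

n_cores = max(1, multiprocessing.cpu_count() - 2)
config = configure_optimal_device(cpu_threads=n_cores)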

Monitor performance:

# Use htop/top to verify thread usage
htop

Batch size optimization:

# Adjust batch size based on available memory
batch_size = get_optimal_batch_size(
    data_size=2048,  # keyword name per the documented signature
    available_memory_gb=32
)

Troubleshooting¶

Low performance on HPC:

Verify physical core count:

python -c "from homodyne.device import detect_cpu_info; print(detect_cpu_info())"

Check thread binding:

# Should show affinity to specific cores
taskset -p $$

Benchmark performance:

from homodyne.device import benchmark_device_performance
results = benchmark_device_performance()
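
The thread-binding check above can also be done from inside the Python process, which is convenient in a batch job where the shell's affinity and the interpreter's may differ. This sketch uses `os.sched_getaffinity`, which exists on Linux but not on macOS or Windows, so the hypothetical helper `current_cpu_affinity` returns None where it is unavailable.

```python
import os
from typing import Optional


def current_cpu_affinity() -> Optional[list[int]]:
    """Return the sorted CPU ids this process may run on, or None if unsupported."""
    getter = getattr(os, "sched_getaffinity", None)
    if getter is None:  # platform without sched_getaffinity
        return None
    return sorted(getter(0))  # 0 means the calling process


affinity = current_cpu_affinity()
if affinity is None:
    print("CPU affinity query is not supported on this platform")
else:
    print(f"process may run on {len(affinity)} CPUs: {affinity}")
```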

Import errors:

Install optional dependencies:
```
pip install psutil
```
Without psutil, basic CPU configuration still works
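
The usual way to keep a package working without an optional dependency is to guard the import and fall back to the standard library. The sketch below shows that pattern for core counting; it is not homodyne's code, and `count_physical_cores` is a hypothetical name. The fallback is less accurate, because `os.cpu_count()` reports logical CPUs and can overcount when hyperthreading is enabled.

```python
import os


def count_physical_cores() -> int:
    """Count physical cores with psutil if present, else fall back to os."""
    try:
        import psutil  # optional dependency
    except ImportError:
        psutil = None

    cores = psutil.cpu_count(logical=False) if psutil is not None else None
    if cores is None:
        # psutil is missing or could not determine the count.
        # os.cpu_count() returns logical CPUs and may itself return None.
        cores = os.cpu_count() or 1
    return cores


print(f"cores detected: {count_physical_cores()}")
```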

NUMA warnings:

Ignore on non-NUMA systems (laptops, desktops)
On HPC, verify NUMA topology:
```
numactl --hardware
```
