Device Module
The homodyne.device module provides CPU device optimization and configuration for high-performance computing environments.
Overview
CPU-Only Architecture:
GPU support was removed to focus on reliable, HPC-optimized CPU execution. The device module provides:
Automatic CPU device detection and configuration
HPC optimization for multi-core CPUs (14-128 cores)
NUMA-aware thread allocation
Performance benchmarking
Optimal batch size estimation
Design Philosophy:
Simplify deployment (CPU-only, no GPU complications)
Optimize for HPC clusters with high core counts
Reliable performance on standard workstations
Automatic configuration with sensible defaults
Module Contents
Device Optimization Module for Homodyne
HPC CPU device optimization with intelligent configuration. Provides CPU-only device detection, configuration, and optimization for high-performance computing environments.
GPU support removed in v2.3.0 - CPU-only optimization focus.
Key Features:
- Automatic CPU device detection and optimal configuration
- HPC CPU optimization for 36/128-core nodes
- Performance benchmarking and optimization
- NUMA-aware configuration
- Multi-core thread allocation strategies
Usage:
from homodyne.device import configure_optimal_device
config = configure_optimal_device()
- homodyne.device.configure_optimal_device(cpu_threads=None)
Automatically configure the optimal CPU device for homodyne analysis.
Configures optimized CPU settings for HPC environments.
- homodyne.device.get_device_status()
Get current device status and capabilities.
- homodyne.device.benchmark_device_performance(test_size=5000)
Benchmark CPU device performance for optimization planning.
Primary Functions
configure_optimal_device(cpu_threads=None) | Automatically configure the optimal CPU device for homodyne analysis.
get_device_status() | Get current device status and capabilities.
benchmark_device_performance(test_size=5000) | Benchmark CPU device performance for optimization planning.
Device Configuration
Automatic optimal device configuration.
- homodyne.device.configure_optimal_device(cpu_threads=None)
Automatically configure the optimal CPU device for homodyne analysis.
Configures optimized CPU settings for HPC environments.
Usage Example
from homodyne.device import configure_optimal_device
# Auto-configure optimal CPU settings
config = configure_optimal_device()
print(f"Device: {config['device_type']}")
print(f"Threads: {config['device_info']['threads_configured']}")
print(f"Performance ready: {config['performance_ready']}")
# Manual thread count
config = configure_optimal_device(cpu_threads=32)
Configuration Result
The configuration dictionary contains:
device_type: Always "cpu"
configuration_successful: Boolean indicating success
performance_ready: Boolean indicating HPC optimization
device_info: Detailed CPU configuration
recommendations: Performance optimization suggestions
warnings: Any configuration issues
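The keys listed above can be checked before an analysis starts. The following is a minimal sketch, not part of the homodyne API: the helper name `summarize_config` and the sample dictionary are hypothetical, and the real return value may carry more fields than the documented ones.

```python
from typing import Any


def summarize_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable lines describing a configuration result.

    Hypothetical helper: only the documented keys are read, and missing
    keys fall back to conservative defaults.
    """
    lines = [f"device_type={config.get('device_type', 'unknown')}"]
    if not config.get("configuration_successful", False):
        lines.append("configuration failed")
    if not config.get("performance_ready", False):
        lines.append("not HPC-optimized")
    # Surface warnings and recommendations so they are not silently ignored.
    lines += [f"warning: {w}" for w in config.get("warnings", [])]
    lines += [f"hint: {r}" for r in config.get("recommendations", [])]
    return lines


# Illustrative input; in practice this would come from configure_optimal_device().
sample = {
    "device_type": "cpu",
    "configuration_successful": True,
    "performance_ready": False,
    "device_info": {"threads_configured": 8},
    "recommendations": ["use physical cores only"],
    "warnings": [],
}
summary = summarize_config(sample)
```

Reading keys with `.get()` keeps the check tolerant of fields that may be absent in a given release.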
Device Status
Query current device capabilities and status.
- homodyne.device.get_device_status()
Get current device status and capabilities.
Usage Example
from homodyne.device import get_device_status
status = get_device_status()
print(f"CPU cores: {status['cpu_info']['physical_cores']}")
print(f"Performance estimate: {status['performance_estimate']}")
for rec in status['recommendations']:
    print(f"- {rec}")
Status Information
The status dictionary provides:
timestamp: When status was queried
cpu_info: CPU hardware information
performance_estimate: "high", "medium-high", or "medium"
recommendations: Performance suggestions
Performance Estimates
High: 32+ physical cores (HPC nodes)
Medium-High: 16-31 physical cores (workstations)
Medium: < 16 physical cores (standard systems)
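The three tiers above amount to a simple threshold rule. The sketch below restates it in code. The function name `estimate_performance` is hypothetical and this is not necessarily how `get_device_status()` computes its estimate internally.

```python
def estimate_performance(physical_cores: int) -> str:
    """Map a physical core count to the documented tiers (illustrative only)."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if physical_cores >= 32:
        return "high"  # HPC nodes
    if physical_cores >= 16:
        return "medium-high"  # workstations
    return "medium"  # standard systems


tiers = {n: estimate_performance(n) for n in (8, 16, 31, 32, 128)}
```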
Performance Benchmarking
Benchmark device performance for optimization planning.
- homodyne.device.benchmark_device_performance(test_size=5000)
Benchmark CPU device performance for optimization planning.
Usage Example
from homodyne.device import benchmark_device_performance
# Run benchmark
results = benchmark_device_performance(test_size=5000)
print(f"Device: {results['device_type']}")
print(f"Results: {results['results']['cpu']}")
Benchmark Metrics
The benchmark measures:
Computation time for matrix operations
Memory bandwidth
Thread scaling efficiency
Optimal batch size recommendations
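To make the first metric concrete, here is a rough sketch of how computation time might be measured over repeated runs using only the standard library. It is not the homodyne benchmark: `time_callable` and `elementwise_workload` are hypothetical names, and the workload is a pure-Python stand-in rather than a JAX matrix operation.

```python
import statistics
import time
from typing import Callable


def time_callable(fn: Callable[[], object], num_iterations: int = 5) -> dict[str, float]:
    """Time fn over several runs and report mean and best wall-clock seconds."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    fn()  # warm-up run, excluded from the timings
    timings = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {"mean_s": statistics.fmean(timings), "best_s": min(timings)}


def elementwise_workload(test_size: int = 5000) -> int:
    """Stand-in workload: sum of squares over test_size integers."""
    return sum(i * i for i in range(test_size))


timing = time_callable(elementwise_workload, num_iterations=5)
```

Excluding a warm-up run matters more for JIT-compiled code, where the first call includes compilation time.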
CPU Module
HPC CPU optimization utilities.
HPC CPU Optimization for Homodyne
CPU-primary optimization strategies for high-performance computing environments. Optimized for 36/128-core HPC nodes with intelligent thread management and JAX CPU configuration.
Key Features:
- CPU core detection and optimal thread allocation
- JAX CPU-specific optimizations for HPC environments
- Memory-efficient processing strategies
- NUMA-aware configuration
- Intel/AMD architecture detection and optimization
HPC Environment Support:
- 36-core HPC nodes (typical cluster setup)
- 128-core HPC nodes (high-end clusters)
- Multi-socket NUMA systems
- Intel Xeon and AMD EPYC processors
- homodyne.device.cpu.detect_cpu_info()
Detect CPU architecture and capabilities for optimization.
- homodyne.device.cpu.configure_cpu_hpc(num_threads=None, enable_hyperthreading=False, numa_policy='auto', memory_optimization='standard', enable_onednn=False)
Configure JAX and system for HPC CPU optimization.
Optimizes thread allocation, memory usage, and computational efficiency for HPC environments with 36/128-core nodes.
- Parameters:
num_threads (int | None) – Number of threads to use. If None, auto-detects optimal count.
enable_hyperthreading (bool) – Whether to use hyperthreading. Usually disabled for HPC.
numa_policy (str) – NUMA memory policy ("auto", "local", "interleave").
memory_optimization (str) – Memory optimization level ("minimal", "standard", "aggressive").
enable_onednn (bool) – Enable Intel oneDNN optimizations for matrix operations. Only recommended for Intel CPUs with matrix-heavy workloads. XPCS analysis is dominated by element-wise operations, so the benefit is minimal. Set to True to benchmark potential improvements.
- Returns:
Configuration summary and performance hints
- Return type:
- homodyne.device.cpu.configure_cpu_threading(num_threads=None)
Configure CPU threading for NLSQ optimization.
Performance Optimization (Spec 001 - FR-005, T024): Simplified threading configuration for NLSQ initialization. Calls configure_cpu_hpc() with sensible defaults for optimization workloads.
- homodyne.device.cpu.get_optimal_batch_size(data_size, available_memory_gb=None, target_memory_usage=0.7)
Calculate optimal batch size for CPU processing.
- homodyne.device.cpu.benchmark_cpu_performance(test_size=10000, num_iterations=5)
Benchmark CPU performance for optimization planning.
CPU-Specific Functions
configure_cpu_hpc([num_threads, ...]) | Configure JAX and system for HPC CPU optimization.
detect_cpu_info() | Detect CPU architecture and capabilities for optimization.
benchmark_cpu_performance([test_size, ...]) | Benchmark CPU performance for optimization planning.
get_optimal_batch_size(data_size[, ...]) | Calculate optimal batch size for CPU processing.
HPC Configuration
from homodyne.device import configure_cpu_hpc
# Configure for HPC environment
cpu_config = configure_cpu_hpc(
    num_threads=36,
    enable_hyperthreading=False,  # Better for HPC
    numa_policy="auto",
    memory_optimization="standard",
)
print(f"Threads configured: {cpu_config['threads_configured']}")
print(f"NUMA nodes: {cpu_config['numa_nodes']}")
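How `num_threads` and `enable_hyperthreading` might interact can be sketched as follows. This is an assumed policy for illustration, not the library's actual logic: the helper `choose_thread_count` is hypothetical, and clamping an oversized request to the core limit is a guess.

```python
from typing import Optional


def choose_thread_count(
    physical_cores: int,
    logical_cores: int,
    num_threads: Optional[int] = None,
    enable_hyperthreading: bool = False,
) -> int:
    """Pick a thread count (assumed policy, not homodyne's implementation)."""
    # Upper bound: logical cores only when hyperthreading is explicitly enabled.
    limit = logical_cores if enable_hyperthreading else physical_cores
    if num_threads is None:
        return limit  # auto mode: use every core allowed by the policy
    # Clamp explicit requests into the range [1, limit].
    return max(1, min(num_threads, limit))


auto_threads = choose_thread_count(physical_cores=36, logical_cores=72)
clamped_threads = choose_thread_count(36, 72, num_threads=128)
ht_threads = choose_thread_count(36, 72, enable_hyperthreading=True)
```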
CPU Information
from homodyne.device import detect_cpu_info
cpu_info = detect_cpu_info()
print(f"Physical cores: {cpu_info['physical_cores']}")
print(f"Logical cores: {cpu_info['logical_cores']}")
print(f"CPU frequency: {cpu_info['cpu_freq_mhz']} MHz")
print(f"L3 cache: {cpu_info['l3_cache_mb']} MB")
Optimal Batch Size
from homodyne.device import get_optimal_batch_size
# Estimate optimal batch size for memory
batch_size = get_optimal_batch_size(
    data_size=1024,  # keyword is data_size in the documented signature
    available_memory_gb=64,
)
print(f"Recommended batch size: {batch_size}")
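One plausible way to derive a batch size from a memory budget is sketched below. This is an assumption for illustration only. The function `estimate_batch_size`, the `bytes_per_item` parameter, and the reading of `data_size` as an item count are all hypothetical; only `available_memory_gb` and `target_memory_usage` mirror names in the documented signature.

```python
def estimate_batch_size(
    data_size: int,
    available_memory_gb: float,
    target_memory_usage: float = 0.7,
    bytes_per_item: int = 8,  # assumed: one float64 value per item
) -> int:
    """Largest batch that fits in the memory budget, capped at data_size."""
    if data_size < 1:
        raise ValueError("data_size must be at least 1")
    if not 0 < target_memory_usage <= 1:
        raise ValueError("target_memory_usage must be in (0, 1]")
    # Only a fraction of the available memory is budgeted for the batch.
    budget_bytes = available_memory_gb * 1024**3 * target_memory_usage
    max_items = int(budget_bytes // bytes_per_item)
    return max(1, min(data_size, max_items))


small_budget = estimate_batch_size(10_000_000, available_memory_gb=0.01, target_memory_usage=0.5)
fits_entirely = estimate_batch_size(1000, available_memory_gb=64)
```

Leaving headroom through `target_memory_usage` helps avoid swapping when other allocations share the same node.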
Configuration Module
Device configuration utilities.
Hardware detection and configuration helpers for CMC.
This module now only detects hardware characteristics to size shards and recommend the execution backend for Consensus Monte Carlo (CMC). Method selection is handled upstream and CMC is always used for MCMC paths.
Usage
from homodyne.device.config import detect_hardware
hw_config = detect_hardware()
print(f"Platform: {hw_config.platform}")
print(f"Recommended backend: {hw_config.recommended_backend}")
Integration
CMC coordinator reads HardwareConfig for backend selection and shard sizing.
No method-selection logic remains here; CMC is the only MCMC path.
- class homodyne.device.config.HardwareConfig
Bases: object
Hardware configuration for CMC optimization.
This dataclass encapsulates all detected hardware information needed for intelligent CMC decision-making and backend selection.
- platform
Primary compute platform (CPU-only in v2.3.0+)
- Type:
{'cpu'}
- num_devices
Number of available CPU devices
- Type:
int
- memory_per_device_gb
Available system memory in GB
- Type:
float
- num_nodes
Number of cluster nodes (1 for standalone)
- Type:
int
- cores_per_node
Number of physical CPU cores per node
- Type:
int
- total_memory_gb
Total system memory in GB
- Type:
float
- cluster_type
Detected cluster scheduler type
- Type:
{'pbs', 'slurm', 'standalone', None}
- recommended_backend
Recommended CMC backend based on hardware.
Options: 'pjit', 'multiprocessing', 'pbs', 'slurm'
- Type:
str
- max_parallel_shards
Maximum number of shards that can run in parallel:
- Multi-node cluster: num_nodes * cores_per_node
- CPU: cores_per_node
- Type:
int
Examples
>>> hw = detect_hardware()
>>> print(hw.platform)
cpu
>>> print(hw.max_parallel_shards)
4
>>> print(hw.recommended_backend)
multiprocessing
- platform: Literal['cpu']
- num_devices: int
- memory_per_device_gb: float
- num_nodes: int
- cores_per_node: int
- total_memory_gb: float
- recommended_backend: str
- max_parallel_shards: int
- __init__(platform, num_devices, memory_per_device_gb, num_nodes, cores_per_node, total_memory_gb, cluster_type, recommended_backend, max_parallel_shards)
- homodyne.device.config.detect_hardware()
Auto-detect hardware configuration for CMC optimization.
This function performs comprehensive hardware detection to inform intelligent CMC strategy selection and backend choice.
Detection Logic
JAX Devices: Query JAX for CPU devices (v2.3.0+ is CPU-only)
System Memory: Query total system memory via psutil
- Fallback: Assume 32 GB if psutil is unavailable
Cluster Environment: Check environment variables
- PBS: PBS_JOBID, PBS_NODEFILE
- Slurm: SLURM_JOB_NUM_NODES, SLURM_CPUS_ON_NODE
- Standalone: Neither PBS nor Slurm detected
CPU Resources: Count physical cores using psutil
Backend Recommendation: Select optimal backend based on:
- Multi-node cluster → PBS/Slurm backend
- CPU standalone → multiprocessing backend
- Returns:
Comprehensive hardware configuration for CMC
- Return type:
HardwareConfig
Examples
>>> hw = detect_hardware()
>>> print(hw.platform)
cpu
>>> print(hw.num_devices)
4
>>> print(hw.memory_per_device_gb)
64.0
>>> print(hw.cluster_type)
pbs
>>> print(hw.recommended_backend)
pbs
Notes
Detection is robust with multiple fallback mechanisms
Cluster detection requires environment variables set by scheduler
CPU core count excludes hyperthreading for accurate parallelism
v2.3.0+ is CPU-only; JAX will always report platform='cpu'
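The cluster detection and backend recommendation steps described for detect_hardware() can be sketched roughly as below. This is an illustration under stated assumptions, not the module's code. All three helper names are hypothetical. Checking PBS before Slurm and treating any one listed variable as sufficient are guesses. The 'pjit' backend option is not covered because the text does not say when it is chosen.

```python
from typing import Mapping


def detect_cluster_type(env: Mapping[str, str]) -> str:
    """Classify the scheduler from the environment variables named above."""
    if "PBS_JOBID" in env or "PBS_NODEFILE" in env:
        return "pbs"
    if "SLURM_JOB_NUM_NODES" in env or "SLURM_CPUS_ON_NODE" in env:
        return "slurm"
    return "standalone"


def recommend_backend(cluster_type: str, num_nodes: int) -> str:
    """Multi-node cluster -> scheduler backend; otherwise multiprocessing."""
    if num_nodes > 1 and cluster_type in ("pbs", "slurm"):
        return cluster_type
    return "multiprocessing"


def max_parallel_shards(num_nodes: int, cores_per_node: int) -> int:
    """Documented rule: all cores across nodes on a cluster, else one node's cores."""
    return num_nodes * cores_per_node if num_nodes > 1 else cores_per_node


# Made-up environment values for illustration; real code would pass os.environ.
pbs_env = {"PBS_JOBID": "12345.example", "PBS_NODEFILE": "/tmp/nodefile"}
cluster = detect_cluster_type(pbs_env)
backend = recommend_backend(cluster, num_nodes=4)
shards = max_parallel_shards(num_nodes=4, cores_per_node=36)
local_backend = recommend_backend(detect_cluster_type({}), num_nodes=1)
```

Passing the environment in as a mapping keeps the detection logic easy to test without a scheduler.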
Environment Variables
The device module sets the following environment variables:
JAX Configuration
JAX_PLATFORM_NAME: Set to "cpu" (forces CPU execution)
OMP_NUM_THREADS: Set to optimal thread count
JAX_ENABLE_X64: Enable 64-bit precision when needed
HPC Optimization
KMP_AFFINITY: Thread affinity for Intel CPUs
GOMP_CPU_AFFINITY: Thread affinity for GNU OpenMP
OMP_PROC_BIND: Thread binding strategy
OMP_PLACES: Thread placement policy
Note: These are set automatically by configure_optimal_device(). Manual setting is not recommended.
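Since manual setting is discouraged, reading the variables back after configuration is a safer way to see what was applied. A minimal sketch, with the hypothetical helper `snapshot_env` and a made-up sample mapping; only the variable names come from the lists above.

```python
import os
from typing import Mapping, Optional

DOCUMENTED_VARS = (
    "JAX_PLATFORM_NAME",
    "OMP_NUM_THREADS",
    "JAX_ENABLE_X64",
    "KMP_AFFINITY",
    "GOMP_CPU_AFFINITY",
    "OMP_PROC_BIND",
    "OMP_PLACES",
)


def snapshot_env(env: Optional[Mapping[str, str]] = None) -> dict[str, Optional[str]]:
    """Return the current value (or None if unset) of each documented variable."""
    source = os.environ if env is None else env
    return {name: source.get(name) for name in DOCUMENTED_VARS}


# Illustrative mapping; call snapshot_env() with no argument to read os.environ.
example = snapshot_env({"JAX_PLATFORM_NAME": "cpu", "OMP_NUM_THREADS": "36"})
```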
HPC Best Practices
For HPC Clusters:
Request physical cores only:
# PBS/Torque
#PBS -l nodes=1:ppn=36
# SLURM
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=1
Disable hyperthreading:
config = configure_optimal_device(cpu_threads=36)
NUMA awareness:
# Let system auto-detect NUMA topology
cpu_config = configure_cpu_hpc(numa_policy="auto")
Memory allocation:
# Request 4-5 GB per core for MCMC (36 cores × 5 GB = 180 GB)
#PBS -l mem=180gb
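The memory request above is simple arithmetic: cores multiplied by a per-core budget. A small sketch, with the hypothetical helper `pbs_memory_request`, shows the calculation for other node sizes. Rounding up to whole gigabytes is an assumption.

```python
import math


def pbs_memory_request(cores: int, gb_per_core: float = 5.0) -> str:
    """Format a PBS mem value from a per-core budget, rounded up to whole GB."""
    if cores < 1 or gb_per_core <= 0:
        raise ValueError("cores and gb_per_core must be positive")
    total_gb = math.ceil(cores * gb_per_core)
    return f"{total_gb}gb"


upper = pbs_memory_request(36, gb_per_core=5.0)  # 36 cores at 5 GB each
lower = pbs_memory_request(36, gb_per_core=4.0)  # 36 cores at 4 GB each
```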
For Workstations:
Leave headroom for OS:
# Use n_cores - 2 for interactive systems (never fewer than 1 thread)
import multiprocessing
n_cores = max(1, multiprocessing.cpu_count() - 2)
config = configure_optimal_device(cpu_threads=n_cores)
Monitor performance:
# Use htop/top to verify thread usage htop
Batch size optimization:
# Adjust batch size based on available memory
batch_size = get_optimal_batch_size(
    data_size=2048,  # keyword is data_size in the documented signature
    available_memory_gb=32,
)
Troubleshooting
Low performance on HPC:
Verify physical core count:
python -c "from homodyne.device import detect_cpu_info; print(detect_cpu_info())"
Check thread binding:
# Should show affinity to specific cores
taskset -p $$
Benchmark performance:
from homodyne.device import benchmark_device_performance
results = benchmark_device_performance()
Import errors:
Install optional dependencies:
pip install psutil
Without psutil, basic CPU configuration still works
NUMA warnings:
Ignore on non-NUMA systems (laptops, desktops)
On HPC, verify NUMA topology:
numactl --hardware
See Also
homodyne.optimization - Uses device configuration for optimization
homodyne.core - JAX computations on configured device
External: JAX CPU Performance FAQ