Device Module

The homodyne.device module provides CPU device optimization and configuration for high-performance computing environments.

Overview

CPU-Only Architecture:

GPU support was removed to focus on reliable, HPC-optimized CPU execution. The device module provides:

  • Automatic CPU device detection and configuration

  • HPC optimization for multi-core CPUs (14-128 cores)

  • NUMA-aware thread allocation

  • Performance benchmarking

  • Optimal batch size estimation

Design Philosophy:

  • Simplify deployment (CPU-only, no GPU complications)

  • Optimize for HPC clusters with high core counts

  • Reliable performance on standard workstations

  • Automatic configuration with sensible defaults

Module Contents

Device Optimization Module for Homodyne

HPC CPU device optimization with intelligent configuration. Provides CPU-only device detection, configuration, and optimization for high-performance computing environments.

GPU support was removed in v2.3.0; the module now focuses on CPU-only optimization.

Key Features:

  • Automatic CPU device detection and optimal configuration

  • HPC CPU optimization for 36/128-core nodes

  • Performance benchmarking and optimization

  • NUMA-aware configuration

  • Multi-core thread allocation strategies

Usage:

from homodyne.device import configure_optimal_device

config = configure_optimal_device()

homodyne.device.configure_optimal_device(cpu_threads=None)[source]

Automatically configure the optimal CPU device for homodyne analysis.

Configures optimized CPU settings for HPC environments.

Parameters:

cpu_threads (int | None) – Number of CPU threads to use. If None, auto-detects optimal count.

Returns:

Device configuration summary with performance hints

Return type:

dict[str, Any]

homodyne.device.get_device_status()[source]

Get current device status and capabilities.

Returns:

Comprehensive CPU device status information

Return type:

dict[str, Any]

homodyne.device.benchmark_device_performance(test_size=5000)[source]

Benchmark CPU device performance for optimization planning.

Parameters:

test_size (int) – Size of benchmark computation

Returns:

Benchmark results with performance metrics

Return type:

dict[str, Any]

Primary Functions

configure_optimal_device

Automatically configure the optimal CPU device for homodyne analysis.

get_device_status

Get current device status and capabilities.

benchmark_device_performance

Benchmark CPU device performance for optimization planning.

Device Configuration

Automatic optimal device configuration.

homodyne.device.configure_optimal_device(cpu_threads=None)[source]

Automatically configure the optimal CPU device for homodyne analysis.

Configures optimized CPU settings for HPC environments.

Parameters:

cpu_threads (int | None) – Number of CPU threads to use. If None, auto-detects optimal count.

Returns:

Device configuration summary with performance hints

Return type:

dict[str, Any]

Usage Example

from homodyne.device import configure_optimal_device

# Auto-configure optimal CPU settings
config = configure_optimal_device()

print(f"Device: {config['device_type']}")
print(f"Threads: {config['device_info']['threads_configured']}")
print(f"Performance ready: {config['performance_ready']}")

# Manual thread count
config = configure_optimal_device(cpu_threads=32)

Configuration Result

The configuration dictionary contains:

  • device_type: Always “cpu”

  • configuration_successful: Boolean indicating success

  • performance_ready: Boolean indicating HPC optimization

  • device_info: Detailed CPU configuration

  • recommendations: Performance optimization suggestions

  • warnings: Any configuration issues
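As a minimal sketch of how this result might be consumed, the helper below checks the documented keys and collects anything that looks off. The function name `summarize_device_config` and the sample values are illustrative assumptions, not part of homodyne; only the key names come from the list above.

```python
from typing import Any


def summarize_device_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a configuration dict.

    Hypothetical helper: only the key names are taken from the
    documentation; the checks themselves are an illustration.
    """
    problems: list[str] = []
    if config.get("device_type") != "cpu":
        problems.append(f"unexpected device type: {config.get('device_type')!r}")
    if not config.get("configuration_successful", False):
        problems.append("configuration_successful is False")
    if not config.get("performance_ready", False):
        problems.append("performance_ready is False")
    # Surface any warnings reported by the configuration step.
    problems.extend(f"warning: {w}" for w in config.get("warnings", []))
    return problems


# Sample dictionary with made-up values, shaped like the documented result.
sample_config = {
    "device_type": "cpu",
    "configuration_successful": True,
    "performance_ready": False,
    "device_info": {"threads_configured": 8},
    "recommendations": [],
    "warnings": ["example warning text"],
}

for problem in summarize_device_config(sample_config):
    print(problem)
```

In real use, the dictionary returned by configure_optimal_device() would take the place of `sample_config`.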

Device Status

Query current device capabilities and status.

homodyne.device.get_device_status()[source]

Get current device status and capabilities.

Returns:

Comprehensive CPU device status information

Return type:

dict[str, Any]

Usage Example

from homodyne.device import get_device_status

status = get_device_status()

print(f"CPU cores: {status['cpu_info']['physical_cores']}")
print(f"Performance estimate: {status['performance_estimate']}")

for rec in status['recommendations']:
    print(f"- {rec}")

Status Information

The status dictionary provides:

  • timestamp: When status was queried

  • cpu_info: CPU hardware information

  • performance_estimate: “high”, “medium-high”, or “medium”

  • recommendations: Performance suggestions

Performance Estimates

  • High: 32+ physical cores (HPC nodes)

  • Medium-High: 16-31 physical cores (workstations)

  • Medium: < 16 physical cores (standard systems)
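The tiers above amount to a simple threshold rule on the physical core count. The sketch below restates that rule; `estimate_performance_tier` is a hypothetical name used for illustration, and homodyne's internal logic may differ.

```python
def estimate_performance_tier(physical_cores: int) -> str:
    """Map a physical core count to the documented performance tiers.

    Hypothetical helper that only mirrors the thresholds listed above.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if physical_cores >= 32:
        return "high"         # HPC nodes
    if physical_cores >= 16:
        return "medium-high"  # workstations
    return "medium"           # standard systems


for cores in (8, 16, 36, 128):
    print(cores, estimate_performance_tier(cores))
```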

Performance Benchmarking

Benchmark device performance for optimization planning.

homodyne.device.benchmark_device_performance(test_size=5000)[source]

Benchmark CPU device performance for optimization planning.

Parameters:

test_size (int) – Size of benchmark computation

Returns:

Benchmark results with performance metrics

Return type:

dict[str, Any]

Usage Example

from homodyne.device import benchmark_device_performance

# Run benchmark
results = benchmark_device_performance(test_size=5000)

print(f"Device: {results['device_type']}")
print(f"Results: {results['results']['cpu']}")

Benchmark Metrics

The benchmark measures:

  • Computation time for matrix operations

  • Memory bandwidth

  • Thread scaling efficiency

  • Optimal batch size recommendations
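To show what a repeated-timing benchmark of this kind might look like, here is a standard-library sketch. It is not homodyne's implementation: `time_callable`, `workload`, and the result keys are assumptions, and the pure-Python workload only stands in for the matrix operations the real benchmark measures.

```python
import statistics
import time
from typing import Callable


def time_callable(fn: Callable[[], object], num_iterations: int = 5) -> dict[str, float]:
    """Time fn over several iterations and return summary statistics."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    fn()  # warm-up run, excluded from the timings
    timings = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean_time": statistics.fmean(timings),
        "min_time": min(timings),
        "max_time": max(timings),
    }


def workload(test_size: int = 5000) -> float:
    """Stand-in computation: Euclidean norm of the integers below test_size."""
    return sum(i * i for i in range(test_size)) ** 0.5


results = time_callable(lambda: workload(5000), num_iterations=5)
print(results)
```

The warm-up call matters more with JAX than in this stand-in, because the first invocation of a jitted function includes compilation time.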

CPU Module

HPC CPU optimization utilities.

HPC CPU Optimization for Homodyne

CPU-primary optimization strategies for high-performance computing environments. Optimized for 36/128-core HPC nodes with intelligent thread management and JAX CPU configuration.

Key Features:

  • CPU core detection and optimal thread allocation

  • JAX CPU-specific optimizations for HPC environments

  • Memory-efficient processing strategies

  • NUMA-aware configuration

  • Intel/AMD architecture detection and optimization

HPC Environment Support:

  • 36-core HPC nodes (typical cluster setup)

  • 128-core HPC nodes (high-end clusters)

  • Multi-socket NUMA systems

  • Intel Xeon and AMD EPYC processors

homodyne.device.cpu.detect_cpu_info()[source]

Detect CPU architecture and capabilities for optimization.

Returns:

CPU information including cores, architecture, and optimization hints

Return type:

dict[str, Any]

homodyne.device.cpu.configure_cpu_hpc(num_threads=None, enable_hyperthreading=False, numa_policy='auto', memory_optimization='standard', enable_onednn=False)[source]

Configure JAX and system for HPC CPU optimization.

Optimizes thread allocation, memory usage, and computational efficiency for HPC environments with 36/128-core nodes.

Parameters:
  • num_threads (int | None) – Number of threads to use. If None, auto-detects optimal count.

  • enable_hyperthreading (bool) – Whether to use hyperthreading. Usually disabled for HPC.

  • numa_policy (str) – NUMA memory policy (“auto”, “local”, “interleave”)

  • memory_optimization (str) – Memory optimization level (“minimal”, “standard”, “aggressive”)

  • enable_onednn (bool) – Enable Intel oneDNN optimizations for matrix operations. Only recommended for Intel CPUs with matrix-heavy workloads. XPCS analysis is element-wise dominated, so benefit is minimal. Set to True to benchmark potential improvements.

Returns:

Configuration summary and performance hints

Return type:

dict[str, Any]
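The interaction between num_threads and enable_hyperthreading can be pictured with the sketch below. It assumes that an explicit request is clamped to the available cores and that auto-detection uses physical cores unless hyperthreading is enabled. Both the function `choose_thread_count` and that clamping behaviour are assumptions for illustration, not confirmed details of configure_cpu_hpc().

```python
from typing import Optional


def choose_thread_count(
    physical_cores: int,
    logical_cores: int,
    num_threads: Optional[int] = None,
    enable_hyperthreading: bool = False,
) -> int:
    """Pick a thread count from the detected core counts (hypothetical rule)."""
    # With hyperthreading disabled, only physical cores count as usable.
    limit = logical_cores if enable_hyperthreading else physical_cores
    if num_threads is None:
        return max(1, limit)  # auto-detect: use every usable core
    return max(1, min(num_threads, limit))  # clamp explicit requests


print(choose_thread_count(36, 72))                              # auto, no hyperthreading
print(choose_thread_count(36, 72, enable_hyperthreading=True))  # auto, with hyperthreading
print(choose_thread_count(36, 72, num_threads=8))               # explicit request
```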

homodyne.device.cpu.configure_cpu_threading(num_threads=None)[source]

Configure CPU threading for NLSQ optimization.

Performance Optimization (Spec 001 - FR-005, T024): Simplified threading configuration for NLSQ initialization. Calls configure_cpu_hpc() with sensible defaults for optimization workloads.

Parameters:

num_threads (int | None) – Number of threads to use. If None, auto-detects optimal count based on physical cores.

Returns:

Configuration summary including thread count and XLA settings.

Return type:

dict[str, Any]

homodyne.device.cpu.get_optimal_batch_size(data_size, available_memory_gb=None, target_memory_usage=0.7)[source]

Calculate optimal batch size for CPU processing.

Parameters:
  • data_size (int) – Total size of data to process

  • available_memory_gb (float | None) – Available memory in GB. If None, auto-detects.

  • target_memory_usage (float) – Target fraction of memory to use

Returns:

Optimal batch size for processing

Return type:

int
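One plausible way to turn available memory and a target memory fraction into a batch size is sketched below. The function `sketch_batch_size` is hypothetical, as are the 8 bytes per element (float64) and the overhead factor of 4 for intermediate arrays; the actual heuristic in get_optimal_batch_size() is not documented here and may differ.

```python
def sketch_batch_size(
    data_size: int,
    available_memory_gb: float,
    target_memory_usage: float = 0.7,
    bytes_per_element: int = 8,   # assumption: float64 data
    overhead_factor: int = 4,     # assumption: temporaries per element
) -> int:
    """Estimate how many elements fit in the memory budget (illustrative)."""
    if data_size < 1:
        raise ValueError("data_size must be at least 1")
    if not 0.0 < target_memory_usage <= 1.0:
        raise ValueError("target_memory_usage must be in (0, 1]")
    budget_bytes = available_memory_gb * 1024**3 * target_memory_usage
    max_elements = int(budget_bytes // (bytes_per_element * overhead_factor))
    # Never exceed the data itself, and always process at least one element.
    return max(1, min(data_size, max_elements))


print(sketch_batch_size(data_size=1_000_000, available_memory_gb=64.0))
```

When the data fits inside the budget, the sketch returns the full data size, so a single batch suffices.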

homodyne.device.cpu.benchmark_cpu_performance(test_size=10000, num_iterations=5)[source]

Benchmark CPU performance for optimization planning.

Parameters:
  • test_size (int) – Size of test computation

  • num_iterations (int) – Number of benchmark iterations

Returns:

Benchmark results with timing information

Return type:

dict[str, float]

CPU-Specific Functions

homodyne.device.cpu.configure_cpu_hpc

Configure JAX and system for HPC CPU optimization.

homodyne.device.cpu.detect_cpu_info

Detect CPU architecture and capabilities for optimization.

homodyne.device.cpu.benchmark_cpu_performance

Benchmark CPU performance for optimization planning.

homodyne.device.cpu.get_optimal_batch_size

Calculate optimal batch size for CPU processing.

HPC Configuration

from homodyne.device import configure_cpu_hpc

# Configure for HPC environment
cpu_config = configure_cpu_hpc(
    num_threads=36,
    enable_hyperthreading=False,  # Better for HPC
    numa_policy="auto",
    memory_optimization="standard"
)

print(f"Threads configured: {cpu_config['threads_configured']}")
print(f"NUMA nodes: {cpu_config['numa_nodes']}")

CPU Information

from homodyne.device import detect_cpu_info

cpu_info = detect_cpu_info()

print(f"Physical cores: {cpu_info['physical_cores']}")
print(f"Logical cores: {cpu_info['logical_cores']}")
print(f"CPU frequency: {cpu_info['cpu_freq_mhz']} MHz")
print(f"L3 cache: {cpu_info['l3_cache_mb']} MB")

Optimal Batch Size

from homodyne.device import get_optimal_batch_size

# Estimate a batch size that fits in the available memory
batch_size = get_optimal_batch_size(
    data_size=1024,  # total size of the data to process
    available_memory_gb=64
)

print(f"Recommended batch size: {batch_size}")

Configuration Module

Device configuration utilities.

Hardware detection and configuration helpers for CMC.

This module now only detects hardware characteristics to size shards and recommend the execution backend for Consensus Monte Carlo (CMC). Method selection is handled upstream and CMC is always used for MCMC paths.

Usage

from homodyne.device.config import detect_hardware

hw_config = detect_hardware()
print(f"Platform: {hw_config.platform}")
print(f"Recommended backend: {hw_config.recommended_backend}")

Integration

  • CMC coordinator reads HardwareConfig for backend selection and shard sizing.

  • No method-selection logic remains here; CMC is the only MCMC path.

class homodyne.device.config.HardwareConfig[source]

Bases: object

Hardware configuration for CMC optimization.

This dataclass encapsulates all detected hardware information needed for intelligent CMC decision-making and backend selection.

platform

Primary compute platform (CPU-only in v2.3.0+)

Type:

{'cpu'}

num_devices

Number of available CPU devices

Type:

int

memory_per_device_gb

Available system memory in GB

Type:

float

num_nodes

Number of cluster nodes (1 for standalone)

Type:

int

cores_per_node

Number of physical CPU cores per node

Type:

int

total_memory_gb

Total system memory in GB

Type:

float

cluster_type

Detected cluster scheduler type

Type:

{'pbs', 'slurm', 'standalone', None}

recommended_backend

Recommended CMC backend based on hardware. Options: 'pjit', 'multiprocessing', 'pbs', 'slurm'

Type:

str

max_parallel_shards

Maximum number of shards that can run in parallel:

  • Multi-node cluster: num_nodes * cores_per_node

  • CPU: cores_per_node

Type:

int

Examples

>>> hw = detect_hardware()
>>> print(hw.platform)
'cpu'
>>> print(hw.max_parallel_shards)
4
>>> print(hw.recommended_backend)
'multiprocessing'

platform: Literal['cpu']
num_devices: int
memory_per_device_gb: float
num_nodes: int
cores_per_node: int
total_memory_gb: float
cluster_type: Literal['pbs', 'slurm', 'standalone'] | None
recommended_backend: str
max_parallel_shards: int

__init__(platform, num_devices, memory_per_device_gb, num_nodes, cores_per_node, total_memory_gb, cluster_type, recommended_backend, max_parallel_shards)

homodyne.device.config.detect_hardware()[source]

Auto-detect hardware configuration for CMC optimization.

This function performs comprehensive hardware detection to inform intelligent CMC strategy selection and backend choice.

Detection Logic

  1. JAX Devices: Query JAX for CPU devices (v2.3.0+ is CPU-only)

  2. System Memory: Query total system memory via psutil

     - Fallback: Assume 32 GB if psutil unavailable

  3. Cluster Environment: Check environment variables

     - PBS: PBS_JOBID, PBS_NODEFILE

     - Slurm: SLURM_JOB_NUM_NODES, SLURM_CPUS_ON_NODE

     - Standalone: Neither PBS nor Slurm detected

  4. CPU Resources: Count physical cores using psutil

  5. Backend Recommendation: Select optimal backend based on:

     - Multi-node cluster → PBS/Slurm backend

     - CPU standalone → multiprocessing backend

Returns:

Comprehensive hardware configuration for CMC

Return type:

HardwareConfig

Examples

>>> hw = detect_hardware()
>>> print(hw.platform)
'cpu'
>>> print(hw.num_devices)
4
>>> print(hw.memory_per_device_gb)
64.0
>>> print(hw.cluster_type)
'pbs'
>>> print(hw.recommended_backend)
'pbs'

Notes

  • Detection is robust with multiple fallback mechanisms

  • Cluster detection requires environment variables set by scheduler

  • CPU core count excludes hyperthreading for accurate parallelism

  • v2.3.0+ is CPU-only; JAX will always report platform='cpu'
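The cluster-detection and backend-recommendation steps described under Detection Logic can be sketched roughly as follows. The environment variable names come from that list; the function names, the precedence of PBS over Slurm, and the fallback to multiprocessing for single-node jobs are assumptions made for illustration. The 'pjit' backend is not covered.

```python
import os
from typing import Mapping, Optional


def detect_cluster_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Classify the scheduler environment from its variables (illustrative)."""
    env = os.environ if env is None else env
    if "PBS_JOBID" in env or "PBS_NODEFILE" in env:
        return "pbs"
    if "SLURM_JOB_NUM_NODES" in env or "SLURM_CPUS_ON_NODE" in env:
        return "slurm"
    return "standalone"


def recommend_backend(cluster_type: str, num_nodes: int) -> str:
    """Multi-node cluster -> scheduler backend; otherwise multiprocessing."""
    if num_nodes > 1 and cluster_type in ("pbs", "slurm"):
        return cluster_type
    return "multiprocessing"


def max_parallel_shards(num_nodes: int, cores_per_node: int) -> int:
    """Shard limit as documented for HardwareConfig.max_parallel_shards."""
    if num_nodes > 1:
        return num_nodes * cores_per_node
    return cores_per_node


cluster = detect_cluster_type()
print(cluster, recommend_backend(cluster, num_nodes=1))
```

Passing an explicit mapping instead of reading os.environ makes the detection easy to exercise outside a scheduler.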

Environment Variables

The device module sets the following environment variables:

JAX Configuration

  • JAX_PLATFORM_NAME: Set to “cpu” (forces CPU execution)

  • OMP_NUM_THREADS: Set to optimal thread count

  • JAX_ENABLE_X64: Enable 64-bit precision when needed

HPC Optimization

  • KMP_AFFINITY: Thread affinity for Intel CPUs

  • GOMP_CPU_AFFINITY: Thread affinity for GNU OpenMP

  • OMP_PROC_BIND: Thread binding strategy

  • OMP_PLACES: Thread placement policy

Note: These are set automatically by configure_optimal_device(). Manual setting is not recommended.
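Rather than setting these variables by hand, it can be useful to inspect what is currently in effect, for example after calling configure_optimal_device(). The sketch below only reads the environment; `snapshot_device_env` and `DEVICE_ENV_VARS` are hypothetical names, and the variable list is copied from this section.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the two lists above.
DEVICE_ENV_VARS = (
    "JAX_PLATFORM_NAME",
    "OMP_NUM_THREADS",
    "JAX_ENABLE_X64",
    "KMP_AFFINITY",
    "GOMP_CPU_AFFINITY",
    "OMP_PROC_BIND",
    "OMP_PLACES",
)


def snapshot_device_env(
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, Optional[str]]:
    """Return the current value (or None if unset) of each variable."""
    source = os.environ if env is None else env
    return {name: source.get(name) for name in DEVICE_ENV_VARS}


for name, value in snapshot_device_env().items():
    print(f"{name}={value if value is not None else '<unset>'}")
```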

HPC Best Practices

For HPC Clusters:

  1. Request physical cores only:

    # PBS/Torque
    #PBS -l nodes=1:ppn=36
    
    # SLURM
    #SBATCH --ntasks-per-node=36
    #SBATCH --cpus-per-task=1
    
  2. Disable hyperthreading:

    # Use one thread per physical core (36 here), not per logical core
    config = configure_optimal_device(cpu_threads=36)
    
  3. NUMA awareness:

    # Let system auto-detect NUMA topology
    cpu_config = configure_cpu_hpc(numa_policy="auto")
    
  4. Memory allocation:

    # Request 4-5 GB per core for MCMC (36 cores × 5 GB = 180 GB)
    #PBS -l mem=180gb
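The memory request above is just cores multiplied by memory per core. For other node sizes the arithmetic can be scripted as in this sketch; `pbs_memory_request` is a hypothetical helper, and only the directive form `#PBS -l mem=...gb` is taken from the example above.

```python
import math


def pbs_memory_request(cores: int, gb_per_core: float = 5.0) -> str:
    """Format a PBS memory directive for a given core count (illustrative)."""
    if cores < 1 or gb_per_core <= 0:
        raise ValueError("cores and gb_per_core must be positive")
    total_gb = math.ceil(cores * gb_per_core)  # round up to whole gigabytes
    return f"#PBS -l mem={total_gb}gb"


print(pbs_memory_request(36, 5.0))   # upper end of the 4-5 GB guideline
print(pbs_memory_request(36, 4.0))   # lower end of the guideline
```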
    

For Workstations:

  1. Leave headroom for OS:

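    # Leave two cores free on interactive systems, but never go below one thread
    import multiprocessing
    from homodyne.device import configure_optimal_device

    # Note: cpu_count() reports logical cores, which include hyperthreads
    n_cores = max(1, multiprocessing.cpu_count() - 2)
    config = configure_optimal_device(cpu_threads=n_cores)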
    
  2. Monitor performance:

    # Use htop/top to verify thread usage
    htop
    
  3. Batch size optimization:

    # Adjust batch size based on available memory
    batch_size = get_optimal_batch_size(
        data_size=2048,  # total size of the data to process
        available_memory_gb=32
    )
    

Troubleshooting

Low performance on HPC:

  • Verify physical core count:

    python -c "from homodyne.device import detect_cpu_info; print(detect_cpu_info())"
    
  • Check thread binding:

    # Should show affinity to specific cores
    taskset -p $$
    
  • Benchmark performance:

    from homodyne.device import benchmark_device_performance
    results = benchmark_device_performance()
    

Import errors:

  • Install optional dependencies:

    pip install psutil
    
  • Without psutil, basic CPU configuration still works

NUMA warnings:

  • Ignore on non-NUMA systems (laptops, desktops)

  • On HPC, verify NUMA topology:

    numactl --hardware
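When numactl is not installed, the NUMA node count on Linux can also be read from sysfs, where the kernel exposes one `nodeN` directory per node. The sketch below is independent of homodyne: `count_numa_nodes` is a hypothetical helper, and on systems without that sysfs tree (non-Linux, some containers) it simply reports a single node.

```python
from pathlib import Path


def count_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count NUMA nodes via Linux sysfs; fall back to 1 if unavailable."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return 1  # not Linux, or sysfs not exposed: treat as a single node
    nodes = [p for p in root.glob("node[0-9]*") if p.is_dir()]
    return max(1, len(nodes))


print(f"NUMA nodes: {count_numa_nodes()}")
```

A result of 1 on a laptop or desktop is expected and matches the advice to ignore NUMA warnings on such systems.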
    

See Also

  • homodyne.optimization - Uses device configuration for optimization

  • homodyne.core - JAX computations on configured device

  • External: JAX CPU Performance FAQ