Large Dataset Handling and Streaming¶
Learning Objectives
By the end of this section you will understand:
How homodyne handles datasets that exceed available RAM
The memory threshold and how to configure it
The difference between stratified and hybrid streaming strategies
Performance tips for large datasets
—
Overview¶
XPCS datasets can be large: a typical experiment with 100 azimuthal angles, 1000 frames, and a 1 megapixel detector can produce \(O(10^8)\) data points in the flattened two-time correlation array.
Homodyne automatically selects the appropriate memory strategy:
Dataset Size |
Strategy |
Memory Usage |
|---|---|---|
< 10M points or < 75% RAM |
Stratified LS (full-batch) |
~2 GB per million points |
> 10M points or > 75% RAM |
Hybrid Streaming |
~2 GB fixed overhead |
Scale ratio > 1000 |
CMA-ES + NLSQ |
Bounded |
—
Memory Thresholds¶
The memory threshold is configurable:
optimization:
nlsq:
memory_fraction: 0.75 # Use streaming when data exceeds 75% of RAM
# memory_threshold_gb: 48 # Or set an explicit GB limit
How the threshold is computed:
from homodyne.optimization.nlsq import (
detect_total_system_memory,
get_adaptive_memory_threshold,
select_nlsq_strategy,
)
total_ram = detect_total_system_memory() # GB
threshold = get_adaptive_memory_threshold(memory_fraction=0.75)
print(f"Total RAM: {total_ram:.1f} GB")
print(f"Memory threshold: {threshold:.1f} GB")
# Check which strategy will be selected
from homodyne.optimization.nlsq import estimate_peak_memory_gb
n_points = 5_000_000
n_params = 9
peak_mem = estimate_peak_memory_gb(n_points, n_params)
decision = select_nlsq_strategy(n_points, n_params)
print(f"Peak memory estimate: {peak_mem:.1f} GB")
print(f"Selected strategy: {decision.strategy}")
—
Stratified LS (Full-Batch)¶
For datasets below the memory threshold, homodyne uses angle-stratified chunking: data is divided into chunks by azimuthal angle, ensuring each chunk contains a balanced representation of all angles. The full Jacobian is computed in memory.
This strategy gives the best numerical precision because all data is used simultaneously in the least-squares solve.
—
Hybrid Streaming¶
For datasets above the memory threshold, homodyne uses adaptive hybrid streaming:
Data is divided into blocks that fit in memory
Each block’s gradient and Hessian contributions are accumulated
The accumulated Gauss-Newton system is solved without storing the full Jacobian
This reduces peak memory from O(n × p) to O(block_size × p).
Note
Streaming mode is mathematically equivalent to full-batch mode for the Gauss-Newton update (no approximation). Only the memory access pattern changes.
Streaming is activated automatically. You do not need to set it manually. The log will show:
[INFO] Strategy: hybrid_streaming (n=15M pts, est. peak=32.4 GB > threshold 28.0 GB)
[INFO] Streaming block size: 500K points
—
Performance Tips for Large Datasets¶
1. Set the memory fraction conservatively
On shared HPC nodes where other jobs may be running:
optimization:
nlsq:
memory_fraction: 0.60 # Leave 40% for OS and other processes
2. Use per_angle_mode: “auto” or “constant”
individual mode with many angles increases the Jacobian column count
(memory scales as O(n × p)):
optimization:
nlsq:
anti_degeneracy:
per_angle_mode: "auto" # +2 params, not +2*n_phi
3. Reduce the angle count if possible
If your dataset has many nearly-identical angles, subsetting to a representative set reduces memory and computation time while preserving physical information.
4. Enable XLA compilation for the streaming path
Large datasets benefit from XLA’s ability to fuse memory operations:
homodyne-config-xla --mode cmc-hpc
# Reload shell to apply the new XLA_FLAGS
See Performance Tuning: CPU/NUMA Optimization for XLA configuration.
—
Monitoring Memory Usage¶
import psutil
import os
def get_memory_usage_gb():
process = psutil.Process(os.getpid())
return process.memory_info().rss / (1024**3)
print(f"Before fit: {get_memory_usage_gb():.2f} GB")
result = fit_nlsq_jax(data, config)
print(f"After fit: {get_memory_usage_gb():.2f} GB")
—
CMC Memory Considerations¶
CMC uses a different memory model: data is divided into shards, each of which is processed independently by a worker process.
Each worker holds one shard in memory. The total memory footprint is:
Total memory = n_workers × (shard_size × n_params × 8 bytes) + overhead
For max_points_per_shard: "auto" with 5K points and 4 workers:
4 workers × 5,000 × 9 × 8 bytes = 1.4 MB (negligible)
The shard result storage (combining posteriors) is the main memory cost:
n_shards × n_chains × n_samples × n_params × 8 bytes
Example: 1000 shards × 4 × 1500 × 9 × 8 bytes ≈ 430 MB
See Bayesian Inference with CMC for CMC-specific memory guidance.
—
See Also¶
Performance Tuning: CPU/NUMA Optimization — CPU/NUMA optimization
Bayesian Inference with CMC — CMC memory model
Troubleshooting Guide — Memory error troubleshooting