homodyne.data

The homodyne.data package provides data ingestion for XPCS experiments. XPCSDataLoader supports both legacy APS and modern APS-U HDF5 file formats and produces JAX-compatible arrays ready for optimisation.


XPCSDataLoader

XPCSDataLoader is the single entry point for loading XPCS correlation data. It handles:

  • Auto-detection of APS vs APS-U HDF5 format

  • Half-matrix reconstruction for correlation matrices

  • Mandatory diagonal correction applied post-load

  • Smart NPZ caching to avoid reloading large HDF5 files

  • Optional physics-based quality validation

  • JAX array output with NumPy fallback when JAX is unavailable
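The half-matrix reconstruction and diagonal correction steps can be pictured with a short NumPy sketch. It makes two assumptions for illustration only. First, the file stores just the upper triangle of each symmetric C2(t1, t2) matrix. Second, the diagonal is corrected by replacing each t1 = t2 element with the mean of its nearest off-diagonal neighbours. The helpers `reconstruct_full_matrix` and `correct_diagonal` are hypothetical names, not homodyne's API, and the library's actual correction scheme may differ.

```python
import numpy as np


def reconstruct_full_matrix(half: np.ndarray) -> np.ndarray:
    """Mirror the upper triangle of a square C2 matrix into the lower triangle.

    Hypothetical helper: assumes valid values on and above the diagonal.
    """
    half = np.asarray(half, dtype=float)
    # Keep the diagonal and upper triangle, then add the transposed
    # strictly-upper part (k=1) so the diagonal is not counted twice.
    return np.triu(half) + np.triu(half, k=1).T


def correct_diagonal(c2: np.ndarray) -> np.ndarray:
    """Replace each C2(t, t) with the mean of its nearest off-diagonal neighbours.

    Illustrative scheme only; assumes a symmetric (full) input matrix.
    """
    out = np.array(c2, dtype=float)           # work on a float copy
    n = out.shape[0]
    if n < 2:
        return out                            # nothing to interpolate from
    side = np.diagonal(out, offset=1).copy()  # C2(t_i, t_{i+1}), i = 0..n-2
    new_diag = np.empty(n)
    new_diag[0] = side[0]                     # first and last points have
    new_diag[-1] = side[-1]                   # only one neighbour each
    # Interior points: average of C2(t_{i-1}, t_i) and C2(t_i, t_{i+1})
    new_diag[1:-1] = 0.5 * (side[:-1] + side[1:])
    np.fill_diagonal(out, new_diag)
    return out
```

Applying the two steps in this order (reconstruct, then correct) means the correction always sees a full symmetric matrix.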

class homodyne.data.xpcs_loader.XPCSDataLoader

Bases: object

Enhanced XPCS data loader for Homodyne.

Supports both APS (old) and APS-U (new) formats with YAML-first configuration, intelligent caching, and JAX integration.

Features:

  • YAML-first configuration with JSON support

  • Auto-detection of HDF5 format (APS vs APS-U)

  • Smart NPZ caching with compression

  • Half-matrix reconstruction for correlation matrices

  • Mandatory diagonal correction applied consistently

  • JAX array output when available

  • Integration with v2 physics validation

__init__(config_path=None, config_dict=None, configure_logging=True, generate_quality_reports=False)

Initialize XPCS data loader with YAML-first configuration.

Parameters:
  • config_path (str | None) – Path to YAML or JSON configuration file

  • config_dict (dict | None) – Configuration dictionary (alternative to config_path)

  • configure_logging (bool) – Whether to apply logging configuration from config

  • generate_quality_reports (bool) – Whether to generate quality reports (default: False)

Raises:
  • XPCSDependencyError – If required dependencies are not available

  • XPCSConfigurationError – If configuration is invalid

load_experimental_data()

Load experimental data, trying the NPZ cache first and the raw HDF5 file second; an error is raised if neither is available.

Returns:

  • wavevector_q_list: Array of q values

  • phi_angles_list: Array of phi angles

  • t1: Time array for first dimension

  • t2: Time array for second dimension

  • c2_exp: Experimental correlation data

Return type:

dict[str, Any]
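The documented lookup order (NPZ cache, then raw HDF5, then an error) is a simple fallback chain. In this sketch the two loaders are passed in as callables so the logic can be shown without real files. `load_with_fallback` and `DataNotFoundError` are hypothetical names, and the sketch assumes a loader signals "not available" by raising FileNotFoundError.

```python
from typing import Any, Callable, Dict

LoaderFn = Callable[[], Dict[str, Any]]


class DataNotFoundError(RuntimeError):
    """Hypothetical stand-in for the loader's 'no data available' error."""


def load_with_fallback(load_cache: LoaderFn, load_hdf5: LoaderFn) -> Dict[str, Any]:
    """Try the NPZ cache first, then the raw HDF5 file, else raise."""
    try:
        return load_cache()  # fast path: preprocessed NPZ cache
    except FileNotFoundError:
        pass                 # no cache yet (or it was deleted)
    try:
        return load_hdf5()   # slow path: parse the raw HDF5 file
    except FileNotFoundError as exc:
        raise DataNotFoundError("neither NPZ cache nor HDF5 file found") from exc
```

In a real loader the HDF5 branch is also the natural place to write a fresh cache for the next call.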


HDF5 Format Requirements

Homodyne supports two HDF5 layouts:

APS old format (legacy)

/exchange/
  correlation/          # C2 matrix: (n_phi, n_t1, n_t2)
  lag_steps/            # time lag indices
/measurement/
  sample/
    q_value             # scalar
    phi_values          # (n_phi,)

APS-U new format (APS Upgrade, current)

/xpcs/
  g2/                   # C2 data: (n_phi, n_delay)
  delay_frames/         # frame delay values
  q_values/             # (n_phi,)
  phi_values/           # (n_phi,)
  dt                    # frame time step (seconds)

Note

Homodyne detects the format automatically. If your file uses a non-standard layout, pass format_hint="aps" or format_hint="apsu" to the constructor to skip auto-detection.
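Auto-detection can be keyed on the top-level groups of the two layouts above: `/xpcs` for APS-U, `/exchange` plus `/measurement` for the legacy APS format. The sketch below works on a collection of group names so it runs without h5py. `detect_format` is a hypothetical helper, and homodyne's real detection may inspect more than the top level.

```python
from typing import Iterable, Optional


def detect_format(top_level_keys: Iterable[str], format_hint: Optional[str] = None) -> str:
    """Return "apsu" or "aps" based on the top-level HDF5 group names."""
    if format_hint is not None:
        if format_hint not in ("aps", "apsu"):
            raise ValueError(f"unknown format_hint: {format_hint!r}")
        return format_hint  # an explicit hint skips auto-detection
    keys = set(top_level_keys)
    if "xpcs" in keys:
        return "apsu"       # APS-U layout: /xpcs/...
    if {"exchange", "measurement"} <= keys:
        return "aps"        # legacy APS layout: /exchange + /measurement
    raise ValueError(f"unrecognised HDF5 layout, top-level groups: {sorted(keys)}")
```

With h5py installed, the group names would come from an open file, e.g. `with h5py.File(path, "r") as f: fmt = detect_format(f.keys())`.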


NPZ Caching

Loading large HDF5 files repeatedly is slow, so XPCSDataLoader caches the preprocessed arrays as compressed NPZ files, by default alongside the HDF5 file. On subsequent loads, if the cache is still valid (the HDF5 file's modification time is unchanged), the NPZ is loaded directly, which is typically 10–100× faster.

To disable caching, set use_cache: false under data: in the YAML config:

data:
  use_cache: false
  cache_dir: null    # defaults to same directory as HDF5 file

Note

Caches are loaded with allow_pickle=False (since v2.23.2). Cache metadata is stored as a JSON-encoded scalar (cache_metadata_json) and parsed with json.loads() rather than unpickled, so a cache file at a config-controlled path cannot trigger arbitrary object deserialization. Legacy caches that used the older cache_metadata object-array format are rejected with a clear error — delete the stale .npz and it regenerates on the next load.
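The safe-loading pattern in the note can be sketched with NumPy alone: metadata is stored as a JSON string in a 0-d array, the archive is opened with `allow_pickle=False`, and a recorded source mtime decides whether the cache is still valid. `write_cache` and `read_cache` are hypothetical helpers and the metadata field `source_mtime` is an assumed name; only the `cache_metadata_json` key comes from the note.

```python
import json
import os

import numpy as np


def write_cache(cache_path: str, source_path: str, arrays: dict) -> None:
    """Save arrays plus JSON metadata; no object arrays, so no pickling."""
    meta = {"source_mtime": os.path.getmtime(source_path)}  # assumed field name
    np.savez_compressed(
        cache_path,
        cache_metadata_json=np.array(json.dumps(meta)),  # 0-d unicode array
        **arrays,
    )


def read_cache(cache_path: str, source_path: str):
    """Return cached arrays, or None if the source file changed since caching.

    Raises ValueError for caches without JSON metadata (e.g. the legacy format).
    """
    with np.load(cache_path, allow_pickle=False) as npz:
        if "cache_metadata_json" not in npz.files:
            raise ValueError(f"{cache_path}: legacy cache format, delete it to regenerate")
        meta = json.loads(npz["cache_metadata_json"].item())  # plain str, no unpickling
        if meta["source_mtime"] != os.path.getmtime(source_path):
            return None  # stale: the HDF5 file was modified after caching
        # Read every data array eagerly before the archive is closed
        return {k: npz[k] for k in npz.files if k != "cache_metadata_json"}
```

Because a unicode array is not an object array, loading it never needs pickle, which is what makes `allow_pickle=False` safe here.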


Data Validation

Optional physics-based validation checks:

Check                  Description
---------------------  ----------------------------------------------------------
Shape consistency      Verifies C2 matrix dimensions against the phi and time axes
NaN / Inf detection    Raises ValueError if non-finite values are present
Monotonicity           Verifies the lag time array is strictly increasing
Value bounds           Checks that C2 values fall in a physically reasonable range

Enable strict validation via:

data:
  validate: true
  strict_bounds: true
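The four checks in the table can be sketched as a single function over a (n_phi, n_t, n_t) C2 stack. This assumes t1 and t2 share one time axis `t`, and that `strict_bounds` turns the bounds check from a warning into an error, mirroring the YAML key above. `validate_c2` is hypothetical and the default `c2_bounds` are placeholders, not homodyne's thresholds.

```python
import numpy as np


def validate_c2(c2, phi, t, c2_bounds=(0.0, 10.0), strict_bounds=False):
    """Run shape, finiteness, monotonicity and bounds checks on a C2 stack.

    Returns a list of non-fatal issues; raises ValueError on hard failures.
    """
    c2 = np.asarray(c2, dtype=float)
    phi = np.asarray(phi, dtype=float)
    t = np.asarray(t, dtype=float)
    expected = (phi.size, t.size, t.size)
    if c2.shape != expected:                  # shape consistency
        raise ValueError(f"C2 shape {c2.shape} != expected {expected}")
    if not np.all(np.isfinite(c2)):           # NaN / Inf detection
        raise ValueError("C2 contains NaN or Inf values")
    if not np.all(np.diff(t) > 0):            # monotonicity
        raise ValueError("time array is not strictly increasing")
    issues = []
    lo, hi = c2_bounds
    if c2.min() < lo or c2.max() > hi:        # value bounds
        msg = f"C2 values outside [{lo}, {hi}]"
        if strict_bounds:
            raise ValueError(msg)
        issues.append(msg)
    return issues
```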



Output Data Structure

XPCSDataLoader.load_experimental_data() returns a dictionary with the following keys:

Key     Shape    Description
------  -------  ----------------------------------------------------------
c2      (N,)     Flattened C2 correlation values
t1      (N,)     First time coordinate per data point (absolute, seconds)
t2      (N,)     Second time coordinate per data point (absolute, seconds)
phi     (N,)     Azimuthal angle φ per data point (degrees)
q       scalar   Scattering wavevector magnitude (Å⁻¹)
L       scalar   Gap / characteristic length (Å)
dt      scalar   Frame time step (seconds)
n_phi   scalar   Number of azimuthal angles
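To make the (N,) shapes concrete: N = n_phi × n_t1 × n_t2, and every C2 value is paired with its own t1, t2 and phi. The sketch below assumes phi-major ordering and absolute times computed as t0 + dt × frame index. `flatten_c2` and `t0` are hypothetical, and q and L are omitted because they come from the file and config rather than from the C2 stack.

```python
import numpy as np


def flatten_c2(c2_stack, phi_angles, dt, t0=0.0):
    """Flatten a (n_phi, n_t1, n_t2) C2 stack into per-point 1-D arrays.

    Assumes phi-major ordering and t = t0 + dt * frame_index.
    """
    c2_stack = np.asarray(c2_stack, dtype=float)
    n_phi, n_t1, n_t2 = c2_stack.shape
    times1 = t0 + dt * np.arange(n_t1)
    times2 = t0 + dt * np.arange(n_t2)
    # indexing="ij" gives grids with the same (phi, t1, t2) axis order as c2_stack
    phi_g, t1_g, t2_g = np.meshgrid(
        np.asarray(phi_angles, dtype=float), times1, times2, indexing="ij"
    )
    return {
        "c2": c2_stack.ravel(),
        "t1": t1_g.ravel(),
        "t2": t2_g.ravel(),
        "phi": phi_g.ravel(),
        "dt": dt,
        "n_phi": n_phi,
    }
```

Flattening this way lets a residual or likelihood be evaluated point by point with plain 1-D arrays.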


Usage Examples

From a YAML config file

from homodyne.data.xpcs_loader import XPCSDataLoader

loader = XPCSDataLoader(config_path="my_config.yaml")
data = loader.load_experimental_data()

print(f"Data points:  {len(data['c2'])}")
print(f"Phi angles:   {data['n_phi']}")
print(f"q:            {data['q']:.4g} Angstrom^-1")
print(f"Time step dt: {data['dt']:.4g} s")

From a ConfigManager

from homodyne.config.manager import ConfigManager
from homodyne.data.xpcs_loader import XPCSDataLoader

config_manager = ConfigManager("config.yaml")
loader = XPCSDataLoader(config_dict=config_manager.config)
data = loader.load_experimental_data()

Using the convenience function

from homodyne.data import load_xpcs_data

data = load_xpcs_data(config_path="my_config.yaml")

Supplementary Modules

XPCS Data Loader for Homodyne

Enhanced XPCS data loader supporting both APS (old) and APS-U (new) HDF5 formats, with a YAML-first configuration system, JAX compatibility, and modern architecture integration.

This module provides:

  • YAML-first configuration with JSON support

  • Smart NPZ caching to avoid reloading large HDF5 files

  • Auto-detection of APS vs APS-U format

  • Half-matrix reconstruction for correlation matrices

  • Mandatory diagonal correction applied post-load

  • JAX array output with NumPy fallback

  • Integration with v2 logging and physics validation

Key Features:

  • Format Support: APS old format and APS-U new format

  • Configuration: YAML primary, JSON via converter

  • Caching: Intelligent NPZ caching with compression

  • Output: JAX arrays when available, NumPy fallback

  • Validation: Optional physics-based data quality checks

exception homodyne.data.xpcs_loader.XPCSDataFormatError

Bases: Exception

Raised when XPCS data format is not recognized or invalid.

exception homodyne.data.xpcs_loader.XPCSDependencyError

Bases: Exception

Raised when required dependencies are not available.

exception homodyne.data.xpcs_loader.XPCSConfigurationError

Bases: Exception

Raised when configuration is invalid or missing required parameters.

homodyne.data.xpcs_loader.load_xpcs_config(config_path)

Load XPCS configuration from YAML or JSON file.

Primary format: YAML. JSON files are also supported and are automatically converted to the YAML-style structure.

Parameters:

config_path (str | Path) – Path to YAML or JSON configuration file

Return type:

dict[str, Any]

Returns:

Configuration dictionary with YAML-style structure

Raises:

XPCSConfigurationError – If configuration format is unsupported or invalid
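Loading a config "from YAML or JSON" can be done by dispatching on the file extension. A minimal sketch, assuming extension-based detection: `load_config` and `ConfigError` are hypothetical, PyYAML's `yaml.safe_load` handles YAML (imported only when needed), and the real function's JSON-to-YAML conversion step is not reproduced.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


class ConfigError(ValueError):
    """Hypothetical stand-in for XPCSConfigurationError."""


def load_config(config_path: Union[str, Path]) -> Dict[str, Any]:
    """Load a YAML or JSON config file, choosing the parser by extension."""
    path = Path(config_path)
    suffix = path.suffix.lower()
    if suffix in (".yaml", ".yml"):
        import yaml  # PyYAML, only required for YAML files
        config = yaml.safe_load(path.read_text(encoding="utf-8"))
    elif suffix == ".json":
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        raise ConfigError(f"unsupported config format: {suffix or '(no extension)'}")
    if not isinstance(config, dict):
        raise ConfigError(f"{path}: top level must be a mapping")
    return config
```

Checking the extension before reading the file means an unsupported format is reported as a configuration error even if the file does not exist.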

homodyne.data.xpcs_loader.load_xpcs_data(config_path=None, config_dict=None)

Convenience function to load XPCS data from configuration file or dict.

Accepts either a YAML or JSON configuration file (format auto-detected) or, for programmatic use, a configuration dictionary (the backward-compatible form).

Parameters:
  • config_path (str | dict | None) – Path to YAML/JSON config file, OR dict for backward compatibility

  • config_dict (dict | None) – Configuration dictionary (alternative to config_path)

Return type:

dict[str, Any]

Returns:

Dictionary containing loaded experimental data with JAX arrays when available

Example

>>> # From config file
>>> data = load_xpcs_data(config_path="xpcs_config.yaml")
>>> print(data.keys())
dict_keys(['wavevector_q_list', 'phi_angles_list', 't1', 't2', 'c2_exp'])
>>> # From dict (backward compatible - positional)
>>> config = {"data_file": "experiment.h5", "analysis_mode": "static_isotropic"}
>>> data = load_xpcs_data(config)
>>> # From dict (keyword argument)
>>> data = load_xpcs_data(config_dict=config)
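The backward-compatible signature, where `config_path` may itself be a dict, needs a small normalisation step before any loading happens. A minimal sketch of that step: `resolve_config` and its `load_file` parameter are hypothetical, and rejecting calls that pass both arguments is an assumption, not the documented behaviour of `load_xpcs_data`.

```python
import json
from typing import Any, Callable, Dict, Optional, Union

Config = Dict[str, Any]


def _read_json(path: str) -> Config:
    """Default file loader for the sketch (JSON only)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve_config(
    config_path: Union[str, Config, None] = None,
    config_dict: Optional[Config] = None,
    load_file: Callable[[str], Config] = _read_json,
) -> Config:
    """Normalise the positional-dict, keyword-dict and file-path call styles."""
    if isinstance(config_path, dict):
        # Backward-compatible positional form: load_xpcs_data(config)
        if config_dict is not None:
            raise TypeError("config dict passed both positionally and as config_dict")
        return config_path
    if config_path is not None and config_dict is not None:
        raise TypeError("pass config_path or config_dict, not both")
    if config_dict is not None:
        return config_dict
    if config_path is not None:
        return load_file(str(config_path))
    raise TypeError("either config_path or config_dict is required")
```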