Loading XPCS Data¶

Learning Objectives

By the end of this section you will understand:

The HDF5 file formats supported by homodyne (APS legacy, APS-U)
How to use XPCSDataLoader and load_xpcs_data
What the loaded data dictionary contains
How to validate loaded data
Common data issues and how to fix them

—

Overview¶

Homodyne loads XPCS data from HDF5 files using XPCSDataLoader. The loader handles two HDF5 formats used at the Advanced Photon Source:

APS legacy format: older beamline data format
APS-U new format: new format from the APS Upgrade (APS-U)

The loader validates array shapes, checks for NaN/Inf values, and returns data in a standardized dictionary format that is accepted by fit_nlsq_jax and fit_mcmc_jax.

—

HDF5 File Requirements¶

Your HDF5 file must contain the following datasets:

Dataset path	Shape	Description
`/exchange/data`	(n_phi, n_t1, n_t2)	Two-time correlation \(C_2\) array
`/exchange/t`	(n_t,)	Time axis (same for t1 and t2)
`/exchange/phi`	(n_phi,)	Azimuthal angles in degrees
`/exchange/q`	scalar or (n_q,)	Scattering vector magnitude (Å⁻¹)

Note

The /exchange/data path is configurable via dataset_path in your YAML config. Some beamlines use /xpcs/g2 or similar paths.

Note

The \(C_2\) array should already be computed (not raw intensity frames). Homodyne does not compute \(C_2\) from frames; use beamline-specific reduction software (e.g., pyXPCS, xi-cam) for that step.

—

Basic Usage¶

Approach 1: Convenience function (simplest)

from homodyne.data import load_xpcs_data

# Load data from a YAML-configured experiment
data = load_xpcs_data("config.yaml")

# Inspect the loaded data
print(data.keys())
# dict_keys(['wavevector_q_list', 'phi_angles_list', 't1', 't2',
#            'c2_exp', 'sigma', 'L', 'dt'])

print(f"q values: {data['wavevector_q_list']}")
print(f"phi angles: {data['phi_angles_list']}")
print(f"C2 shape: {data['c2_exp'].shape}")  # (n_phi, n_t1, n_t2)

Approach 2: XPCSDataLoader class (full control)

from homodyne.data import XPCSDataLoader
from homodyne.config import ConfigManager

# Load configuration
config = ConfigManager("config.yaml")

# Create loader and load data
loader = XPCSDataLoader(config_path="config.yaml")
data = loader.load_experimental_data()

# Validate data quality
from homodyne.data import validate_xpcs_data, DataQualityReport
report: DataQualityReport = validate_xpcs_data(data)
if not report.is_valid:
    print(f"Data issues: {report.issues}")

—

Loaded Data Dictionary¶

The data dictionary returned by the loader has these keys:

Key	Shape / Type	Description
`c2_exp`	(n_phi, n_t1, n_t2)	Experimental two-time correlation matrix
`t1`	(n_t1,)	First time axis (absolute times, seconds)
`t2`	(n_t2,)	Second time axis (absolute times, seconds)
`phi_angles_list`	(n_phi,)	Azimuthal angles (degrees)
`wavevector_q_list`	(n_q,) or scalar	Scattering vector magnitudes (Å⁻¹)
`sigma`	(n_phi, n_t1, n_t2)	Uncertainty array (default: 0.01 × ones_like(c2_exp))
`L`	float	Gap distance in Å (for laminar_flow mode)
`dt`	float	Time step between frames (seconds)

Tip

fit_nlsq_jax also accepts the dictionary with keys phi, g2, t1, t2, q (direct format). The loader output uses the CLI format (phi_angles_list, c2_exp, wavevector_q_list). Both formats are handled automatically.

—

YAML Configuration for Data Loading¶

The data section of your YAML configures the loader:

data:
  file_path: "/path/to/data.h5"      # Path to HDF5 file
  dataset_path: "/exchange/data"      # Internal HDF5 path to C2 array
  q_value: 0.054                      # q in Å⁻¹ (scalar or path to array)
  gap_distance: 500.0                 # µm (converted to Å internally)
  dt: 0.1                             # Frame interval in seconds

  # Optional filters
  phi_range: [-180, 180]              # Only load angles in this range
  t_range: [0.0, 100.0]              # Only load times in this range
  max_points: null                    # null = load all (no subsampling)

Warning

max_points: null is the correct default. Never set this to a finite value unless you understand the implications. Subsampling data can introduce bias and violates the no-silent-truncation principle.

—

Multiple q-Values¶

If your HDF5 file contains data at multiple q-values, homodyne fits one q at a time. Specify which q to use in the config:

data:
  q_value: 0.054     # Use this specific q
  # OR
  q_index: 3         # Use the 4th q-value (0-indexed)

For batch processing across multiple q-values, see Batch Processing Multiple Datasets.

—

Phi Angle Filtering¶

For laminar flow experiments, you may want to restrict analysis to a subset of azimuthal angles for performance or physical reasons:

from homodyne.data import filter_phi_angles

phi_angles = data['phi_angles_list']

# Filter to ±60° around flow direction (phi_0 = 0)
indices, filtered_angles = filter_phi_angles(
    phi_angles,
    phi_center=0.0,
    phi_half_width=60.0,
)

# Apply filter to data arrays
data['c2_exp'] = data['c2_exp'][indices]
data['phi_angles_list'] = filtered_angles

—

Data Validation¶

Run explicit validation before fitting to catch common issues early:

from homodyne.data import validate_xpcs_data, DataQualityReport

report: DataQualityReport = validate_xpcs_data(data)

print(f"Valid: {report.is_valid}")
print(f"Warnings: {report.warnings}")
print(f"Issues: {report.issues}")

# Example output:
# Valid: True
# Warnings: ['C2 values exceed 2.0 at 3 points (possible outliers)']
# Issues: []

The validator checks:

Array shapes are consistent
No NaN or Inf values in c2_exp, t1, t2
Time arrays are strictly monotonically increasing
c2_exp values are in a physically reasonable range (0.5–3.0)
q and phi values are within expected ranges

—

Common Data Issues and Fixes¶

Issue: HDF5 path not found

XPCSDataFormatError: Dataset '/exchange/data' not found in HDF5 file

Fix: Check the actual path in your HDF5 file:

import h5py
with h5py.File("data.h5", "r") as f:
    f.visit(print)   # Print all dataset paths

Then update dataset_path in your YAML config.

Issue: NaN values in C2

ValueError: C2 array contains 1234 NaN values

Fix: NaN values usually appear at diagonal pixels (\(t_1 = t_2\)) or at very short lag times where the correlation is ill-defined. The loader automatically masks these, but unexpected NaN patterns indicate data quality issues in the upstream reduction step.

Issue: Non-monotonic time axis

ValueError: Time axis is not strictly monotonically increasing

Fix: This indicates a problem in the XPCS reduction pipeline. The time axis must be sorted before passing to homodyne.

Issue: Wrong C2 shape

ValueError: Expected C2 shape (n_phi, n_t1, n_t2), got (n_t1, n_t2)

Fix: Add the phi axis if there is only one angle:

import numpy as np
if data['c2_exp'].ndim == 2:
    data['c2_exp'] = data['c2_exp'][np.newaxis, ...]  # (1, n_t1, n_t2)
    data['phi_angles_list'] = np.array([0.0])

Issue: Very large dataset causing memory errors

For datasets exceeding system RAM, the NLSQ optimizer automatically activates streaming mode. See Large Dataset Handling and Streaming for details.

—

Supported File Formats¶

Format	Description
APS legacy HDF5	Older APS beamline format; `/exchange/` group
APS-U HDF5	New APS-U beamline format; different internal structure
NPZ cache	Automatically created by loader for re-loading speed

The loader auto-detects the format based on the HDF5 group structure.

—

Loading XPCS Data¶

Overview¶

HDF5 File Requirements¶

Basic Usage¶

Loaded Data Dictionary¶

YAML Configuration for Data Loading¶

Multiple q-Values¶

Phi Angle Filtering¶

Data Validation¶

Common Data Issues and Fixes¶

Supported File Formats¶

See Also¶