homodyne.optimization.cmc¶

The cmc sub-package implements Consensus Monte Carlo (CMC) — a divide-and-conquer Bayesian inference strategy for large-scale XPCS datasets. The dataset is partitioned into independent shards, each processed with per-shard NUTS/MCMC (via NumPyro), and the posterior distributions are combined using precision-weighted Gaussian moments.

Note

CMC is a secondary method. Always run NLSQ first and pass the result as a warm-start. This reduces divergence rates from ~28 % to under 5 %. See homodyne.optimization for the recommended two-step workflow.

fit_mcmc_jax¶

The primary entry point for CMC analysis.

homodyne.optimization.cmc.core.fit_mcmc_jax(data, t1, t2, phi, q, L, analysis_mode, method='mcmc', cmc_config=None, initial_values=None, parameter_space=None, dt=None, output_dir=None, progress_bar=True, run_id=None, nlsq_result=None, **kwargs)[source]

Run CMC (Consensus Monte Carlo) analysis on XPCS data.

This function signature matches the CLI call in cli/commands.py:1201.

Parameters:

data (ndarray) – Pooled C2 correlation data, shape (n_total,).
t1 (ndarray) – Pooled time coordinates t1, shape (n_total,).
t2 (ndarray) – Pooled time coordinates t2, shape (n_total,).
phi (ndarray) – Pooled phi angles, shape (n_total,).
q (float) – Wavevector magnitude.
L (float) – Stator-rotor gap length (nm).
analysis_mode (str) – Analysis mode: “static” or “laminar_flow”.
method (str) – Method identifier (always “mcmc” for CMC).
cmc_config (dict[str, Any] | None) – CMC configuration from ConfigManager.get_cmc_config().
initial_values (dict[str, float] | None) – Initial parameter values from ConfigManager.get_initial_parameters().
parameter_space (ParameterSpace | None) – Parameter space with bounds and priors from ParameterSpace.from_config().
dt (float | None) – Time step for physics model. If None, inferred from pooled time arrays.
output_dir (Path | str | None) – Output directory for saving results.
progress_bar (bool) – Whether to show progress bar during sampling.
run_id (str | None) – Optional identifier used to correlate logs across shards/backends.
nlsq_result (dict | None) – Optional NLSQ result dictionary for warm-start priors. When provided, builds informative priors centered on NLSQ estimates, improving convergence speed and reducing divergences. Should contain parameter values and optionally uncertainties (see extract_nlsq_values_for_cmc).
**kwargs – Additional keyword arguments (for compatibility).

Returns:

Complete result with posterior samples and diagnostics.

Return type:

CMCResult

Raises:

ValueError – If data validation fails.
RuntimeError – If MCMC sampling fails.

Examples

>>> from homodyne.optimization.cmc import fit_mcmc_jax
>>> result = fit_mcmc_jax(
...     data=c2_pooled,
...     t1=t1_pooled,
...     t2=t2_pooled,
...     phi=phi_pooled,
...     q=0.01,
...     L=2000000.0,
...     analysis_mode="laminar_flow",
...     method="mcmc",
...     cmc_config=config.get_cmc_config(),
...     initial_values=config.get_initial_parameters(),
...     parameter_space=parameter_space,
... )
>>> print(result.convergence_status)
converged

Shard Size Selection¶

NUTS is \(O(n)\) per leapfrog step — it evaluates all points in a shard. Never use shard sizes above 100 K for any mode. The "auto" setting is strongly recommended.

Warning

Do not set max_points_per_shard above 100 K. Extremely large shards will cause NUTS to time out or exhaust memory.

Note

Single-shard hard limit: If a dataset exceeds 100 K points and would run as a single shard (non-CMC path), homodyne automatically falls back to random CMC sharding to prevent NUTS from running \(O(n)\) leapfrog on the full dataset. This also applies when num_shards=1 is forced.

Dataset size	laminar_flow auto	static auto	Notes
< 2 M pts	3–5 K	5–6 K
2–5 M pts	3–5 K	5–6 K
5–50 M pts	5 K	6–10 K
50–100 M pts	5–8 K	9–15 K
100 M+ pts	8–10 K	12–20 K	HPC only

CMCConfig¶

class homodyne.optimization.cmc.config.CMCConfig[source]

Bases: object

Configuration for Consensus Monte Carlo (CMC) analysis.

enable

Whether to enable CMC. “auto” enables based on data size.

Type:: bool | str

min_points_for_cmc

Minimum data points to trigger CMC mode.

Type:: int

sharding_strategy

How to partition data: “stratified”, “random”, “contiguous”.

Type:: str

num_shards

Number of data shards. “auto” calculates from data size.

Type:: int | str

max_points_per_shard

Maximum points per shard. “auto” calculates optimally based on dataset size, analysis mode, and angle count (see _resolve_max_points_per_shard). Default: “auto”. Typical auto values: 5–20K for laminar_flow, 10–20K for static (scales with dataset size).

Type:: int | str

backend_name

Execution backend: “auto”, “multiprocessing”, “pjit”, “pbs”.

Type:: str

enable_checkpoints

Whether to save checkpoints during sampling.

Type:: bool

checkpoint_dir

Directory for checkpoint files.

Type:: str

num_warmup

Number of warmup/burn-in samples per chain.

Type:: int

num_samples

Number of posterior samples per chain.

Type:: int

num_chains

Number of MCMC chains.

Type:: int

chain_method

MCMC chain execution method. "parallel" (default) runs chains concurrently via JAX vectorization. "sequential" runs chains one at a time. Parallel is faster on multi-core CPUs but adds ~5-15% overhead on very small shards (<500 points); the sampler auto-falls-back to sequential in that case.

Type:: str

target_accept_prob

Target acceptance probability for NUTS.

Type:: float

dense_mass

Use dense mass matrix for NUTS. When True, learns parameter correlations for more efficient sampling. Default: True.

Type:: bool

max_r_hat

Maximum R-hat for convergence.

Type:: float

min_ess

Minimum effective sample size.

Type:: float

combination_method

How to combine shard posteriors. Options:

"consensus_mc": Correct Consensus Monte Carlo (precision-weighted means). Recommended. Combines per-shard posterior moments, then generates new samples from the combined Gaussian.
"weighted_gaussian": Legacy element-wise weighted averaging (deprecated).
"simple_average": Simple element-wise averaging (deprecated).

Type:: str

min_success_rate

Minimum fraction of shards that must succeed.

Type:: float

run_id

Optional identifier used for structured logging across shards.

Type:: str | None

per_angle_mode

Per-angle scaling mode for anti-degeneracy defense (v2.18.0+):

"auto": Auto-selects based on n_phi threshold (recommended). When n_phi >= threshold: Estimates per-angle values, AVERAGES them, broadcasts single value to all angles (matches NLSQ behavior). When n_phi < threshold: Uses individual mode.
"constant": Per-angle contrast/offset from quantile estimation, used DIRECTLY (different fixed value per angle, NOT averaged). Reduces to 8 params (7 physical + 1 sigma).
"individual": Independent contrast + offset per angle, all sampled. May suffer from parameter absorption degeneracy with many angles.

Type:: str

constant_scaling_threshold

n_phi threshold for auto mode’s per-angle strategy. When n_phi >= threshold, auto mode samples averaged contrast/offset (single value broadcast to all angles). When n_phi < threshold, auto mode falls back to individual per-angle sampling. Default: 3.

Type:: int

enable: bool | str = 'auto'

min_points_for_cmc: int = 100000

per_angle_mode: str = 'auto'

constant_scaling_threshold: int = 3

sharding_strategy: str = 'random'

num_shards: int | str = 'auto'

max_points_per_shard: int | str = 'auto'

backend_name: str = 'auto'

enable_checkpoints: bool = True

checkpoint_dir: str = './checkpoints/cmc'

num_warmup: int = 500

num_samples: int = 1500

num_chains: int = 4

chain_method: str = 'parallel'

target_accept_prob: float = 0.85

dense_mass: bool = True

adaptive_sampling: bool = True

max_tree_depth: int = 10

min_warmup: int = 100

min_samples: int = 200

enable_jax_profiling: bool = False

jax_profile_dir: str = './profiles/jax'

max_r_hat: float = 1.1

min_ess: float = 400.0

max_divergence_rate: float = 0.1

combination_method: str = 'robust_consensus_mc'

min_success_rate: float = 0.9

run_id: str | None = None

per_shard_timeout: int = 3600

heartbeat_timeout: int = 600

min_success_rate_warning: float = 0.8

require_nlsq_warmstart: bool = False

use_nlsq_informed_priors: bool = True

nlsq_prior_width_factor: float = 2.0

prior_tempering: bool = True

max_parameter_cv: float = 1.0

heterogeneity_abort: bool = True

min_points_per_shard: int = 10000

min_points_per_param: int = 1500

reparameterization_d_total: bool = True

reparameterization_log_gamma: bool = True

bimodal_min_weight: float = 0.2

bimodal_min_separation: float = 0.5

seed: int = 42

classmethod from_dict(config_dict)[source]

Create CMCConfig from configuration dictionary.

Parameters:: config_dict (dict[str, Any]) – CMC configuration dictionary from ConfigManager.get_cmc_config().
Returns:: Validated configuration object.
Return type:: CMCConfig
Raises:: ValueError – If required fields are missing or invalid.

validate()[source]

Validate configuration values.

Returns:: List of validation error messages (empty if valid).
Return type:: list[str]

is_valid()[source]

Check if configuration is valid.

Returns:: True if configuration has no validation errors.
Return type:: bool

should_enable_cmc(n_points, analysis_mode=None)[source]

Determine if CMC should be enabled for given data size.

Parameters:

n_points (int) – Number of data points.
analysis_mode (str | None) – Deprecated — ignored. Kept for backward compatibility.

Returns:

True if CMC should be enabled.

Return type:

bool

Notes

Threshold is min_points_for_cmc (default 100,000) for all modes.

get_num_shards(n_points, n_phi, n_params=7)[source]

Calculate number of shards with param-aware sizing.

Parameters:

n_points (int) – Total number of data points.
n_phi (int) – Number of phi angles.
n_params (int) – Number of model parameters (default: 7 for static).

Returns:

Number of shards to use.

Return type:

int

get_adaptive_sample_counts(shard_size, n_params=7)[source]

Calculate adaptive warmup/samples based on shard size.

Small datasets benefit from fewer NUTS samples because: 1. JIT compilation overhead is amortized over fewer samples 2. Step size adaptation converges faster with simple likelihoods 3. Mass matrix estimation requires fewer warmup iterations

Profiling showed 1310s for 50 points with 500 warmup + 1500 samples. Adaptive scaling reduces this by 60-80% while maintaining statistical validity (ESS targets are reduced proportionally).

Parameters:

shard_size (int) – Number of data points in the shard.
n_params (int) – Number of model parameters (affects minimum samples).

Returns:

(num_warmup, num_samples) adjusted for shard size.

Return type:

tuple[int, int]

get_effective_per_angle_mode(n_phi, nlsq_per_angle_mode=None, has_nlsq_warmstart=False)[source]

Determine effective per-angle mode based on configuration and data.

Parameters:

n_phi (int) – Number of phi angles in the dataset.
nlsq_per_angle_mode (str | None) – Optional per-angle mode from NLSQ result. When provided (from warm-start), CMC will use this mode to ensure parameterization parity with NLSQ. This prevents CMC vs NLSQ divergence from different model structures.
has_nlsq_warmstart (bool) – Whether an NLSQ warm-start result is available. When True and both CMC and NLSQ use “auto” mode, upgrades to “constant_averaged” for fewer sampled parameters and better stability.

Returns:

Effective mode: “auto”, “constant”, “constant_averaged”, or “individual”.

Return type:

str

Notes

Mode semantics (same as NLSQ):

auto: Sample single averaged contrast/offset (10 params for laminar_flow). Only activated when n_phi >= threshold (many angles).
constant: Use FIXED per-angle values from quantile estimation (8 params).
constant_averaged: Use FIXED averaged scaling for NLSQ parity.
individual: Sample per-angle contrast/offset (n_phi*2 + 7 + 1 params).

Priority: nlsq_per_angle_mode > explicit config > auto-selection

When NLSQ warm-start is present and both sides use “auto”, upgrades to “constant_averaged” to fix scaling values and reduce parameter count. This prevents contrast/offset sampling from absorbing physical parameter signal, which was the root cause of heterogeneous shard posteriors.

to_dict()[source]

Convert configuration to dictionary.

Returns:: Configuration as dictionary.
Return type:: dict[str, Any]

__init__(enable='auto', min_points_for_cmc=100000, per_angle_mode='auto', constant_scaling_threshold=3, sharding_strategy='random', num_shards='auto', max_points_per_shard='auto', backend_name='auto', enable_checkpoints=True, checkpoint_dir='./checkpoints/cmc', num_warmup=500, num_samples=1500, num_chains=4, chain_method='parallel', target_accept_prob=0.85, dense_mass=True, adaptive_sampling=True, max_tree_depth=10, min_warmup=100, min_samples=200, enable_jax_profiling=False, jax_profile_dir='./profiles/jax', max_r_hat=1.1, min_ess=400.0, max_divergence_rate=0.1, combination_method='robust_consensus_mc', min_success_rate=0.9, run_id=None, per_shard_timeout=3600, heartbeat_timeout=600, min_success_rate_warning=0.8, require_nlsq_warmstart=False, use_nlsq_informed_priors=True, nlsq_prior_width_factor=2.0, prior_tempering=True, max_parameter_cv=1.0, heterogeneity_abort=True, min_points_per_shard=10000, min_points_per_param=1500, reparameterization_d_total=True, reparameterization_log_gamma=True, bimodal_min_weight=0.2, bimodal_min_separation=0.5, seed=42, _validation_errors=<factory>)

CMCResult¶

class homodyne.optimization.cmc.results.CMCResult[source]

Bases: object

CMC analysis result with posterior samples and diagnostics.

This dataclass is compatible with save_mcmc_results() in cli/commands.py.

parameters

Posterior mean values, shape (n_params,).

Type:: np.ndarray

uncertainties

Posterior standard deviations, shape (n_params,).

Type:: np.ndarray

param_names

Parameter names in sampling order.

Type:: list[str]

samples

Raw samples, {name: (n_chains, n_samples)}.

Type:: dict[str, np.ndarray]

convergence_status

“converged” | “divergences” | “not_converged”.

Type:: str

r_hat

Per-parameter R-hat values.

Type:: dict[str, float]

ess_bulk

Per-parameter bulk ESS.

Type:: dict[str, float]

ess_tail

Per-parameter tail ESS.

Type:: dict[str, float]

divergences

Total number of divergent transitions.

Type:: int

inference_data

ArviZ InferenceData for plotting.

Type:: az.InferenceData

execution_time

Total sampling time in seconds.

Type:: float

warmup_time

Warmup time in seconds.

Type:: float

n_chains

Number of MCMC chains.

Type:: int

n_samples

Samples per chain.

Type:: int

n_warmup

Warmup samples.

Type:: int

analysis_mode

Analysis mode used.

Type:: str

covariance

Parameter covariance matrix (from samples).

Type:: np.ndarray

chi_squared

Placeholder for compatibility (not directly computed in MCMC).

Type:: float

reduced_chi_squared

Placeholder for compatibility.

Type:: float

device_info

Device used for computation.

Type:: dict[str, Any]

parameters: ndarray

uncertainties: ndarray

param_names: list[str]

samples: dict[str, ndarray]

convergence_status: str

r_hat: dict[str, float]

ess_bulk: dict[str, float]

ess_tail: dict[str, float]

divergences: int

inference_data: arviz.InferenceData

execution_time: float

warmup_time: float

n_chains: int = 4

n_samples: int = 2000

n_warmup: int = 500

analysis_mode: str = 'static'

per_angle_mode: str = 'auto'

num_shards: int = 1

covariance: ndarray

chi_squared: float = 0.0

reduced_chi_squared: float = 0.0

device_info: dict[str, Any]

recovery_actions: list[str]

quality_flag: str = 'good'

mean_params: ParameterStats

std_params: ParameterStats

mean_contrast: float | None = None

std_contrast: float | None = None

mean_offset: float | None = None

std_offset: float | None = None

is_cmc_result()[source]

Return True - required by CLI for diagnostic generation.

Return type:: bool

property success: bool: Return True if sampling converged (backward compatibility).

property message: str: Return descriptive message about result.

classmethod from_mcmc_samples(mcmc_samples, stats, analysis_mode, n_warmup=500, min_ess=None)[source]

Create CMCResult from MCMC samples.

Parameters:

mcmc_samples (MCMCSamples) – Raw MCMC samples.
stats (SamplingStats) – Sampling statistics.
analysis_mode (str) – Analysis mode used.
n_warmup (int) – Number of warmup samples.
min_ess (float | None) – Minimum effective sample size for convergence checks. If None, uses DEFAULT_MIN_ESS from diagnostics module.

Returns:

Complete result object.

Return type:

CMCResult

get_posterior_stats()[source]

Get posterior statistics for each parameter.

Returns:: Statistics per parameter: mean, std, median, hdi_5%, hdi_95%.
Return type:: dict[str, dict[str, float]]

get_samples_array()[source]

Get samples as 3D array.

Returns:: Shape (n_chains, n_samples, n_params).
Return type:: ndarray

validate_parameters(n_phi=None)[source]

Validate that result contains expected parameters.

Parameters:: n_phi (int | None) – Number of phi angles expected. If None, infers from samples.
Returns:: List of validation warnings (empty if all valid).
Return type:: list[str]

__init__(parameters, uncertainties, param_names, samples, convergence_status, r_hat, ess_bulk, ess_tail, divergences, inference_data, execution_time, warmup_time, n_chains=4, n_samples=2000, n_warmup=500, analysis_mode='static', per_angle_mode='auto', num_shards=1, covariance=<factory>, chi_squared=0.0, reduced_chi_squared=0.0, device_info=<factory>, recovery_actions=<factory>, quality_flag='good', mean_params=<factory>, std_params=<factory>, mean_contrast=None, std_contrast=None, mean_offset=None, std_offset=None)

Note

ArviZ field mapping: NumPyro stores NUTS energy as potential_energy but ArviZ plot_energy() expects energy. The CMCResult.inference_data attribute automatically maps potential_energy → energy and replaces dots in extra_fields keys (e.g., adapt_state.step_size → adapt_state_step_size) for xarray compatibility.

Per-Angle Mode¶

The per_angle_mode parameter controls how contrast and offset are handled across azimuthal angles. This must match the setting used in NLSQ to ensure the warm-start is compatible.

Mode	Behaviour	Parameters (laminar_flow, 23 angles)
`"auto"`	n_phi ≥ threshold → averaged scaling (9), n_phi < threshold → individual	9 (7 physical + 2 optimised avg scaling)
`"constant"`	Fixed per-angle scaling from quantile estimation, not optimised	7 (physical only)
`"individual"`	Independent contrast + offset per angle, all sampled	53 (7 + 46)

Tip

Always use per_angle_mode: "auto" (the default). It prevents parameter absorption degeneracy while keeping the parameter count manageable.

YAML Configuration Reference¶

optimization:
  cmc:
    enable: true
    sharding:
      strategy: "random"               # random | stratified | contiguous
      max_points_per_shard: "auto"     # ALWAYS use auto
      min_points_per_param: 1500       # Minimum data points per parameter
    backend_config:
      name: "auto"                     # auto | multiprocessing | pjit | pbs
    per_angle_mode: "auto"             # Match NLSQ per_angle_mode
    combination:
      method: "robust_consensus_mc"    # Default: MAD-based outlier detection
      min_success_rate: 0.90
    per_shard_mcmc:
      num_warmup: 500
      num_samples: 1500
      num_chains: 4
      chain_method: "parallel"         # parallel | vectorized | sequential
      target_accept_prob: 0.85
      max_tree_depth: 10
      adaptive_sampling: true
    validation:
      max_divergence_rate: 0.10
      max_r_hat: 1.1
      min_ess: 400

Quality Filtering¶

CMC includes automatic quality filtering to prevent corrupted posteriors from being included in the consensus combination:

Setting	Default	Purpose
`max_divergence_rate`	0.10	Exclude shards with >10 % divergences
`require_nlsq_warmstart`	`false`	Require NLSQ warm-start for `laminar_flow`

Usage Examples¶

Standard NLSQ → CMC workflow¶

from homodyne.optimization.nlsq import fit_nlsq_jax
from homodyne.optimization.cmc import fit_mcmc_jax
from homodyne.config.manager import ConfigManager
from homodyne.data.xpcs_loader import XPCSDataLoader

config_manager = ConfigManager("config.yaml")
data = XPCSDataLoader(config_dict=config_manager.config).load_experimental_data()

# Step 1 — NLSQ warm-start
nlsq_result = fit_nlsq_jax(data, config_manager.config)

# Step 2 — CMC with warm-start
cmc_result = fit_mcmc_jax(
    data,
    config_manager.config,
    nlsq_result=nlsq_result,
)

print("Posterior means:", cmc_result.parameters)
print("Posterior stds:", cmc_result.uncertainties)

CMC without warm-start (exploratory)¶

from homodyne.optimization.cmc import fit_mcmc_jax

cmc_result = fit_mcmc_jax(data, config)   # uses prior-median initialisation

Accessing diagnostics¶

for param, rhat in cmc_result.r_hat.items():
    print(f"{param}: R-hat = {rhat:.3f}")

print(f"Divergences: {cmc_result.divergences}")
print(f"ESS (bulk, min): {min(cmc_result.ess_bulk.values()):.0f}")

CMC core module - main entry point.

This module provides the fit_mcmc_jax() function that serves as the main entry point for CMC analysis, matching the CLI signature.

homodyne.optimization.cmc.core.run_cmc_analysis(data, t1, t2, phi, q, L, analysis_mode, config, parameter_space, initial_values=None, dt=None)[source]

Simplified interface for CMC analysis.

This is a convenience wrapper around fit_mcmc_jax() that takes a CMCConfig object directly instead of a dict.

Parameters:

data (ndarray) – Data arrays.
t1 (ndarray) – Data arrays.
t2 (ndarray) – Data arrays.
phi (ndarray) – Data arrays.
q (float) – Physics parameters.
L (float) – Physics parameters.
analysis_mode (str) – Analysis mode.
config (CMCConfig) – CMC configuration object.
parameter_space (ParameterSpace) – Parameter space.
initial_values (dict[str, float] | None) – Initial values.
dt (float | None) – Time step (None infers from pooled time arrays).

Returns:

Analysis result.

Return type:

CMCResult