homodyne.optimization.cmc¶
The cmc sub-package implements Consensus Monte Carlo (CMC) — a
divide-and-conquer Bayesian inference strategy for large-scale XPCS datasets.
The dataset is partitioned into independent shards, each processed with
per-shard NUTS/MCMC (via NumPyro), and the posterior distributions are
combined using precision-weighted Gaussian moments.
Note
CMC is a secondary method. Always run NLSQ first and pass the result as a warm-start. This reduces divergence rates from ~28 % to under 5 %. See homodyne.optimization for the recommended two-step workflow.
fit_mcmc_jax¶
The primary entry point for CMC analysis.
- homodyne.optimization.cmc.core.fit_mcmc_jax(data, t1, t2, phi, q, L, analysis_mode, method='mcmc', cmc_config=None, initial_values=None, parameter_space=None, dt=None, output_dir=None, progress_bar=True, run_id=None, nlsq_result=None, **kwargs)[source]
Run CMC (Consensus Monte Carlo) analysis on XPCS data.
This function signature matches the CLI call in cli/commands.py:1201.
- Parameters:
data (
ndarray) – Pooled C2 correlation data, shape (n_total,).t1 (
ndarray) – Pooled time coordinates t1, shape (n_total,).t2 (
ndarray) – Pooled time coordinates t2, shape (n_total,).phi (
ndarray) – Pooled phi angles, shape (n_total,).q (
float) – Wavevector magnitude.L (
float) – Stator-rotor gap length (nm).analysis_mode (
str) – Analysis mode: “static” or “laminar_flow”.method (
str) – Method identifier (always “mcmc” for CMC).cmc_config (
dict[str,Any] |None) – CMC configuration from ConfigManager.get_cmc_config().initial_values (
dict[str,float] |None) – Initial parameter values from ConfigManager.get_initial_parameters().parameter_space (
ParameterSpace|None) – Parameter space with bounds and priors from ParameterSpace.from_config().dt (
float|None) – Time step for physics model. If None, inferred from pooled time arrays.output_dir (
Path|str|None) – Output directory for saving results.progress_bar (
bool) – Whether to show progress bar during sampling.run_id (
str|None) – Optional identifier used to correlate logs across shards/backends.nlsq_result (
dict|None) – Optional NLSQ result dictionary for warm-start priors. When provided, builds informative priors centered on NLSQ estimates, improving convergence speed and reducing divergences. Should contain parameter values and optionally uncertainties (see extract_nlsq_values_for_cmc).**kwargs – Additional keyword arguments (for compatibility).
- Returns:
Complete result with posterior samples and diagnostics.
- Return type:
CMCResult- Raises:
ValueError – If data validation fails.
RuntimeError – If MCMC sampling fails.
Examples
>>> from homodyne.optimization.cmc import fit_mcmc_jax >>> result = fit_mcmc_jax( ... data=c2_pooled, ... t1=t1_pooled, ... t2=t2_pooled, ... phi=phi_pooled, ... q=0.01, ... L=2000000.0, ... analysis_mode="laminar_flow", ... method="mcmc", ... cmc_config=config.get_cmc_config(), ... initial_values=config.get_initial_parameters(), ... parameter_space=parameter_space, ... ) >>> print(result.convergence_status) converged
CMCConfig¶
- class homodyne.optimization.cmc.config.CMCConfig[source]
Bases:
objectConfiguration for Consensus Monte Carlo (CMC) analysis.
- min_points_for_cmc
Minimum data points to trigger CMC mode.
- Type:
- sharding_strategy
How to partition data: “stratified”, “random”, “contiguous”.
- Type:
- max_points_per_shard
Maximum points per shard. “auto” calculates optimally based on dataset size, analysis mode, and angle count (see
_resolve_max_points_per_shard). Default: “auto”. Typical auto values: 5–20K for laminar_flow, 10–20K for static (scales with dataset size).
- backend_name
Execution backend: “auto”, “multiprocessing”, “pjit”, “pbs”.
- Type:
- enable_checkpoints
Whether to save checkpoints during sampling.
- Type:
- checkpoint_dir
Directory for checkpoint files.
- Type:
- num_warmup
Number of warmup/burn-in samples per chain.
- Type:
- num_samples
Number of posterior samples per chain.
- Type:
- num_chains
Number of MCMC chains.
- Type:
- chain_method
MCMC chain execution method.
"parallel"(default) runs chains concurrently via JAX vectorization."sequential"runs chains one at a time. Parallel is faster on multi-core CPUs but adds ~5-15% overhead on very small shards (<500 points); the sampler auto-falls-back to sequential in that case.- Type:
- target_accept_prob
Target acceptance probability for NUTS.
- Type:
- dense_mass
Use dense mass matrix for NUTS. When True, learns parameter correlations for more efficient sampling. Default: True.
- Type:
- max_r_hat
Maximum R-hat for convergence.
- Type:
- min_ess
Minimum effective sample size.
- Type:
- combination_method
How to combine shard posteriors. Options:
"consensus_mc": Correct Consensus Monte Carlo (precision-weighted means). Recommended. Combines per-shard posterior moments, then generates new samples from the combined Gaussian."weighted_gaussian": Legacy element-wise weighted averaging (deprecated)."simple_average": Simple element-wise averaging (deprecated).
- Type:
- min_success_rate
Minimum fraction of shards that must succeed.
- Type:
- run_id
Optional identifier used for structured logging across shards.
- Type:
str | None
- per_angle_mode
Per-angle scaling mode for anti-degeneracy defense (v2.18.0+):
"auto": Auto-selects based on n_phi threshold (recommended). When n_phi >= threshold: Estimates per-angle values, AVERAGES them, broadcasts single value to all angles (matches NLSQ behavior). When n_phi < threshold: Uses individual mode."constant": Per-angle contrast/offset from quantile estimation, used DIRECTLY (different fixed value per angle, NOT averaged). Reduces to 8 params (7 physical + 1 sigma)."individual": Independent contrast + offset per angle, all sampled. May suffer from parameter absorption degeneracy with many angles.
- Type:
- constant_scaling_threshold
n_phi threshold for auto mode’s per-angle strategy. When n_phi >= threshold, auto mode samples averaged contrast/offset (single value broadcast to all angles). When n_phi < threshold, auto mode falls back to individual per-angle sampling. Default: 3.
- Type:
- min_points_for_cmc: int = 100000
- per_angle_mode: str = 'auto'
- constant_scaling_threshold: int = 3
- sharding_strategy: str = 'random'
- backend_name: str = 'auto'
- enable_checkpoints: bool = True
- checkpoint_dir: str = './checkpoints/cmc'
- num_warmup: int = 500
- num_samples: int = 1500
- num_chains: int = 4
- chain_method: str = 'parallel'
- target_accept_prob: float = 0.85
- dense_mass: bool = True
- adaptive_sampling: bool = True
- max_tree_depth: int = 10
- min_warmup: int = 100
- min_samples: int = 200
- enable_jax_profiling: bool = False
- jax_profile_dir: str = './profiles/jax'
- max_r_hat: float = 1.1
- min_ess: float = 400.0
- max_divergence_rate: float = 0.1
- combination_method: str = 'robust_consensus_mc'
- min_success_rate: float = 0.9
- per_shard_timeout: int = 3600
- heartbeat_timeout: int = 600
- min_success_rate_warning: float = 0.8
- require_nlsq_warmstart: bool = False
- use_nlsq_informed_priors: bool = True
- nlsq_prior_width_factor: float = 2.0
- prior_tempering: bool = True
- max_parameter_cv: float = 1.0
- heterogeneity_abort: bool = True
- min_points_per_shard: int = 10000
- min_points_per_param: int = 1500
- reparameterization_d_total: bool = True
- reparameterization_log_gamma: bool = True
- bimodal_min_weight: float = 0.2
- bimodal_min_separation: float = 0.5
- seed: int = 42
- classmethod from_dict(config_dict)[source]
Create CMCConfig from configuration dictionary.
- Parameters:
config_dict (
dict[str,Any]) – CMC configuration dictionary from ConfigManager.get_cmc_config().- Returns:
Validated configuration object.
- Return type:
CMCConfig- Raises:
ValueError – If required fields are missing or invalid.
- validate()[source]
Validate configuration values.
- is_valid()[source]
Check if configuration is valid.
- Returns:
True if configuration has no validation errors.
- Return type:
- should_enable_cmc(n_points, analysis_mode=None)[source]
Determine if CMC should be enabled for given data size.
- Parameters:
- Returns:
True if CMC should be enabled.
- Return type:
Notes
Threshold is
min_points_for_cmc(default 100,000) for all modes.
- get_num_shards(n_points, n_phi, n_params=7)[source]
Calculate number of shards with param-aware sizing.
- get_adaptive_sample_counts(shard_size, n_params=7)[source]
Calculate adaptive warmup/samples based on shard size.
Small datasets benefit from fewer NUTS samples because: 1. JIT compilation overhead is amortized over fewer samples 2. Step size adaptation converges faster with simple likelihoods 3. Mass matrix estimation requires fewer warmup iterations
Profiling showed 1310s for 50 points with 500 warmup + 1500 samples. Adaptive scaling reduces this by 60-80% while maintaining statistical validity (ESS targets are reduced proportionally).
- get_effective_per_angle_mode(n_phi, nlsq_per_angle_mode=None, has_nlsq_warmstart=False)[source]
Determine effective per-angle mode based on configuration and data.
- Parameters:
n_phi (
int) – Number of phi angles in the dataset.nlsq_per_angle_mode (
str|None) – Optional per-angle mode from NLSQ result. When provided (from warm-start), CMC will use this mode to ensure parameterization parity with NLSQ. This prevents CMC vs NLSQ divergence from different model structures.has_nlsq_warmstart (
bool) – Whether an NLSQ warm-start result is available. When True and both CMC and NLSQ use “auto” mode, upgrades to “constant_averaged” for fewer sampled parameters and better stability.
- Returns:
Effective mode: “auto”, “constant”, “constant_averaged”, or “individual”.
- Return type:
Notes
Mode semantics (same as NLSQ):
auto: Sample single averaged contrast/offset (10 params for laminar_flow). Only activated when n_phi >= threshold (many angles).
constant: Use FIXED per-angle values from quantile estimation (8 params).
constant_averaged: Use FIXED averaged scaling for NLSQ parity.
individual: Sample per-angle contrast/offset (n_phi*2 + 7 + 1 params).
Priority: nlsq_per_angle_mode > explicit config > auto-selection
When NLSQ warm-start is present and both sides use “auto”, upgrades to “constant_averaged” to fix scaling values and reduce parameter count. This prevents contrast/offset sampling from absorbing physical parameter signal, which was the root cause of heterogeneous shard posteriors.
- to_dict()[source]
Convert configuration to dictionary.
- __init__(enable='auto', min_points_for_cmc=100000, per_angle_mode='auto', constant_scaling_threshold=3, sharding_strategy='random', num_shards='auto', max_points_per_shard='auto', backend_name='auto', enable_checkpoints=True, checkpoint_dir='./checkpoints/cmc', num_warmup=500, num_samples=1500, num_chains=4, chain_method='parallel', target_accept_prob=0.85, dense_mass=True, adaptive_sampling=True, max_tree_depth=10, min_warmup=100, min_samples=200, enable_jax_profiling=False, jax_profile_dir='./profiles/jax', max_r_hat=1.1, min_ess=400.0, max_divergence_rate=0.1, combination_method='robust_consensus_mc', min_success_rate=0.9, run_id=None, per_shard_timeout=3600, heartbeat_timeout=600, min_success_rate_warning=0.8, require_nlsq_warmstart=False, use_nlsq_informed_priors=True, nlsq_prior_width_factor=2.0, prior_tempering=True, max_parameter_cv=1.0, heterogeneity_abort=True, min_points_per_shard=10000, min_points_per_param=1500, reparameterization_d_total=True, reparameterization_log_gamma=True, bimodal_min_weight=0.2, bimodal_min_separation=0.5, seed=42, _validation_errors=<factory>)
CMCResult¶
- class homodyne.optimization.cmc.results.CMCResult[source]
Bases:
objectCMC analysis result with posterior samples and diagnostics.
This dataclass is compatible with save_mcmc_results() in cli/commands.py.
- parameters
Posterior mean values, shape (n_params,).
- Type:
np.ndarray
- uncertainties
Posterior standard deviations, shape (n_params,).
- Type:
np.ndarray
- convergence_status
“converged” | “divergences” | “not_converged”.
- Type:
- divergences
Total number of divergent transitions.
- Type:
- inference_data
ArviZ InferenceData for plotting.
- Type:
az.InferenceData
- execution_time
Total sampling time in seconds.
- Type:
- warmup_time
Warmup time in seconds.
- Type:
- n_chains
Number of MCMC chains.
- Type:
- n_samples
Samples per chain.
- Type:
- n_warmup
Warmup samples.
- Type:
- analysis_mode
Analysis mode used.
- Type:
- covariance
Parameter covariance matrix (from samples).
- Type:
np.ndarray
- chi_squared
Placeholder for compatibility (not directly computed in MCMC).
- Type:
- reduced_chi_squared
Placeholder for compatibility.
- Type:
- parameters: ndarray
- uncertainties: ndarray
- convergence_status: str
- divergences: int
- inference_data: arviz.InferenceData
- execution_time: float
- warmup_time: float
- n_chains: int = 4
- n_samples: int = 2000
- n_warmup: int = 500
- analysis_mode: str = 'static'
- per_angle_mode: str = 'auto'
- num_shards: int = 1
- covariance: ndarray
- chi_squared: float = 0.0
- reduced_chi_squared: float = 0.0
- quality_flag: str = 'good'
- mean_params: ParameterStats
- std_params: ParameterStats
- property success: bool
Return True if sampling converged (backward compatibility).
- property message: str
Return descriptive message about result.
- classmethod from_mcmc_samples(mcmc_samples, stats, analysis_mode, n_warmup=500, min_ess=None)[source]
Create CMCResult from MCMC samples.
- Parameters:
mcmc_samples (
MCMCSamples) – Raw MCMC samples.stats (
SamplingStats) – Sampling statistics.analysis_mode (
str) – Analysis mode used.n_warmup (
int) – Number of warmup samples.min_ess (
float|None) – Minimum effective sample size for convergence checks. If None, usesDEFAULT_MIN_ESSfrom diagnostics module.
- Returns:
Complete result object.
- Return type:
CMCResult
- get_posterior_stats()[source]
Get posterior statistics for each parameter.
- get_samples_array()[source]
Get samples as 3D array.
- Returns:
Shape (n_chains, n_samples, n_params).
- Return type:
- validate_parameters(n_phi=None)[source]
Validate that result contains expected parameters.
- __init__(parameters, uncertainties, param_names, samples, convergence_status, r_hat, ess_bulk, ess_tail, divergences, inference_data, execution_time, warmup_time, n_chains=4, n_samples=2000, n_warmup=500, analysis_mode='static', per_angle_mode='auto', num_shards=1, covariance=<factory>, chi_squared=0.0, reduced_chi_squared=0.0, device_info=<factory>, recovery_actions=<factory>, quality_flag='good', mean_params=<factory>, std_params=<factory>, mean_contrast=None, std_contrast=None, mean_offset=None, std_offset=None)
Note
ArviZ field mapping: NumPyro stores NUTS energy as potential_energy
but ArviZ plot_energy() expects energy. The CMCResult.inference_data
attribute automatically maps potential_energy → energy and replaces dots
in extra_fields keys (e.g., adapt_state.step_size →
adapt_state_step_size) for xarray compatibility.
Per-Angle Mode¶
The per_angle_mode parameter controls how contrast and offset are handled
across azimuthal angles. This must match the setting used in NLSQ to ensure
the warm-start is compatible.
Mode |
Behaviour |
Parameters (laminar_flow, 23 angles) |
|---|---|---|
|
n_phi ≥ threshold → averaged scaling (9), n_phi < threshold → individual |
9 (7 physical + 2 optimised avg scaling) |
|
Fixed per-angle scaling from quantile estimation, not optimised |
7 (physical only) |
|
Independent contrast + offset per angle, all sampled |
53 (7 + 46) |
Tip
Always use per_angle_mode: "auto" (the default). It prevents parameter
absorption degeneracy while keeping the parameter count manageable.
YAML Configuration Reference¶
optimization:
cmc:
enable: true
sharding:
strategy: "random" # random | stratified | contiguous
max_points_per_shard: "auto" # ALWAYS use auto
min_points_per_param: 1500 # Minimum data points per parameter
backend_config:
name: "auto" # auto | multiprocessing | pjit | pbs
per_angle_mode: "auto" # Match NLSQ per_angle_mode
combination:
method: "robust_consensus_mc" # Default: MAD-based outlier detection
min_success_rate: 0.90
per_shard_mcmc:
num_warmup: 500
num_samples: 1500
num_chains: 4
chain_method: "parallel" # parallel | vectorized | sequential
target_accept_prob: 0.85
max_tree_depth: 10
adaptive_sampling: true
validation:
max_divergence_rate: 0.10
max_r_hat: 1.1
min_ess: 400
Quality Filtering¶
CMC includes automatic quality filtering to prevent corrupted posteriors from being included in the consensus combination:
Setting |
Default |
Purpose |
|---|---|---|
|
0.10 |
Exclude shards with >10 % divergences |
|
|
Require NLSQ warm-start for |
Usage Examples¶
Standard NLSQ → CMC workflow¶
from homodyne.optimization.nlsq import fit_nlsq_jax
from homodyne.optimization.cmc import fit_mcmc_jax
from homodyne.config.manager import ConfigManager
from homodyne.data.xpcs_loader import XPCSDataLoader
config_manager = ConfigManager("config.yaml")
data = XPCSDataLoader(config_dict=config_manager.config).load_experimental_data()
# Step 1 — NLSQ warm-start
nlsq_result = fit_nlsq_jax(data, config_manager.config)
# Step 2 — CMC with warm-start
cmc_result = fit_mcmc_jax(
data,
config_manager.config,
nlsq_result=nlsq_result,
)
print("Posterior means:", cmc_result.parameters)
print("Posterior stds:", cmc_result.uncertainties)
CMC without warm-start (exploratory)¶
from homodyne.optimization.cmc import fit_mcmc_jax
cmc_result = fit_mcmc_jax(data, config) # uses prior-median initialisation
Accessing diagnostics¶
for param, rhat in cmc_result.r_hat.items():
print(f"{param}: R-hat = {rhat:.3f}")
print(f"Divergences: {cmc_result.divergences}")
print(f"ESS (bulk, min): {min(cmc_result.ess_bulk.values()):.0f}")
CMC core module - main entry point.
This module provides the fit_mcmc_jax() function that serves as the main entry point for CMC analysis, matching the CLI signature.
- homodyne.optimization.cmc.core.run_cmc_analysis(data, t1, t2, phi, q, L, analysis_mode, config, parameter_space, initial_values=None, dt=None)[source]
Simplified interface for CMC analysis.
This is a convenience wrapper around fit_mcmc_jax() that takes a CMCConfig object directly instead of a dict.
- Parameters:
data (
ndarray) – Data arrays.t1 (
ndarray) – Data arrays.t2 (
ndarray) – Data arrays.phi (
ndarray) – Data arrays.q (
float) – Physics parameters.L (
float) – Physics parameters.analysis_mode (
str) – Analysis mode.config (
CMCConfig) – CMC configuration object.parameter_space (
ParameterSpace) – Parameter space.dt (
float|None) – Time step (None infers from pooled time arrays).
- Returns:
Analysis result.
- Return type:
CMCResult