Diagnostics

Host-side diagnostics for sample quality and NRPT run health. These run in numpy (no XLA compile) over the collected samples and the stats dict returned by hamon.nrpt / hamon.tune_schedule.

hamon.report_nrpt_diagnostics(stats: dict, samples: Bool[Array, 'n_samples n_variables'] | None = None, *, tau_min: float = 0.01, efficiency_fail: float = 0.2, efficiency_warn: float = 0.35, rej_std_max: float = 0.15, entropy_frozen: float = 0.05, entropy_uniform: float = 0.95, min_attempts: int = 50, ess_warn: float = 0.1) -> NRPTHealthReport

Evaluate NRPT stats (and optionally samples) into a single verdict.

For NRPT, round-trip diagnostics are the primary quality signal: states must travel the full temperature ladder for tempering to work. Marginal-convergence checks are reported for information but never used as pass/fail criteria — when PT correctly samples multiple modes, the marginals shift between halves of the run and a naive convergence test produces false "NEED_MORE" verdicts.

Decision criteria (each threshold is a keyword argument):

  • ISSUE: tau_observed < tau_min — no round trips, information is not flowing through the ladder.
  • ISSUE/WARN: efficiency < efficiency_fail (issue) or efficiency < efficiency_warn (warning): the round-trip rate is below the ELE-optimal τ̄. The report sets efficiency_limiter to name the cause and the knob to turn: "schedule" when the ladder is not equalized (std(rejection_rates) > rej_std_max; tune further or add chains), or "local_exploration" when it is equalized (an ELE violation; raise gibbs_steps_per_round, or add chains as the alternative lever). A chain-count recommendation is included either way.
  • ISSUE: std(rejection_rates) > rej_std_max — schedule not equalized.
  • ISSUE: marginal_entropy < entropy_frozen — sampler frozen.
  • WARN: marginal_entropy > entropy_uniform — β may be too low.
  • WARN: a sharp peak in the λ(β) profile (barrier bottleneck).
  • WARN: ess_fraction < ess_warn — worst-mixing variable has low effective sample size (informational; never a hard failure).

All of these statistics are noisy when few swaps were attempted. When min(attempted) < min_attempts, would-be issues are demoted to warnings and insufficient_data is set, so a short tuning probe is not condemned on too little evidence.
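The stats-only criteria above can be restated as a short sketch. This is an illustration of the documented thresholds, not hamon's implementation: the function name classify_nrpt is hypothetical, the dict keys (tau_observed, efficiency, rejection_rates, attempted) are assumed from the names used in the criteria, and the use of a population standard deviation is an assumption. The sample-based checks (entropy, ESS, barrier peak) are left out.

```python
from statistics import pstdev


def classify_nrpt(stats, *, tau_min=0.01, efficiency_fail=0.2,
                  efficiency_warn=0.35, rej_std_max=0.15, min_attempts=50):
    """Restate the stats-only criteria; illustrative, not hamon's code."""
    issues, warnings = [], []
    limiter = None

    # Population std of the rejection rates (the ddof choice is an assumption).
    equalized = pstdev(stats["rejection_rates"]) <= rej_std_max

    if stats["tau_observed"] < tau_min:
        issues.append("no round trips")

    efficiency = stats["efficiency"]
    if efficiency < efficiency_warn:
        # An unequalized ladder points at the schedule,
        # an equalized one at the local kernel.
        limiter = "local_exploration" if equalized else "schedule"
        bucket = issues if efficiency < efficiency_fail else warnings
        bucket.append(f"low efficiency ({limiter})")

    if not equalized:
        issues.append("schedule not equalized")

    # Too few swap attempts: demote would-be issues to warnings.
    insufficient = min(stats["attempted"]) < min_attempts
    if insufficient:
        warnings, issues = issues + warnings, []

    return {
        "healthy": not issues and not insufficient,
        "insufficient_data": insufficient,
        "issues": issues,
        "warnings": warnings,
        "efficiency_limiter": limiter,
    }


# A short tuning probe: every check fails, but only warnings are reported.
probe = {"tau_observed": 0.0, "efficiency": 0.1,
         "rejection_rates": [0.05, 0.9, 0.1], "attempted": [10, 12, 9]}
print(classify_nrpt(probe))
```

Running the same stats with attempt counts above min_attempts moves the three findings from warnings back to issues, which is the demotion rule described above.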

Returns:

An NRPTHealthReport. Issues are logged at WARNING level.

hamon.NRPTHealthReport

Result of report_nrpt_diagnostics.

Attributes:

Name Type Description
healthy

True when no issues were found and there was enough data to judge.

insufficient_data

True when swap-attempt counts were too low to apply the pass/fail criteria; would-be issues are demoted to warnings and healthy reflects only what could be checked.

issues

Hard failures — the samples should not be trusted.

warnings

Soft findings worth investigating.

acceptance_mean / rejection_std

Swap-rate statistics.

total_round_trips

Round-trip diagnostics (None when the run was made with track_round_trips=False).

barrier_identified

False when the index process did not round-trip (a stalled conveyor), so Lambda is a within-basin artifact and must not be trusted; add chains or equalize the ladder. True when round trips flowed; None when round-trip diagnostics were unavailable. See hamon.round_trips.barrier_is_identified.

recommended_n_chains

Suggested chain count when efficiency is low.

efficiency_limiter

When round-trip efficiency is low, which knob to turn: "schedule" (the ladder is not equalized: tune it further or add chains) or "local_exploration" (the ladder is equalized, so the local kernel is the bottleneck, an ELE violation; raise gibbs_steps_per_round, or increase the number of chains as the alternative lever). None when efficiency is healthy or unavailable.

barrier_peak_beta

Midpoint β of a sharp barrier peak, if detected.

convergence_status / rank_stability / marginal_entropy

Sample-based metrics (None when samples was not provided). For NRPT, convergence is informational only — correct multi-modal sampling shifts marginals between run halves, so a non-CONVERGED status is not treated as a failure.

min_ess / median_ess / ess_fraction

Effective-sample-size summaries over the provided samples (None when not provided). ess_fraction is min_ess / n_samples for the worst-mixing variable; a low value drives a warning (never a hard failure; see effective_sample_size for the multimodal caveat).

summary() -> str

Human-readable multi-line summary.

hamon.effective_sample_size(samples: Shaped[Array, 'n_samples n_variables'] | np.ndarray) -> ESSReport

Estimate the effective sample size of an autocorrelated MCMC trace.

MCMC draws are autocorrelated, so n correlated samples carry the information of fewer independent ones. ESS estimates that effective count: the Monte-Carlo error of any estimate computed from the trace scales as σ/√ESS, not σ/√n. For an iid trace ESS ≈ n; for a slowly mixing one it can be far smaller. Pair ess_fraction (or min_ess) with the run's wall-clock time to get ESS/second, the efficiency metric used to compare schedules or chain counts.

Computed on the host with numpy (FFT autocorrelation + Geyer initial-positive-sequence; see _ess_1d), so there is no XLA compile cost. Inputs may be jax arrays; they are pulled to the host once.

For multimodal parallel tempering, a low single-marginal ESS can reflect the chain correctly jumping between modes (mode switches are long-range correlation), so read ESS alongside the round-trip diagnostics rather than instead of them.
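The estimator described above (FFT autocorrelation followed by Geyer's initial-positive-sequence truncation) can be sketched in numpy as follows. This is a minimal illustration under stated assumptions, not the library's _ess_1d: the names ess_1d and ess_summary are hypothetical, returning n for a constant trace is an assumed convention, and ESS is clipped at n so that ess_fraction stays in [0, 1].

```python
import numpy as np


def ess_1d(x):
    """ESS of one trace: FFT autocorrelation + Geyer initial positive sequence."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4 or x.max() == x.min():
        return float(n)  # degenerate trace; this convention is an assumption
    x = x - x.mean()
    # Zero-pad to 2n so the circular FFT correlation equals the linear one.
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f), n=2 * n)[:n] / n
    rho = acov / acov[0]
    # Geyer: sum adjacent pairs rho[2k] + rho[2k+1] and stop at the
    # first non-positive pair.
    n_pairs = n // 2
    pairs = rho[0:2 * n_pairs:2] + rho[1:2 * n_pairs:2]
    nonpos = np.flatnonzero(pairs <= 0.0)
    cut = int(nonpos[0]) if nonpos.size else n_pairs
    # Integrated autocorrelation time: 1 + 2 * sum_{t>=1} rho_t.
    tau = 2.0 * pairs[:cut].sum() - 1.0
    return float(n / max(tau, 1.0))  # clip so that ESS <= n


def ess_summary(samples):
    """Per-variable ESS plus the summaries listed for ESSReport."""
    s = np.asarray(samples, dtype=float)
    if s.ndim == 1:
        s = s[:, None]  # a 1-D trace is treated as a single variable
    n = s.shape[0]
    per_variable = np.array([ess_1d(s[:, j]) for j in range(s.shape[1])])
    return {
        "per_variable": per_variable,
        "min_ess": float(per_variable.min()),
        "median_ess": float(np.median(per_variable)),
        "mean_ess": float(per_variable.mean()),
        "ess_fraction": float(per_variable.min() / n),
        "n_samples": n,
    }


# Compare an iid trace with a strongly autocorrelated AR(1) trace.
rng = np.random.default_rng(0)
iid = rng.normal(size=4000)
noise = rng.normal(size=4000)
ar = np.zeros(4000)
for t in range(1, 4000):
    ar[t] = 0.9 * ar[t - 1] + noise[t]
report = ess_summary(np.column_stack([iid, ar]))
print(report["per_variable"], report["ess_fraction"])
```

Dividing min_ess by the measured wall-clock time of the run gives the ESS/second figure mentioned above.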

Parameters:

Name Type Description Default
samples Shaped[Array, 'n_samples n_variables'] | ndarray

array of shape (n_samples, n_variables) (a 1-D (n_samples,) trace is treated as a single variable). Boolean or numeric.

required

Returns:

Type Description
ESSReport

An ESSReport.

hamon.ESSReport

Result of effective_sample_size.

Attributes:

Name Type Description
per_variable

per-column effective sample size, shape (n_variables,).

min_ess

smallest ESS across variables (the worst-mixing variable — the conservative number to quote).

median_ess / mean_ess

summary ESS across variables.

ess_fraction

min_ess / n_samples — the efficiency of the worst-mixing variable, in [0, 1].

n_samples

number of samples the estimate was computed from.

__init__(per_variable: np.ndarray, min_ess: float, median_ess: float, mean_ess: float, ess_fraction: float, n_samples: int) -> None

hamon.diagnostics.sample_convergence(samples: Bool[Array, 'n_samples n_variables'], *, target_k: int = 15, drift_threshold: float = 0.01, jaccard_threshold: float = 0.8) -> ConvergenceReport

Measure stability of marginal probability estimates.

Splits samples into quartile checkpoints (25 %, 50 %, 75 %, 100 %), computes marginals at each checkpoint, and reports the L1 drift between consecutive checkpoints together with the rank stability of the top-k variables.

Parameters:

Name Type Description Default
samples Bool[Array, 'n_samples n_variables']

boolean array of shape (n_samples, n_variables).

required
target_k int

number of top variables to track for rank stability.

15
drift_threshold float

maximum acceptable L1 drift per variable for the final checkpoint to be considered converged.

0.01
jaccard_threshold float

minimum Jaccard similarity of top-k sets between halves for rank stability to be considered converged.

0.8

Returns:

Type Description
ConvergenceReport

A ConvergenceReport.
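The checkpoint scheme described for sample_convergence can be sketched in numpy. Several details are assumptions rather than documented behaviour: the marginals are taken cumulatively (first 25 %, first 50 %, and so on), the L1 drift is averaged over variables, "top-k" ranks variables by marginal probability, and the Jaccard comparison is between the 50 % and 100 % checkpoints. The name convergence_sketch and the returned dict are hypothetical.

```python
import numpy as np


def convergence_sketch(samples, *, target_k=15, drift_threshold=0.01,
                       jaccard_threshold=0.8):
    """Quartile-checkpoint drift and top-k rank stability (illustrative)."""
    s = np.asarray(samples, dtype=float)
    n, d = s.shape
    k = min(target_k, d)

    # Cumulative marginals at the 25 %, 50 %, 75 % and 100 % checkpoints.
    cuts = [max(1, n * q // 4) for q in (1, 2, 3, 4)]
    marginals = [s[:c].mean(axis=0) for c in cuts]

    # Mean absolute change per variable between consecutive checkpoints.
    drifts = [float(np.abs(b - a).mean())
              for a, b in zip(marginals, marginals[1:])]

    def top_k(m):
        # Indices of the k largest marginals; stable sort breaks ties by index.
        return set(np.argsort(-m, kind="stable")[:k].tolist())

    half, full = top_k(marginals[1]), top_k(marginals[3])
    jaccard = len(half & full) / len(half | full)

    converged = drifts[-1] <= drift_threshold and jaccard >= jaccard_threshold
    return {"drifts": drifts, "jaccard": jaccard, "converged": converged}


# Stationary iid samples: marginals settle and the top-k set is stable.
rng = np.random.default_rng(0)
probs = np.linspace(0.05, 0.95, 30)
stable = rng.random((20000, 30)) < probs
print(convergence_sketch(stable))

# A run whose second half differs from its first shows a large final drift.
shifted = np.vstack([np.zeros((200, 4)), np.ones((200, 4))])
print(convergence_sketch(shifted)["drifts"])
```

The second example is the situation the NRPT notes warn about: a sampler that moves between modes late in the run fails this kind of test even when it is sampling correctly.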

hamon.diagnostics.marginal_entropy(samples: Bool[Array, 'n_samples n_variables']) -> float

Normalized entropy of the empirical marginal distribution.

Computes the mean per-variable binary entropy, normalized to [0, 1]. A value near 0 means most variables are frozen (all True or all False); near 1 means each variable is near 50/50.

Parameters:

Name Type Description Default
samples Bool[Array, 'n_samples n_variables']

boolean array of shape (n_samples, n_variables).

required

Returns:

Type Description
float

Scalar in [0, 1].
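As a sketch of the quantity described above, the mean per-variable binary entropy can be computed in numpy as follows. Measuring entropy in bits makes each term lie in [0, 1] without further scaling; whether the library normalizes this way is an assumption, and the name marginal_entropy_sketch is hypothetical.

```python
import numpy as np


def marginal_entropy_sketch(samples):
    """Mean binary entropy (in bits) of the per-variable marginals."""
    p = np.asarray(samples, dtype=float).mean(axis=0)
    # 0 * log2(0) evaluates to nan; the limit is 0, so map nan to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return float(np.nan_to_num(h, nan=0.0).mean())


# Column 0 is frozen at True, column 1 is split 50/50.
samples = np.array([[1, 0], [1, 1], [1, 0], [1, 1]], dtype=bool)
value = marginal_entropy_sketch(samples)
print(value)
```

The frozen column contributes 0 and the balanced column contributes 1, so values near the entropy_frozen or entropy_uniform thresholds indicate that most variables sit at one extreme or the other.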

hamon.diagnostics.energy_balance(biases: Shaped[Array, ' n'], edges: Shaped[Array, 'm 2'], weights: Shaped[Array, ' m'], *, beta: float = 1.0, warn_low: float = 0.05, warn_high: float = 2.0) -> EnergyBalanceReport

Check whether bias and coupling energy scales are balanced.

Computes the energy contribution from biases vs couplings at the given temperature and reports their ratio. Logs a warning when the ratio falls outside [warn_low, warn_high].

Parameters:

Name Type Description Default
biases Shaped[Array, ' n']

per-node bias array of shape (n,).

required
edges Shaped[Array, 'm 2']

integer index pairs of shape (m, 2).

required
weights Shaped[Array, ' m']

per-edge coupling of shape (m,).

required
beta float

inverse temperature.

1.0
warn_low float

ratio below which a warning is logged.

0.05
warn_high float

ratio above which a warning is logged.

2.0

Returns:

Type Description
EnergyBalanceReport

An EnergyBalanceReport.