Concepts#
This page explains the ideas behind Hamon without requiring prior knowledge of probabilistic graphical models or MCMC. If you already know this material, skip to the API reference.
Energy-based models#
An energy-based model (EBM) assigns a scalar energy \( E(x) \) to every possible configuration \( x \) of a set of discrete variables. Lower energy means higher probability:
The partition function \( Z \) makes this a proper distribution, but computing it requires summing over every configuration — exponential in the number of variables. Hamon sidesteps this by sampling from \( p(x) \) without ever computing \( Z \).
Ising models#
The simplest discrete EBM. Each variable (spin) takes values in \(\{-1, +1\}\). The energy is:
where \( \beta \) is inverse temperature, \( w_{ij} \) are coupling weights,
and \( h_i \) are biases. Hamon represents these as IsingEBM objects.
Nodes, blocks, and factors#
Hamon models the dependency structure as a bipartite factor graph.
Nodes are the random variables. SpinNode for \(\pm 1\) variables,
CategoricalNode for multi-valued variables.
Factors encode interactions between groups of nodes. A WeightedFactor
stores a weight tensor and connects a set of nodes via an InteractionGroup.
Blocks are disjoint subsets of nodes that can be updated simultaneously. In a two-color Ising model, one block holds even-indexed spins and the other holds odd-indexed spins. Because spins in the same block don't interact directly, they can be sampled in parallel — this is the core of block Gibbs sampling.
A BlockSpec manages the index bookkeeping: which nodes live in which block,
how to scatter block-level updates back into the global state, and how to pad
variable-size blocks so JAX can stack them into arrays.
Block Gibbs sampling#
Gibbs sampling updates one variable at a time from its conditional distribution.
Block Gibbs updates an entire block at once. Hamon's sample_blocks function
sweeps through each free block in sequence, sampling all its nodes in parallel
given the current state of everything else.
A SamplingSchedule controls warmup (burn-in), the number of samples to
collect, and how many sweeps to run between recorded samples.
The temperature problem#
Block Gibbs at a single temperature can get trapped in local energy minima, especially when the model has frustrated interactions or phase transitions. The sampler mixes locally but can't cross energy barriers.
The solution: run the same model at multiple temperatures simultaneously.
Parallel tempering#
Parallel tempering (PT) maintains \( N \) copies of the system at inverse temperatures \( 0 = \beta_0 < \beta_1 < \cdots < \beta_N = 1 \):
- Hot chains (\( \beta \approx 0 \)) sample almost uniformly — they explore freely but carry no information about the target distribution.
- Cold chains (\( \beta = 1 \)) sample from the target — they resolve fine structure but can't escape local minima.
- Swap moves periodically propose exchanging states between adjacent temperature levels. Accepted swaps transport information from hot to cold.
The round-trip rate \( \tau \) measures how quickly information flows through the temperature ladder. A state that starts at the reference (\( \beta = 0 \)), travels to the target (\( \beta = 1 \)), and returns has completed one round trip.
DEO vs. SEO#
Hamon uses Deterministic Even-Odd (DEO) swap communication, which makes the index process non-reversible. On even rounds, swap proposals go to pairs \((0,1), (2,3), \ldots\); on odd rounds, \((1,2), (3,4), \ldots\).
The non-reversibility creates a directional drift through the temperature ladder — a conveyor belt effect that roughly doubles the round-trip rate compared to the reversible alternative (Stochastic Even-Odd, SEO).
Single-pass only
Hamon applies one parity per round, not both. Composing even-then-odd swaps in the same round produces the identity permutation, destroying the non-reversibility that makes DEO work.
Adaptive schedules#
The spacing of temperature levels matters. Hamon implements Algorithm 4 from
Syed et al. (2021): it monitors rejection
rates across the ladder and repositions \( \beta \) values to equalize
communication cost. nrpt_adaptive runs this tuning loop automatically.
Communication barrier#
The global communication barrier \( \Lambda \) is the fundamental quantity that governs tempering performance:
where \( \Lambda = \sum_{i=0}^{N-1} \frac{r_i}{1 - r_i} \) and \( r_i \) is the
rejection rate between chains \( i \) and \( i+1 \). Hamon reports \( \Lambda \)
in the round-trip diagnostics and uses it in discover_chain_count to recommend
how many chains to use.
What Hamon optimizes#
All of the above — block Gibbs, parallel tempering, schedule adaptation — is mathematically standard. Hamon's contribution is making it fast in JAX:
jax.vmapover chains: all \( N \) tempering chains run as a single vectorized kernel, not a Python loop. Compile time is constant in \( N \).lax.scancarry threading: global state flows through the scan loop without redundant reconstruction each iteration.- Vectorized DEO swaps: swap decisions are computed for all pairs in one
jnp.wherecall, exploiting temperature-linearity of Ising energies. - Incremental ΔE tracking:
boundary_energyclassifies edges and computes energy deltas from block updates without recomputing the full energy.
See the architecture guide for implementation details.