Hash functions and the avalanche effect

Entry 006 · 2026-04-04

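A cryptographic hash function maps an arbitrary-length input to a fixed-length output. Ideally it has three properties:

- preimage resistance: given a digest y, it is hard to find any x with H(x) = y;
- second-preimage resistance: given x, it is hard to find x' ≠ x with H(x') = H(x);
- collision resistance: it is hard to find any pair x ≠ x' with H(x) = H(x').

Collision resistance is the strongest of the three requirements. Any algorithm that finds second preimages immediately yields collisions, so collision resistance implies second-preimage resistance. For functions that compress their input substantially, it implies preimage resistance as well. The converse does not hold.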
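The gap between the properties shows up in generic attacks. For an n-bit digest, brute-force preimage search needs about 2^n evaluations, while a birthday search finds some collision after roughly 2^(n/2). The sketch below makes that concrete by truncating SHA-256 to 24 bits and searching counter inputs until two of them collide. The truncation is an illustrative assumption, since no real design uses digests this short, and `truncated_hash` and `find_collision` are hypothetical helper names.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256(data): a deliberately tiny digest for experiments."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int, limit: int = 1 << 20):
    """Birthday search over counter inputs; returns (x, x_prime, tries) or None."""
    seen: dict[int, bytes] = {}  # truncated digest -> first input that produced it
    for i in range(limit):
        x = i.to_bytes(8, "big")
        h = truncated_hash(x, bits)
        if h in seen:
            return seen[h], x, i + 1
        seen[h] = x
    return None

x, x_prime, tries = find_collision(24)
print(f"24-bit collision after {tries} hashes (2**12 = {2**12}); "
      f"a preimage search would expect about 2**24 = {2**24}")
```

The collision typically appears after a few thousand hashes, in line with the 2^12 birthday estimate. A preimage for a fixed 24-bit target would take millions.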

Modern hash functions also satisfy an informal property called the avalanche effect. Flipping a single input bit should flip each output bit with probability about 1/2, so that on average half the output bits change, and the individual flips should look statistically independent of one another. This is not part of the formal security definitions above, but it is what makes a hash function usable as a PRF-like building block. A function with poor avalanche may still be collision-resistant in the formal sense. It is nonetheless useless in constructions that require its output to look random.
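The avalanche property is easy to measure empirically. This sketch flips each bit of a fixed message in turn and counts how many of SHA-256's 256 output bits change. For a function with good avalanche, the counts should cluster around 128. The message and the helper names (`flip_bit`, `hamming`, `avalanche_sha256`) are arbitrary choices for illustration.

```python
import hashlib

def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of `data` with one bit inverted (bit 0 = MSB of byte 0)."""
    out = bytearray(data)
    out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)

def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def avalanche_sha256(message: bytes) -> list[int]:
    """For each single-bit flip of `message`, count the SHA-256 output bits that change."""
    base = hashlib.sha256(message).digest()
    return [hamming(base, hashlib.sha256(flip_bit(message, i)).digest())
            for i in range(len(message) * 8)]

counts = avalanche_sha256(b"The quick brown fox jumps over the lazy dog")
print(f"mean {sum(counts) / len(counts):.1f} of 256 bits, "
      f"min {min(counts)}, max {max(counts)}")
```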

The MD5 and SHA-1 designs achieve avalanche by applying many rounds of simple word-level mixing operations. These include rotations, modular additions, XORs, and non-linear bitwise functions such as (x ∧ y) ∨ (¬x ∧ z). Each round is far too weak to be secure on its own, but composing them spreads any single-bit change across the entire internal state within a few rounds. The security margin is the gap between the number of rounds the function uses and the largest number of rounds that known attacks can break.
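One way to watch this diffusion is to take two five-word states that differ in a single bit, run a SHA-1-style step function on both, and count how many state bits differ after each step. The sketch borrows SHA-1's step structure, its initial values, and its first-round constant. It is not SHA-1, though. There is no message schedule: every step uses the same all-zero message word. The names `step` and `diffusion` are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit words

def rotl(x: int, n: int) -> int:
    """32-bit left rotation."""
    return ((x << n) | (x >> (32 - n))) & MASK

def ch(x: int, y: int, z: int) -> int:
    """(x AND y) OR (NOT x AND z): each bit of x selects the bit from y or z."""
    return (x & y) | (~x & z & MASK)

def step(state, w: int, k: int = 0x5A827999):
    """One SHA-1-style step on a five-word state (message word w, round constant k)."""
    a, b, c, d, e = state
    t = (rotl(a, 5) + ch(b, c, d) + e + k + w) & MASK
    return (t, a, rotl(b, 30), c, d)

def state_distance(s1, s2) -> int:
    """Hamming distance between two five-word states (out of 160 bits)."""
    return sum(bin(x ^ y).count("1") for x, y in zip(s1, s2))

def diffusion(rounds: int = 20) -> list[int]:
    """Distance between two states that start one bit apart, recorded after each step."""
    s1 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)  # SHA-1 IV
    s2 = (s1[0] ^ 1,) + s1[1:]  # flip the lowest bit of word a
    dist = [state_distance(s1, s2)]
    for _ in range(rounds):
        s1, s2 = step(s1, 0), step(s2, 0)  # identical message word for both states
        dist.append(state_distance(s1, s2))
    return dist

print(diffusion())
```

After enough steps the distance hovers around 80 of 160 bits, which is what two unrelated random states would show.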

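SHA-2 followed the same Merkle–Damgård lineage with a larger internal state, a more complex message expansion, and a heavier round function. It did not simply add rounds: SHA-256 uses 64 rounds, against SHA-1's 80. SHA-3 is derived from Keccak, the winner of NIST's hash function competition, and takes a different structural approach (the sponge construction) that we'll revisit in a later entry. The lineage is a useful study in incremental hardening. Each generation responded to specific weaknesses found in its predecessor, and each has resisted practical attack at least as long as the one before it. Practical collisions were published for MD5 in 2004 and for SHA-1 in 2017, while SHA-2 has no known practical collision attack.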
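To make the Merkle–Damgård structure concrete, here is a compact SHA-256 written from the FIPS 180-4 description. It pads the message with a 0x80 byte, zeros, and the 64-bit message length. It then splits the result into 64-byte blocks and folds the compression function over them, starting from a fixed initial value. Rather than typing in the constants, the sketch derives them as the standard specifies. The IV comes from the first 32 fractional bits of the square roots of the first 8 primes, and the round constants come from the cube roots of the first 64 primes. This is a readability sketch, not a vetted implementation. It is slow and not constant-time, and it should be checked against `hashlib.sha256`, as the final lines do.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def first_primes(n: int) -> list[int]:
    """The first n primes, by trial division against the primes found so far."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# First 32 fractional bits of sqrt(p) and cbrt(p), computed exactly with integer roots.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def md_pad(message: bytes) -> bytes:
    """Merkle–Damgård strengthening: 0x80, zero bytes, then the 64-bit bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", 8 * len(message))

def compress(state: list[int], block: bytes) -> list[int]:
    """One SHA-256 compression: expand 16 words to 64, run 64 rounds, feed forward."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        ch = (e & f) ^ (~e & g)  # e selects bits from f or g
        maj = (a & b) ^ (a & c) ^ (b & c)  # bitwise majority vote
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ch + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    # Feed-forward: add the incoming chaining value back into the round output.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message: bytes) -> bytes:
    """Merkle–Damgård iteration: fold compress() over the padded 64-byte blocks."""
    padded = md_pad(message)
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)

for msg in (b"", b"abc", b"a" * 56, bytes(range(200))):
    assert sha256(msg) == hashlib.sha256(msg).digest()
```

The chaining state is the only thing carried from one block to the next. The digest is simply that state serialized after the last block.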