Information theory and Shannon's source coding theorem
Shannon's 1948 paper introduced both information theory and a
specific quantitative statement about lossless compression: a source emitting
symbols drawn i.i.d. from a distribution with entropy H can be
encoded into a binary string of expected length arbitrarily close to
H bits per symbol, but no scheme can do better in the long
run.
The proof is in two parts. The achievability side is constructive:
build a typical-set encoder. For block length n large enough,
almost all observed sequences belong to a set of size roughly
2^{nH}; index that set and the index needs about
nH bits. The converse side is a counting argument: if you have
fewer than 2^{nH} distinct codewords, you cannot uniquely encode
sequences from a set whose probability mass concentrates on
2^{nH} elements without introducing decoding errors.
The theorem is asymptotic and makes no statement about specific short messages — that's the territory of Kolmogorov complexity. It also says nothing about computability of the optimal encoder; the typical-set encoder is only mildly practical and was quickly superseded by Huffman codes (which are exactly optimal for single-symbol encodings) and later by arithmetic codes (which are essentially optimal at the block level and trivially adaptive).
The result still anchors a lot of modern compression intuition.
Whenever you see a compressor advertised as "matching the entropy of the
source", what is being matched is the per-symbol entropy under some assumed
model; the gap to Shannon's bound is a measure of how well the model fits
the actual source. The bound itself is not the achievement; the achievement
is reducing the engineering problem of compression to the modelling problem
of estimating H.