The law of large numbers says that, for each single event $A$, its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. In many applications, however, the need arises to judge simultaneously the probabilities of all events in an entire class from one and the same sample. Moreover, it is required that the relative frequencies of the events converge to their probabilities uniformly over the entire class of events [1]. The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event family is sufficiently simple (its VC dimension is sufficiently small), then uniform convergence holds.
For a class of predicates $H$ defined on a set $X$ and a set of samples $x=(x_1,x_2,\dots,x_m)$, where $x_i\in X$, the empirical frequency of $h\in H$ on $x$ is

$$\widehat{Q}_x(h)=\frac{1}{m}\left|\{i: 1\le i\le m,\ h(x_i)=1\}\right|.$$

The theoretical probability of $h\in H$ is defined as

$$Q_P(h)=P\{y\in X: h(y)=1\}.$$
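The two quantities above can be illustrated with a short sketch. The names, the choice $X=[0,1)$ with $P$ uniform, and the predicate are all illustrative assumptions, not part of the theorem:

```python
import random

# A minimal sketch (names and the predicate are illustrative): on the set
# X = [0, 1) with P the uniform distribution, take the single predicate
# h(y) = 1 iff y < 0.3, so the theoretical probability Q_P(h) is 0.3.
def h(y):
    return 1 if y < 0.3 else 0

def empirical_frequency(h, sample):
    # \hat{Q}_x(h) = (1/m) |{i : 1 <= i <= m, h(x_i) = 1}|
    return sum(h(x_i) for x_i in sample) / len(sample)

random.seed(0)
sample = [random.random() for _ in range(100_000)]
# By the law of large numbers this gap is small for large m.
print(abs(empirical_frequency(h, sample) - 0.3))
```

For a single predicate this is just the law of large numbers; the theorem below is about controlling this gap for every $h\in H$ at once.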
The Uniform Convergence Theorem states, roughly, that if $H$ is "simple" and we draw samples independently (with replacement) from $X$ according to any distribution $P$, then with high probability the empirical frequency will be close to its expected value, which is the theoretical probability.[2]
Here "simple" means that the Vapnik–Chervonenkis dimension of the class is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis[1] using the concept of growth function.
The statement of the Uniform Convergence Theorem is as follows:[3]
If $H$ is a set of $\{0,1\}$-valued functions defined on a set $X$ and $P$ is a probability distribution on $X$, then for $\varepsilon>0$ and $m$ a positive integer, we have:

$$P^m\{|Q_P(h)-\widehat{Q}_x(h)|\ge\varepsilon \text{ for some } h\in H\}\le 4\Pi_H(2m)e^{-\varepsilon^2 m/8}.$$

In the above, $\widehat{Q}_x(h)$ is the empirical frequency of $h$ on the sample $x$ for any $x\in X^m$, and $P^m$ indicates that the probability is taken over $x$ consisting of $m$ i.i.d. draws from the distribution $P$.
Finally, the growth function $\Pi_H$ is defined in the following way, for any class of $\{0,1\}$-valued functions $H$ over $X$ and for any natural number $m$:

$$\Pi_H(m)=\max\left\{\left|\{(h(x_1),h(x_2),\dots,h(x_m)): h\in H\}\right| : x_1,x_2,\dots,x_m\in X\right\}.$$
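The inner set in this definition counts the distinct dichotomies (behaviour patterns) that $H$ induces on a sample, which can be enumerated directly when a finite subfamily of predicates suffices. The sketch below (function names and the threshold class are illustrative) does this for threshold predicates $h_t(y)=1$ iff $y\le t$, where only $m+1$ patterns are possible on $m$ distinct points even though the class itself is infinite:

```python
# Count the distinct dichotomies a set of predicates induces on a sample,
# i.e. the inner set in the definition of the growth function.
def num_dichotomies(predicates, points):
    return len({tuple(h(x) for x in points) for h in predicates})

# Threshold class on the reals: h_t(y) = 1 iff y <= t.  On m distinct
# points, thresholds just below/above each point realise every pattern,
# so Pi_H(m) = m + 1.
points = [0.1, 0.4, 0.7, 0.9]                      # m = 4
thresholds = [-1.0] + list(points) + [2.0]         # enough to realise all patterns
predicates = [(lambda t: (lambda y: int(y <= t)))(t) for t in thresholds]
print(num_dichotomies(predicates, points))  # 5 = m + 1
```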
From the point of view of Learning Theory, one can consider $H$ to be the Concept/Hypothesis class defined over the instance set $X$. Crucially, the Sauer–Shelah lemma implies that $\Pi_H(m)\le O(m^d)$, where $d$ is the VC dimension of $H$; this polynomial growth is what makes the bound above nontrivial, since the exponential factor eventually dominates.
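Combining the theorem with the Sauer–Shelah lemma gives a fully explicit bound. The sketch below (function names are illustrative) evaluates $\sum_{i=0}^{d}\binom{m}{i}$, the exact Sauer–Shelah bound on $\Pi_H(m)$, and plugs it into the right-hand side of the theorem:

```python
import math

def sauer_shelah_bound(m, d):
    # Pi_H(m) <= sum_{i=0}^{d} C(m, i); this is O(m^d) for fixed d.
    return sum(math.comb(m, i) for i in range(d + 1))

def vc_bound(m, d, eps):
    # RHS of the theorem with the growth function replaced by the
    # Sauer-Shelah bound: 4 * Pi_H(2m) * exp(-eps^2 * m / 8).
    return 4 * sauer_shelah_bound(2 * m, d) * math.exp(-eps**2 * m / 8)

# For a class of VC dimension 1 (e.g. thresholds), the bound is already
# below 1 at m = 10,000 samples and eps = 0.1.
print(vc_bound(10_000, d=1, eps=0.1))
```

The polynomial factor $4\Pi_H(2m)$ loses to the exponential decay $e^{-\varepsilon^2 m/8}$ once $m$ is large enough relative to $d$ and $1/\varepsilon^2$.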
References [1] and [3] are the sources of the proof below. Before we get into its details, we present a high-level overview of the proof.
Symmetrization: We transform the problem of analyzing $|Q_P(h)-\widehat{Q}_r(h)|\ge\varepsilon$ into the problem of analyzing $|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\ge\varepsilon/2$, where $r$ and $s$ are i.i.d. samples of size $m$ drawn according to the distribution $P$. One can view $r$ as the original randomly drawn sample of length $m$, while $s$ may be thought of as the testing sample which is used to estimate $Q_P(h)$.
Permutation: Since $r$ and $s$ are picked identically and independently, swapping elements between them does not change the probability distribution on $r$ and $s$. So, we will try to bound the probability of $|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\ge\varepsilon/2$ for some $h\in H$ by considering the effect of a specific collection of permutations of the joint sample $x=r||s$. Specifically, we consider permutations $\sigma(x)$ which swap $x_i$ and $x_{m+i}$ for each $i$ in some subset of $\{1,2,\dots,m\}$. The symbol $r||s$ means the concatenation of $r$ and $s$.
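The permutation step can be made concrete for a single fixed predicate $h$ on a toy joint sample. The sketch below (function names and the sample are illustrative) enumerates all $2^m$ swapping permutations and measures, for each, the gap $|\widehat{Q}_r(h)-\widehat{Q}_s(h)|$ on the permuted sample; the proof bounds the fraction of swaps for which this gap stays large:

```python
from itertools import product

# For each of the 2^m swapping permutations, recompute the gap
# |Q_r(h) - Q_s(h)| after swapping x_i and x_{m+i} in the chosen subset.
def gap_after_swaps(bits, m):
    # bits[i] = h(x_i) on the joint sample x = r || s of length 2m.
    gaps = []
    for swaps in product([0, 1], repeat=m):
        r = [bits[i + m] if s else bits[i] for i, s in enumerate(swaps)]
        s_ = [bits[i] if s else bits[i + m] for i, s in enumerate(swaps)]
        gaps.append(abs(sum(r) - sum(s_)) / m)
    return gaps

m = 4
bits = [1, 1, 0, 1, 0, 0, 1, 0]  # h-values on a toy joint sample r || s
gaps = gap_after_swaps(bits, m)
# Fraction of the 2^m swapping permutations with a gap of at least 1/2:
print(sum(g >= 0.5 for g in gaps) / len(gaps))
```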
Reduction to a finite class: We can now restrict the function class $H$ to a fixed joint sample, and hence, if $H$ has finite VC dimension, the problem reduces to one involving a finite function class.
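The phenomenon the proof establishes can also be observed numerically. The sketch below (assuming $X=[0,1)$ with $P$ uniform) uses the threshold class $h_t(y)=1$ iff $y\le t$, which has VC dimension 1 and $Q_P(h_t)=t$; for this class the supremum over all $t$ of $|Q_P(h_t)-\widehat{Q}_x(h_t)|$ can be computed exactly from the sorted sample (it is the one-sample Kolmogorov–Smirnov statistic), and it shrinks as $m$ grows:

```python
import random

# Exact uniform deviation sup_t |Q_P(h_t) - \hat{Q}_x(h_t)| for the
# threshold class under the uniform distribution on [0, 1).
def uniform_deviation(sample):
    xs = sorted(sample)
    m = len(xs)
    dev = 0.0
    for i, x in enumerate(xs):
        # The empirical frequency jumps from i/m to (i+1)/m at x, while
        # the theoretical probability there is x itself.
        dev = max(dev, abs(x - i / m), abs(x - (i + 1) / m))
    return dev

random.seed(1)
for m in (100, 10_000):
    print(m, uniform_deviation([random.random() for _ in range(m)]))
```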
We present the technical details of the proof. It should be stressed that this proof glosses over details such as the measurability of the events involved; measurability is guaranteed when $H$ is finite or countable, but this is not normally the case in standard applications of the theorem (e.g. in statistical learning theory or in proving the Glivenko–Cantelli theorem). To obtain measurability, one needs to use a notion of separability of the underlying space; see [4].
Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025.
This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968.
The translation was reproduced as:
Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
Krapp, Lothar Sebastian; Wirth, Laura (26 September 2025). Measurability in the Fundamental Theorem of Statistical Learning, v3. arXiv:2410.10243. doi:10.48550/arXiv.2410.10243.