The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest

If *A* and *B* are sets such that *A* is a subset of
*B*, generalisation may be understood as the inference from
*A* of a hypothesis sufficient to construct *B*. One might
infer any number of hypotheses from *A*, yet only some of those may
generalise to *B*. How can one know which are likely to generalise?
One strategy is to choose the shortest, equating the ability to compress
information with the ability to generalise (a “proxy for
intelligence”). We examine this in the context of a mathematical
formalism of enactive cognition. We show that compression is neither
necessary nor sufficient to maximise performance (measured in terms of
the probability of a hypothesis generalising). We formulate a proxy
unrelated to length or simplicity, called weakness. We show that if
tasks are uniformly distributed, then there is no choice of proxy that
performs at least as well as weakness maximisation in all tasks while
performing strictly better in at least one. In other words, weakness is
the Pareto optimal choice of proxy. In experiments comparing maximum
weakness and minimum description length in the context of binary
arithmetic, the former generalised at between 1.1 and 5
times the rate of the latter. We argue this demonstrates that weakness
is a far better proxy, and explains why DeepMind’s Apperception Engine
is able to generalise effectively.