Abstract
Artificial general intelligence (AGI) may herald our extinction,
according to AI safety research. Yet claims regarding AGI must rely upon
mathematical formalisms – theoretical agents we may analyse or attempt
to build. AIXI appears to be the only such formalism supported by proof
that its behaviour is optimal, a consequence of its use of compression
as a proxy for intelligence. Unfortunately, AIXI is incomputable, and claims regarding its behaviour are highly subjective. We argue that this is
because AIXI formalises cognition as taking place in isolation from the
environment in which goals are pursued (Cartesian dualism). We propose
an alternative, supported by proof and experiment, which overcomes these
problems. Integrating research from cognitive science with AI, we
formalise an enactive model of learning and reasoning to address the
problem of subjectivity. This allows us to formulate a different proxy
for intelligence, called weakness, which addresses the problem of
incomputability. We prove optimal behaviour is attained when weakness is
maximised. This proof is supplemented by experimental results comparing
weakness and description length (the closest analogue to compression
possible without reintroducing subjectivity). Weakness outperforms description length, suggesting it is the better proxy for intelligence. Furthermore, we show
that, if cognition is enactive, then minimisation of description length
is neither necessary nor sufficient to attain optimal performance. These
results undermine the notion that compression is closely related to
intelligence. We conclude with a discussion of limitations, implications
and future research. There remain several open questions regarding the
implementation of scalable general intelligence. In the short term,
these results may be best utilised to improve the performance of
existing systems. For example, our results explain why DeepMind's
Apperception Engine is able to generalise effectively, and how to
replicate that performance by maximising weakness. Likewise, in the context of neural networks, our results suggest both the limitations of “scale is all you need” and how those limitations can be overcome.
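To make the selection rule concrete, the following is a minimal, illustrative sketch in Python, not the paper's implementation. It assumes that the weakness of a hypothesis is measured by the cardinality of its extension (the set of input-output pairs it entails); the hypothesis names and toy data are hypothetical.

    # Illustrative sketch only. Assumption: weakness = cardinality of a
    # hypothesis's extension (the set of input-output pairs it entails).
    # A larger extension means a weaker, more general hypothesis.

    def weakness(hypothesis: frozenset) -> int:
        """Weakness as extension cardinality."""
        return len(hypothesis)

    def select_weakest(hypotheses, observations):
        """Among hypotheses consistent with the observed pairs, return the
        weakest (largest extension), rather than the shortest description."""
        consistent = [h for h in hypotheses if observations <= h]
        return max(consistent, key=weakness)

    # Toy example: hypotheses represented as sets of (input, output) pairs.
    h_narrow = frozenset({(0, 0), (1, 1)})                  # fits the data only
    h_weak = frozenset({(0, 0), (1, 1), (2, 2), (3, 3)})    # entails more pairs
    observed = {(0, 0), (1, 1)}

    assert select_weakest([h_narrow, h_weak], observed) == h_weak

Under this reading, both hypotheses fit the observations, but the weaker one is preferred; description-length selection would instead pick whichever consistent hypothesis has the shortest encoding, which the results summarised above suggest generalises less reliably.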