Hyper-Parameter Initialization for Squared Exponential Kernel-based
Gaussian Process Regression
Abstract
Hyper-parameter optimization is an essential task in the use of machine
learning techniques. Such optimization typically starts from an initial
guess for the hyper-parameter values, followed by minimization of some
cost function via gradient-based methods. The initial values are crucial
because gradient-based optimization can easily converge to a local
minimum of the cost function. Therefore, it is common practice to
initialize the hyper-parameters several times and repeat the
optimization, keeping the best solution. Repetition of
the optimization can be computationally expensive for techniques such as
the Gaussian Process (GP), which has O(n^3) complexity in the number of
data points n, and the lack of a formal strategy for initializing
hyper-parameter values is an additional challenge. In general,
re-initialization of hyper-parameter values for many machine learning
techniques, including GP, has been done at random over the years; some
recent developments have proposed initialization strategies based on the
optimization of meta-loss functions. To simplify hyper-parameter
initialization, this paper introduces a data-dependent, deterministic
initialization technique. It focuses on the specific case of squared
exponential kernel-based GP regression. The proposed technique is novel
in being deterministic, as opposed to random initialization, and fast
(owing to its deterministic nature), as opposed to optimizing some form
of meta cost function as done in some previous works. Although the
global suitability of this initialization technique is not proven in
this paper, its effectiveness is demonstrated, as a preliminary study,
on several synthetic and real data-based nonlinear regression examples,
suggesting that the technique may be effective for broader usage.
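
For orientation, the conventional baseline described above (random
re-initialization followed by gradient-based minimization) can be sketched
as follows. This is only an illustrative sketch, not the technique proposed
in this paper: the function names, the 1-D inputs, the sampling range for
the initial log hyper-parameters, the small jitter term, and the use of
SciPy's L-BFGS-B optimizer are all assumptions made for the example. The
cost function is the standard GP negative log marginal likelihood, whose
Cholesky factorization is the O(n^3) step that makes each restart costly.

```python
import numpy as np
from scipy.optimize import minimize


def se_kernel(xa, xb, signal_std, length_scale):
    """Squared exponential kernel for 1-D inputs (assumed for simplicity)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return signal_std**2 * np.exp(-0.5 * sq_dist / length_scale**2)


def neg_log_marginal_likelihood(log_theta, x, y):
    """GP negative log marginal likelihood for a zero-mean prior.

    log_theta holds the logs of (signal std, length scale, noise std).
    """
    signal_std, length_scale, noise_std = np.exp(log_theta)
    n = x.size
    # A small jitter (assumed value) is added to the diagonal for stability.
    k = se_kernel(x, x, signal_std, length_scale) + (noise_std**2 + 1e-8) * np.eye(n)
    try:
        chol = np.linalg.cholesky(k)  # the O(n^3) step
    except np.linalg.LinAlgError:
        return 1e25  # treat numerically singular covariances as poor candidates
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return 0.5 * y @ alpha + np.log(np.diag(chol)).sum() + 0.5 * n * np.log(2.0 * np.pi)


def fit_with_random_restarts(x, y, n_restarts=5, seed=0):
    """Hypothetical helper: random initial guesses, keep the lowest-cost optimum."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Random initial guess in log space; the range is an arbitrary assumption.
        log_theta0 = rng.uniform(-2.0, 2.0, size=3)
        result = minimize(
            neg_log_marginal_likelihood,
            log_theta0,
            args=(x, y),
            method="L-BFGS-B",
            bounds=[(-4.0, 4.0)] * 3,
        )
        if best is None or result.fun < best.fun:
            best = result
    return np.exp(best.x), best.fun
```

Calling `fit_with_random_restarts(x, y, n_restarts=10)` on a training set
runs ten full optimizations, each of which factorizes an n-by-n matrix many
times; a deterministic initialization aims to replace these repeated runs
with a single one.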