Hyper-Parameter Initialization for Squared Exponential Kernel-based Gaussian Process Regression
Preprint posted on 2020-04-23, 21:15, authored by Nalika Ulapane, Karthick Thiyagarajan, Sarath Kodagoda
Hyper-parameter optimization is an essential task in the use of machine learning techniques. It typically starts from an initial guess for the hyper-parameter values, followed by minimization of a cost function via gradient-based methods. The initial values are crucial because gradient-based optimization can converge to a local minimum of the cost function. The usual remedy is therefore to initialize the hyper-parameters several times, repeat the optimization, and keep the best solution. This repetition can be computationally expensive for techniques such as the Gaussian Process (GP), which has O(n^3) complexity, and the lack of a formal strategy for initializing hyper-parameter values is an additional challenge. In many machine learning techniques, including GP, hyper-parameter values have traditionally been reinitialized at random; some recent works have proposed initialization strategies based on optimizing a meta loss function. To simplify hyper-parameter initialization, this paper introduces a data-dependent deterministic initialization technique. The paper focuses on the specific case of squared exponential kernel-based GP regression. The proposed technique is novel in being deterministic, as opposed to random initialization, and fast (owing to its deterministic nature), as opposed to optimizing some form of meta cost function as done in some previous works. Although the global suitability of this initialization technique is not proven in this paper, as a preliminary study its effectiveness is demonstrated on several synthetic and real data-based nonlinear regression examples, suggesting that the technique may be effective for broader use.
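For orientation, the conventional baseline that the abstract contrasts against (random re-initialization followed by repeated gradient-based optimization) can be sketched as below. This is a minimal illustration and not the paper's deterministic technique, which the abstract does not specify. The function names, the log-parameterization, the sampling range for the random starts, the bounds, and the choice of L-BFGS-B are all assumptions made for the example. The cost function used here is the standard GP negative log marginal likelihood with a squared exponential kernel on one-dimensional inputs.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def se_kernel(xa, xb, signal_std, length_scale):
    """Squared exponential kernel between two 1-D input arrays."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return signal_std**2 * np.exp(-0.5 * sq_dist / length_scale**2)


def neg_log_marginal_likelihood(log_theta, x, y):
    """GP negative log marginal likelihood.

    log_theta holds the logs of (signal_std, length_scale, noise_std),
    an assumed parameterization that keeps all three positive.
    """
    signal_std, length_scale, noise_std = np.exp(log_theta)
    n = x.size
    # Small jitter (1e-8) added for numerical stability.
    K = se_kernel(x, x, signal_std, length_scale)
    K = K + (noise_std**2 + 1e-8) * np.eye(n)
    try:
        chol = cho_factor(K, lower=True)  # the O(n^3) step
    except np.linalg.LinAlgError:
        return 1e25  # treat a failed factorization as a very poor fit
    alpha = cho_solve(chol, y)
    log_det = 2.0 * np.sum(np.log(np.diag(chol[0])))
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * n * np.log(2.0 * np.pi)


def fit_with_random_restarts(x, y, n_restarts=5, seed=0):
    """Random-restart baseline: optimize from several random starting
    points and keep the solution with the lowest cost."""
    rng = np.random.default_rng(seed)
    bounds = [(-5.0, 5.0)] * 3  # assumed search box in log space
    best = None
    for _ in range(n_restarts):
        log_theta0 = rng.uniform(-2.0, 2.0, size=3)  # assumed range
        result = minimize(
            neg_log_marginal_likelihood,
            log_theta0,
            args=(x, y),
            method="L-BFGS-B",
            bounds=bounds,
        )
        if best is None or result.fun < best.fun:
            best = result
    # Return (signal_std, length_scale, noise_std) and the best cost.
    return np.exp(best.x), best.fun
```

Each restart repeats the Cholesky factorization many times, which is where the cost of repeated optimization comes from; a deterministic, data-dependent starting point such as the one the paper proposes would replace the random draw of `log_theta0` and aim to make a single run sufficient.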