
Improving Model Selection in Deep Supervised Transfer Learning Under Homogeneous Setting
  • Osayande Pascal Omondiagbe
  • Stephen G. MacDonell
  • Sherlock Licorish
Osayande Pascal Omondiagbe
Landcare Research - University of Otago

Corresponding Author: [email protected]


Abstract

In traditional machine learning settings, non-parametric error estimation is an effective way to set the discriminative threshold of a classifier for the best accuracy. It is not effective in transfer learning, because it is reliable only when the training and testing data have similar distributions, which is not the case in a transfer learning setting. Control variate techniques have been proposed to exploit information about the error in the training sample to reduce the error in the test sample, but this method yields a finite variance and the model uncertainty is not distributed among the variance. In this paper, we propose and test a new transfer learning validation method, called control linear minimum mean-squared error (CLMMSE), for source model selection under homogeneous transfer learning settings in the absence of an adequate pre-trained source model. Our approach adopts the Bayesian linear minimum mean-squared error (LMMSE) and integrates the idea of importance sampling into a control variate approach to provide an accurate estimate of the LMMSE, which is then used to select the optimal source model. By combining importance sampling with the control variate technique to further reduce the variance, we achieve a much tighter bound with the LMMSE. This approach reduces the risk in the target domain under data shift. Experimental results on synthetic data under two data shift settings demonstrate the efficacy of our approach. A further experiment on two real-world datasets shows that we were able to improve the accuracy of the two state-of-the-art models tested, BERT (0.94% to 65%) and CodeBERT (1.82% to 18.2%), compared with previous selection methods.
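
The abstract names two variance-reduction ingredients, importance sampling and control variates, without giving the CLMMSE estimator itself. The sketch below is therefore not the authors' method; it only illustrates, under assumed toy conditions, how the two ideas can be combined to estimate a target-domain risk from source-domain samples. All names (`importance_weights`, `weighted_risk_cv`, `toy_loss`, `compare_estimators`) are hypothetical, and the Gaussian source and target distributions and the squared loss are assumptions chosen so that the weights can be computed exactly. The control variate is the importance weight itself, whose expectation under the source distribution is known to be 1.

```python
import math
import random
import statistics


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source_mean, target_mean):
    """Density ratio p_target(x) / p_source(x) for unit-variance Gaussians (assumed)."""
    return [normal_pdf(x, target_mean) / normal_pdf(x, source_mean) for x in xs]


def weighted_risk(losses, weights):
    """Plain importance-weighted estimate of the target risk."""
    return statistics.fmean([w * l for w, l in zip(weights, losses)])


def weighted_risk_cv(losses, weights):
    """Importance-weighted estimate with the weight as a control variate.

    Under the source distribution E[w] = 1, so (mean(w) - 1) has zero mean
    and can be subtracted, scaled by an estimated coefficient beta.
    """
    weighted = [w * l for w, l in zip(weights, losses)]
    mean_wl = statistics.fmean(weighted)
    mean_w = statistics.fmean(weights)
    var_w = sum((w - mean_w) ** 2 for w in weights)
    if var_w == 0.0:
        return mean_wl  # constant weights: nothing to correct
    cov = sum((a - mean_wl) * (w - mean_w) for a, w in zip(weighted, weights))
    beta = cov / var_w  # sample estimate of Cov(w * loss, w) / Var(w)
    return mean_wl - beta * (mean_w - 1.0)


def toy_loss(x):
    """Hypothetical per-sample loss, used only for illustration."""
    return (x + 1.0) ** 2


def compare_estimators(trials=200, n=200, seed=0):
    """Repeat both estimators on fresh source samples; return per-trial estimates."""
    rng = random.Random(seed)
    plain, controlled = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # source: N(0, 1)
        ws = importance_weights(xs, source_mean=0.0, target_mean=0.5)
        ls = [toy_loss(x) for x in xs]
        plain.append(weighted_risk(ls, ws))
        controlled.append(weighted_risk_cv(ls, ws))
    return plain, controlled
```

In this toy setup the exact target risk is 3.25 (the expectation of `toy_loss` under a normal distribution with mean 0.5 and unit variance). Calling `compare_estimators()` and comparing `statistics.variance` of the two returned lists shows how much the control variate tightens the estimate. In a real transfer learning problem the density ratio is not known in closed form and would itself have to be estimated.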