Information in Missing Patterns: Enhancing Prediction Accuracy in Weighted Linear Regression with Missing Data Using Soft Clustering

This paper investigates linear systems with missing
information. New methods are introduced that improve
the Mean Squared Error (MSE) on the test set in
comparison to state-of-the-art methods, through
appropriate tuning of the bias-variance trade-off.
The idea is to cluster the data and
adapt the learning model to each cluster. Hence,
we introduce a controlled bias into the problem and
exploit it to enhance learning capability on
the instances within a specific
neighborhood. To deal with missing information,
we propose a novel algorithm, "Missing-SCOP", based
on the SCOP-KMEANS algorithm introduced by Wagstaff
et al., which uses the missing pattern of the dataset to
construct a soft-constraint matrix and to cluster
in the missing-data scenario. It is shown that controlled
over-fitting suggested by our algorithm improves
prediction accuracy in various cases.
Numerical experiments confirm the efficacy of our
proposed algorithm in enhancing the prediction
accuracy.
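
The two ingredients named above, a constraint matrix derived from the missing pattern and a separate linear model per cluster, can be illustrated with a minimal sketch. This is not the Missing-SCOP algorithm itself: the function names, the mask-agreement score used as a soft constraint, the mean imputation, and the assumption that cluster labels are already given are all hypothetical choices made for illustration only.

```python
import numpy as np

def missing_pattern_constraints(X):
    """Pairwise soft-constraint scores in [0, 1] from missingness masks.

    Assumption (not from the paper): the score for rows i and j is the
    fraction of features on which both rows agree about being
    missing or observed.
    """
    mask = np.isnan(X).astype(float)  # 1.0 where a value is missing
    n_features = X.shape[1]
    # Count features where both rows are missing, plus both observed.
    agree = mask @ mask.T + (1.0 - mask) @ (1.0 - mask).T
    return agree / n_features

def fit_per_cluster(X, y, labels):
    """Fit one least-squares model (with intercept) per cluster.

    Assumption: missing entries are filled with column means before
    fitting; the labels come from any clustering step.
    """
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(np.isnan(X), col_means, X)
    models = {}
    for k in np.unique(labels):
        idx = labels == k
        # Append a column of ones so the last coefficient is the intercept.
        A = np.column_stack([X_filled[idx], np.ones(idx.sum())])
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        models[k] = coef
    return models

# Rows 0-1 share one missing pattern, rows 2-3 share another.
X_demo = np.array([[1.0, np.nan],
                   [2.0, np.nan],
                   [np.nan, 3.0],
                   [np.nan, 4.0]])
W_demo = missing_pattern_constraints(X_demo)
```

In this sketch, a high entry of W_demo would act as a soft "should be clustered together" hint and a low entry as a soft "should be separated" hint; how such scores enter the clustering objective is left open here.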