Federated-PCA on Vertical-Partitioned Data

2020-05-22T08:44:49Z (GMT) by Yiu-ming Cheung, Feng Yu
In the cross-silo federated learning setting, vertical federated learning (i.e., feature-wise federated learning) (Yang et al. 2019) partitions data by feature: it applies to multiple datasets that share the same sample ID space but have different feature spaces. An image dataset can likewise be partitioned according to labels. To improve the model performance of isolated parties based on their feature-wise (or label-wise) results, the most effective approach is to federate the parties' model results. However, allowing the participating parties to share model results without violating each party's data privacy is a non-trivial task. In this paper, within the framework of principal component analysis (PCA), we propose a Federated-PCA machine learning approach in which PCA reduces the dimensionality of every party's sample data and extracts the principal component features, improving the efficiency of subsequent training. This process does not reveal any party's original data. The federated system can help the parties build a strategy for their common benefit, and under this federated mechanism all parties have the same identity and status. Across multiple sets of comparative experiments, we find that the federated results of the isolated parties are close to the results of a single party holding the unpartitioned data, and that the proposed method effectively improves the trained model performance of most participating parties.
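The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea under stated assumptions: each party runs PCA locally on its own feature columns and shares only the low-dimensional scores, which are then concatenated by sample ID. The helper names `local_pca` and `federate_scores`, the synthetic data, and the choice of two components per party are all hypothetical and are not taken from the paper.

```python
import numpy as np


def local_pca(X, n_components):
    """Run PCA on one party's feature block.

    Returns (scores, components, mean). Only `scores` would leave the party
    in this sketch; the raw matrix X stays local.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    return scores, components, mean


def federate_scores(party_blocks, n_components):
    """Concatenate the locally reduced scores of all parties column-wise.

    Assumes every block has the same rows (sample IDs) in the same order.
    """
    shared = [local_pca(X, n_components)[0] for X in party_blocks]
    return np.hstack(shared)


rng = np.random.default_rng(0)
n_samples = 200
# Hypothetical vertical partition: same sample IDs (rows), different features.
party_a = rng.normal(size=(n_samples, 6))
party_b = rng.normal(size=(n_samples, 4))

federated = federate_scores([party_a, party_b], n_components=2)
print(federated.shape)  # (200, 4)
```

A downstream model would then be trained on `federated` and compared against one trained on the unpartitioned feature matrix, mirroring the comparison described in the abstract. Whether sharing scores alone meets a given privacy requirement depends on the threat model and is not settled by this sketch.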