Networked Exponential Families For Big Data Over Networks
machine learning from massive network-structured datasets
(“big data over networks”). High-dimensional data points are
interpreted as the realizations of a random process distributed
according to some exponential family. Networked exponential
families allow us to jointly leverage the information contained
in high-dimensional data points and their network structure.
For data points representing individuals, we obtain perfectly
personalized models which enable high-precision medicine or
more general recommendation systems. We learn the parameters
of networked exponential families using the network Lasso,
which implicitly pools (or clusters) the data points according to
the intrinsic network structure and a local likelihood function.
Our main theoretical result characterizes how the accuracy
of network Lasso depends on the network structure and the
information geometry of the node-wise exponential families.
The network Lasso can be implemented as highly scalable
message-passing over the data network. Such message passing
is appealing for federated machine learning relying on edge
computing. The proposed method is also privacy-preserving in
the sense that only parameter estimates, and no raw data, are
shared among different nodes.
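
To make the pooling effect of the network Lasso concrete, the following minimal sketch evaluates a network Lasso objective: a local loss on the observed nodes plus a weighted total variation of the node-wise parameters over the edges. It is an illustration under stated assumptions, not the implementation studied here. The unit-variance Gaussian loss is an assumed instance of an exponential family, and the function name, the four-node chain graph, its edge weights, and the value of `lam` are all hypothetical.

```python
import math


def network_lasso_objective(w, z, sampled, edges, weights, lam):
    """Local loss on observed nodes plus weighted total variation over edges.

    w       : list of node-wise parameter vectors (one list of floats per node)
    z       : list of node-wise observations, same shape as w
    sampled : indices of the nodes whose observations are actually used
    edges   : list of (i, j) node pairs
    weights : edge weights, aligned with `edges`
    lam     : regularization strength (hypothetical tuning parameter)
    """
    # Assumed local model: unit-variance Gaussian, so the negative
    # log-likelihood is half the squared error (up to an additive constant).
    loss = sum(
        0.5 * sum((zi - wi) ** 2 for zi, wi in zip(z[i], w[i]))
        for i in sampled
    )
    # Total variation: weighted Euclidean norms of parameter differences.
    tv = sum(a * math.dist(w[i], w[j]) for (i, j), a in zip(edges, weights))
    return loss + lam * tv


# Hypothetical chain graph 0 - 1 - 2 - 3 with a weak link between 1 and 2.
edges = [(0, 1), (1, 2), (2, 3)]
weights = [1.0, 0.1, 1.0]
z = [[1.0], [0.0], [0.0], [5.0]]  # entries for nodes 1 and 2 are never read
sampled = [0, 3]                  # only nodes 0 and 3 are observed

w_clustered = [[1.0], [1.0], [5.0], [5.0]]  # piecewise constant over two clusters
w_global = [[3.0], [3.0], [3.0], [3.0]]     # a single model shared by all nodes

obj_clustered = network_lasso_objective(w_clustered, z, sampled, edges, weights, 1.0)
obj_global = network_lasso_objective(w_global, z, sampled, edges, weights, 1.0)
print(obj_clustered, obj_global)
```

In this toy setting the piecewise-constant assignment attains a lower objective than the single global model, because the only parameter jump falls on the weakly weighted edge. This is the sense in which the penalty favors parameters that are clustered along the network structure. The sketch only evaluates the objective; it does not implement the message-passing solver.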