Optimality of Spectrum Pursuit for Column Subset Selection Problem:
Theoretical Guarantees and Applications in Deep Learning
Abstract
We propose a novel technique for finding representatives from a large,
unlabeled dataset. The approach is based on the concept of self-rank,
defined as the minimum number of samples needed to reconstruct all
samples with an accuracy proportional to that of the best rank-$K$
approximation.
The proposed algorithm has complexity linear in the size of the
original dataset and simultaneously provides an adaptive upper bound
on the approximation ratio. These characteristics fill a
long-standing gap between practical and theoretical methods for
finding representatives.
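
To make the notion of an approximation ratio concrete, the following minimal sketch compares the reconstruction error of a chosen subset of columns with the error of the best rank-$K$ approximation. It is an illustration only and not the proposed algorithm. The function names, the use of the squared Frobenius norm, the synthetic data, and the random choice of columns are all assumptions made for this example.

```python
import numpy as np

def rank_k_error(A, k):
    """Squared Frobenius error of the best rank-k approximation of A.

    By the Eckart-Young theorem this equals the sum of the squared
    singular values beyond the k-th.
    """
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s[k:] ** 2))

def subset_error(A, cols):
    """Squared Frobenius error of reconstructing every column of A
    from the span of the selected columns A[:, cols]."""
    C = A[:, cols]
    # Least-squares coefficients X minimizing ||A - C X||_F.
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return float(np.linalg.norm(A - C @ X, "fro") ** 2)

def approximation_ratio(A, cols, k):
    """Ratio of the subset reconstruction error to the rank-k error."""
    return subset_error(A, cols) / rank_k_error(A, k)

# Hypothetical data: 100 samples (columns) in 20 dimensions,
# close to rank 5 with a small amount of noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 100))
A += 0.01 * rng.standard_normal((20, 100))

# Randomly chosen representatives, used here only as a placeholder
# for a real selection method.
cols = rng.choice(A.shape[1], size=5, replace=False)
ratio = approximation_ratio(A, cols, k=5)
print(f"approximation ratio with 5 random columns: {ratio:.3f}")
```

When the number of selected columns equals $K$, this ratio cannot fall below 1, because a projection onto $K$ columns is itself an approximation of rank at most $K$. Under the assumptions of this sketch, a selection method is better the closer its ratio is to 1.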