TechRxiv

Interpretable Neural Networks for Video Separation: Deep Unfolding RPCA with Foreground Masking

preprint
posted on 2022-05-02, 18:25, authored by Boris Joukovsky, Nikos Deligiannis, Yonina C. Eldar
This paper presents two deep unfolding neural networks for the simultaneous tasks of background subtraction and foreground detection in video. Unlike conventional neural networks based on deep feature extraction, we incorporate domain-knowledge models by considering a masked variation of the robust principal component analysis (RPCA) problem. With this approach, we separate video clips into low-rank and sparse components, corresponding respectively to the backgrounds and to foreground masks indicating the presence of moving objects. Our models, coined ROMAN-S and ROMAN-R, map the iterations of two alternating direction method of multipliers (ADMM) algorithms to trainable convolutional layers, and the proximal operators are mapped to non-linear activation functions with trainable thresholds. This approach leads to lightweight networks with enhanced interpretability that can be trained on little data. In ROMAN-S, the temporal correlation of successive binary masks is controlled with a side-information scheme based on L1-L1 minimization. ROMAN-R enhances the foreground detection by learning a dictionary of atoms to represent the moving foreground in a high-dimensional feature space and by using reweighted L1-L1 minimization. Experiments are conducted on both synthetic and real video datasets, and comparisons are made with existing deep unfolding RPCA neural networks, which do not use a mask formulation for the foreground. The models are also compared to a U-Net baseline. Results show that our proposed models outperform the other deep unfolding models, as well as the untrained optimization algorithms. ROMAN-R, in particular, is competitive with the U-Net baseline for foreground detection, with the additional advantages of providing video backgrounds and requiring substantially fewer training parameters and smaller training sets.
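For intuition, the core building block of a deep unfolding network of this kind is a single unrolled iteration: a trainable linear (here, convolutional) update followed by a soft-thresholding proximal operator whose threshold is a learnable parameter. The following minimal PyTorch sketch illustrates that idea only; the class name, layer shapes, and parameter names are hypothetical and this is not the authors' ROMAN-S/ROMAN-R architecture.

    import torch
    import torch.nn as nn

    class UnfoldedIteration(nn.Module):
        """One unrolled ADMM-style iteration: a trainable convolution
        followed by soft thresholding with a learnable threshold.
        Illustrative sketch only, not the paper's exact model."""

        def __init__(self, channels):
            super().__init__()
            # Trainable linear step replacing a fixed operator of the iteration
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            # Learnable threshold of the proximal step (initial value arbitrary)
            self.theta = nn.Parameter(torch.tensor(0.1))

        def soft_threshold(self, x):
            # Proximal operator of the L1 norm: sign(x) * max(|x| - theta, 0)
            return torch.sign(x) * torch.relu(torch.abs(x) - self.theta)

        def forward(self, x):
            return self.soft_threshold(self.conv(x))

    # Example: apply one unfolded iteration to a batch of grayscale frames
    layer = UnfoldedIteration(channels=1)
    frames = torch.randn(8, 1, 64, 64)  # (batch, channels, height, width)
    sparse_estimate = layer(frames)

Stacking several such layers, each with its own weights and threshold, and training them end-to-end is what distinguishes deep unfolding from simply running the untrained optimization algorithm for a fixed number of iterations.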

Funding

Interpretable and Explainable Deep Learning for Video Processing (Interpreteerbare en Verklaarbare Deep Learning voor Video Verwerking)

Research Foundation - Flanders


History

Email Address of Submitting Author

bjoukovs@etrovub.be

ORCID of Submitting Author

0000-0002-2881-2727

Submitting Author's Institution

Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB)

Submitting Author's Country

  • Belgium
