Abstract
Traditional convolutional neural network (CNN) methods rely on dense
tensors, which makes them suboptimal for spatially sparse data. In this
paper, we propose a CNN model based on sparse tensors for efficient
processing of large and sparse medical images. In contrast to a dense
CNN that takes the entire voxel grid as input, a sparse CNN operates
only on the non-empty voxels, thereby reducing the memory and computation
overhead that dense processing of sparse input data incurs. We evaluate our method on two
clinically relevant skull reconstruction tasks: (1) given a defective
skull, reconstruct the complete skull (i.e., skull shape completion),
and (2) given a coarse skull, reconstruct a high-resolution skull with
fine geometric details (shape super-resolution). Our method outperforms
the state of the art in skull reconstruction both quantitatively and
qualitatively, while requiring substantially less memory for training
and inference. We observed that, on the 3D skull data, the overall
memory consumption of the sparse CNN during inference grows approximately
linearly with image resolution. During training, memory usage grows more
slowly than image resolution: an 8-fold increase in the number of voxels
leads to a less than 8-fold increase in memory requirements. Our study
demonstrates the effectiveness of using a sparse
CNN for skull reconstruction tasks, and our findings can be applied to
other spatially sparse problems. We demonstrate this with additional
experimental results on other sparse medical datasets, such as the aorta
and the heart. Project page at https://github.com/Jianningli/SparseCNN
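To make the distinction between dense and sparse processing concrete, the following is a minimal plain-Python sketch, not the authors' implementation. It assumes a hollow sphere as a hypothetical stand-in for a thin skull surface, stores only the occupied voxels as a coordinate-to-feature map, and applies a 3x3x3 convolution only at those active sites. The names `shell_voxels` and `submanifold_conv3d` are illustrative. The "submanifold" rule used here (output sites equal input sites, empty neighbours contribute zero) is one common sparse-convolution variant and is not necessarily the operator used in this work. Real sparse CNN libraries store coordinates and features as GPU tensors rather than Python dicts.

```python
import itertools
import math


def shell_voxels(n, thickness=1.0):
    """Return the set of (x, y, z) voxels of a hollow sphere in an n^3 grid.

    Hypothetical stand-in for a thin skull surface: a voxel is occupied only
    if its distance from the grid centre lies within `thickness` of the radius.
    """
    c = (n - 1) / 2.0
    r = 0.4 * n
    active = set()
    for x, y, z in itertools.product(range(n), repeat=3):
        d = math.sqrt((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2)
        if abs(d - r) <= thickness:
            active.add((x, y, z))
    return active


def submanifold_conv3d(features, weights, bias=0.0):
    """Apply a 3x3x3 convolution only at active (non-empty) sites.

    features: dict mapping (x, y, z) -> float (a single channel, for brevity).
    weights:  dict mapping an offset (dx, dy, dz) in {-1, 0, 1}^3 -> float.
    Output sites are exactly the input active sites; neighbours that are not
    stored (empty voxels) contribute zero, so empty space is never visited.
    """
    out = {}
    for (x, y, z) in features:
        acc = bias
        for (dx, dy, dz), w in weights.items():
            v = features.get((x + dx, y + dy, z + dz))
            if v is not None:
                acc += w * v
        out[(x, y, z)] = acc
    return out


if __name__ == "__main__":
    n = 48
    active = shell_voxels(n)
    print(f"grid voxels: {n ** 3}, active voxels: {len(active)} "
          f"({100 * len(active) / n ** 3:.1f}% occupied)")
    feats = {p: 1.0 for p in active}
    # Box filter: every one of the 27 offsets gets the same weight.
    weights = {o: 1.0 / 27 for o in itertools.product((-1, 0, 1), repeat=3)}
    out = submanifold_conv3d(feats, weights)
    print(f"outputs computed: {len(out)} (a dense conv would compute {n ** 3})")
```

The work and storage in this sketch scale with the number of occupied voxels rather than with the full grid size, which is the property that makes sparse CNNs attractive for thin structures such as skulls. Because outputs are kept only at input sites, a stack of such layers never "dilates" the active set, at the cost of limiting information flow between disconnected surface regions.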