FARMYARD: A Generic GPU-based Pipeline for Feature Discovery from
Massive Planetary LiDAR Data
Abstract
Over recent decades, the placement of LiDAR remote sensing instruments
in orbit has given us global coverage of bare-ground elevation on Earth,
Mars, and beyond. Encoded in such planetary LiDAR
data are interesting landscape features that promise to further our
knowledge of planetary topography. However, discovering such features
poses three major challenges. First, the volume of planetary LiDAR data
can be massive, often comprising hundreds of millions to billions of
data points, which calls for highly efficient analytical algorithms.
Second, interesting features often recur at multiple scales within
local regions, so multi-scale feature discovery is vital. Third,
planetary LiDAR data can be heterogeneous, and
evaluation of the quality of the extracted features can often be
hampered by a variety of interfering factors. In response to these
challenges, we propose FARMYARD, a generic pipeline for Feature
Discovery From Planetary LiDAR Data. To the best of our knowledge, this
is the first such pipeline, and it provides a new methodology for
comparative studies of planetary topography.
Specifically, drawing on the parallel computing power of the Graphics
Processing Unit (GPU), we propose a novel pseudo-one-pass sweep (POPS)
framework for fast and memory-efficient feature extraction for massive
planetary LiDAR data, a two-level division scheme for local regions with
support for multi-scale features, and a Domain-Shifted Partition (DSP)
scheme for feature evaluation that is robust against interfering
factors. To showcase the utility of our FARMYARD pipeline, we deploy it
to an ongoing real-world research project called PARKER, which seeks to
find topographical signatures of life by discovering features that can
potentially distinguish between the Earth and alien worlds with no known
life activity. We also demonstrate the efficiency of our POPS framework
through experiments on both synthetic and real data, showing that it can
be hundreds or even thousands of times faster than CPU-based
counterparts, including an MPI-based multi-core parallel solution.
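The abstract names a two-level division of local regions with support for multi-scale features but gives no details of how it works. As a rough illustration only, the sketch below bins LiDAR points into hypothetical coarse regions and finer sub-cells in a single sequential pass and accumulates per-cell elevation statistics at both levels. The function names, the cell-size parameters (`coarse`, `fine_per_coarse`), and the choice of mean and variance as the extracted features are all assumptions. This is not the POPS framework, the authors' two-level scheme, or a GPU implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]   # (x, y, elevation)
Cell = Tuple[int, int]               # integer grid index
Stats = Tuple[int, float, float]     # (count, mean, variance)


def _summarize(acc: Dict[Cell, List[float]]) -> Dict[Cell, Stats]:
    """Turn running (count, sum, sum of squares) into (count, mean, variance)."""
    out: Dict[Cell, Stats] = {}
    for key, (n, s, sq) in acc.items():
        mean = s / n
        # Population variance; clamp tiny negative values caused by rounding.
        out[key] = (int(n), mean, max(0.0, sq / n - mean * mean))
    return out


def two_level_stats(points: Iterable[Point], coarse: float, fine_per_coarse: int):
    """Hypothetical sketch: one sequential pass over the points, binning each
    into a coarse region and a finer sub-cell, with elevation statistics
    accumulated at both levels."""
    fine = coarse / fine_per_coarse
    coarse_acc: Dict[Cell, List[float]] = defaultdict(lambda: [0, 0.0, 0.0])
    fine_acc: Dict[Cell, List[float]] = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y, z in points:
        # Floor division gives consistent cell indices for negative coordinates too.
        ck = (int(x // coarse), int(y // coarse))
        fk = (int(x // fine), int(y // fine))
        for acc, key in ((coarse_acc, ck), (fine_acc, fk)):
            a = acc[key]
            a[0] += 1
            a[1] += z
            a[2] += z * z
    return _summarize(coarse_acc), _summarize(fine_acc)


if __name__ == "__main__":
    pts = [(0.5, 0.5, 1.0), (1.5, 0.5, 3.0), (2.5, 2.5, 5.0)]
    coarse_stats, fine_stats = two_level_stats(pts, coarse=2.0, fine_per_coarse=2)
    print(coarse_stats[(0, 0)])  # (2, 2.0, 1.0)
    print(len(fine_stats))       # 3
```

Keeping only running sums means memory grows with the number of occupied cells rather than the number of points. The sum-of-squares variance used here can lose precision for large elevation values, so a numerically stabler update such as Welford's method would likely be preferable in practice.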