Abstract
Gestures constitute an important form of nonverbal communication in which
bodily actions deliver messages, either alone or alongside spoken words.
Recently, WiFi-sensing-enabled gesture recognition has emerged as a
promising trend owing to its inherent merits: it is device-free, covers
non-line-of-sight scenarios, and preserves privacy. However, current
WiFi-based approaches mainly rely on domain-specific training since they
do not know ``\emph{where to look}'' and
``\emph{when to look}''. To this end, we propose
WiGRUNT, a WiFi-enabled gesture recognition system based on a
dual-attention network, to mimic how a keen human interprets a gesture
regardless of environmental variations. The key insight is to train
the network to dynamically focus on the domain-independent features of a
gesture in the WiFi Channel State Information (CSI) via a
spatial-temporal dual-attention mechanism. WiGRUNT is rooted in a Deep
Residual Network (ResNet) backbone to evaluate the importance of
spatial-temporal clues and exploit their inbuilt sequential correlations
for fine-grained gesture recognition. We evaluate WiGRUNT on the open
Widar3 dataset and show that it significantly outperforms its
state-of-the-art rivals, achieving the best-ever performance both
in-domain and cross-domain.