Abstract
Segmentation ambiguity in generative linguistic steganography can
cause decoding errors. One existing disambiguation method removes, from
each candidate pool, the tokens whose mapped words are prefixes of other
candidates' words. However, it ignores the probability distribution of
the candidates and degrades imperceptibility. To enhance steganographic
security while addressing segmentation ambiguity, we propose a secure
disambiguation approach for linguistic steganography. In this letter, we
focus on two questions: (1) Which candidate pools should be modified?
(2) Which tokens should be retained? First, we propose a secure
token-selection principle: the sum of the selected tokens’ probabilities
is positively correlated with statistical imperceptibility. To achieve
both disambiguation and optimal security, we then present a lightweight
disambiguation approach that finds a maximum weight independent set
(MWIS) in a candidate graph, and does so only when candidate-level
ambiguity occurs. Experiments show that our approach outperforms the existing
method on various security metrics, improving statistical
imperceptibility by 25.7% and anti-steganalysis capacity by 12.2% on average.
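
The selection step can be illustrated with a minimal Python sketch. It is not the letter's implementation: the graph construction (an edge between any two candidates whose words are in a prefix relation), the exhaustive search, the function names, and the toy pool are all assumptions made for illustration. The baseline function mirrors the existing prefix-removal method described above.

```python
from itertools import combinations


def has_prefix_conflict(a: str, b: str) -> bool:
    """True when two distinct words are in a prefix relation."""
    return a != b and (a.startswith(b) or b.startswith(a))


def build_candidate_graph(words):
    """Assumed construction: join indices of words in a prefix relation."""
    return {
        (i, j)
        for i, j in combinations(range(len(words)), 2)
        if has_prefix_conflict(words[i], words[j])
    }


def max_weight_independent_set(pool):
    """Keep the conflict-free subset of (word, prob) pairs with maximum total probability.

    Exhaustive search over subsets, so only suitable for small pools.
    """
    words = [w for w, _ in pool]
    probs = [p for _, p in pool]
    edges = build_candidate_graph(words)
    if not edges:
        # No candidate-level ambiguity: leave the pool untouched.
        return list(pool)
    n = len(pool)
    best_mask, best_weight = 0, -1.0
    for mask in range(1 << n):
        # Skip subsets that contain both endpoints of a conflict edge.
        if any((mask >> i) & 1 and (mask >> j) & 1 for i, j in edges):
            continue
        weight = sum(probs[i] for i in range(n) if (mask >> i) & 1)
        if weight > best_weight:
            best_mask, best_weight = mask, weight
    return [pool[i] for i in range(n) if (best_mask >> i) & 1]


def remove_prefix_tokens(pool):
    """Baseline: drop every token whose word is a prefix of another candidate."""
    words = [w for w, _ in pool]
    return [
        (w, p)
        for w, p in pool
        if not any(o != w and o.startswith(w) for o in words)
    ]


# Hypothetical candidate pool: (mapped word, probability).
pool = [("un", 0.5), ("unit", 0.1), ("under", 0.15), ("the", 0.25)]
kept = max_weight_independent_set(pool)
baseline = remove_prefix_tokens(pool)
```

On this toy pool the MWIS selection keeps "un" and "the" (total probability 0.75), whereas prefix removal discards the high-probability "un" and keeps "unit", "under", and "the" (total probability about 0.5). Under the stated principle, the larger retained probability mass is the more secure choice.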