Stegomalware: A Systematic Survey of Malware Hiding and Detection in
Images, Machine Learning Models and Research Challenges
Abstract
Malware is commonly delivered to the victim network through file
attachments in phishing emails or through illegitimate files downloaded
from the internet, when the victim interacts with the source of
infection. To detect and prevent malware delivery to the victim
machine, existing endpoint security applications may leverage
techniques such as signature-based, anomaly-based, and machine
learning-based detection. The well-known file formats Portable
Executable (PE) for Windows and Executable and Linkable Format (ELF) for
Linux-based operating systems are widely used in malware analysis, and
malware detection for these file formats has been well advanced for
real-time detection. However, detecting malware payloads hidden through
steganography in multimedia such as cover images has been a challenge
for enterprises, as such payloads are rarely seen and usually act as a
stager in sophisticated attacks. In this article, to our knowledge, we are the
first to try to address the knowledge gap between current academic
research in image steganography and steganalysis, which focuses on data
hiding, and the current status of stegomalware (malware payloads hidden
in images) used in cyberattacks targeting enterprises. We
present the history of stegomalware, its generation tools, and a
description of the relevant file format specifications. Based on our
findings, we perform a detailed review of image steganography
techniques, including recent Generative Adversarial Network (GAN) based
models, and of image steganalysis methods, including those based on
Deep Learning. We also present the opportunities and challenges in
stegomalware generation and detection.