Conference_Paper___NIDS_PCAP_Transformation (7).pdf (1.51 MB)
Download file

Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets

Download (1.51 MB)
posted on 2022-11-30, 05:57 authored by Yasir Ali Farrukh, Irfan KhanIrfan Khan, Syed Wali, David Bierbrauer, John Pavlik, Nathaniel Bastian

Adapting modern approaches for network intrusion detection is becoming critical, given the rapid technological advancement and adversarial attack rates. Therefore, packet-based methods utilizing payload data are gaining much popularity due to their effectiveness in detecting certain attacks. However, packet-based approaches suffer from a lack of standardization, resulting in incomparability and reproducibility issues. Unlike flow-based datasets, no standard labeled dataset exists, forcing researchers to follow bespoke labeling pipelines for individual approaches. Without a standardized baseline, proposed approaches cannot be compared and evaluated with each other. One cannot gauge whether the proposed approach is a methodological advancement or is just being benefited from the proprietary interpretation of the dataset. Addressing comparability and reproducibility issues, we introduce Payload-Byte, an open-source tool for extracting and labeling network packets in this work. Payload-Byte utilizes metadata information and labels raw traffic captures of modern intrusion detection datasets in a generalized manner. Moreover, we transformed the labeled data into a byte-wise feature vector that can be utilized for training machine learning models. The whole cycle of processing and labeling is explicitly stated in this work. Furthermore, source code and processed data are made publicly available so that it may act as a standardized baseline for future research work. Lastly, we present a brief comparative analysis of machine learning models trained on packet-based and flow-based data.

UNSW-NB15 and CIC-IDS2017.


This work was supported in part by the U.S. Military Academy (USMA) under Cooperative Agreement No. W911NF-22-2-0081, the U.S. Army Combat Capabilities Development Command (DEVCOM) C5ISR Center under Support Agreement No. USMA21056, and the National Security Agency Laboratory for Advanced Cybersecurity Research under Interagency Agreement No. USMA21035. The views and conclusions expressed in this paper are those of the authors and do not reflect the official policy or position of the U.S. Military Academy, U.S. Army, U.S. Department of Defense, or U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. The U.S. Government reserves a royalty-free, nonexclusive and irrevocable right to reproduce, publish, or otherwise use this data for Federal purposes, and to authorize others to do so in accordance with 2 CFR 200.315(b).


Email Address of Submitting Author

ORCID of Submitting Author


Submitting Author's Institution

Texas A&M University

Submitting Author's Country

  • United States of America