TechRxiv

Detecting Anomalies in Logs by Combining NLP features with Embedding or TF-IDF

preprint
posted on 13.04.2022, 00:49 authored by Arpanjeet Sandhu, Sabah Mohammed

Following image classification, attention is now shifting to text classification, which has numerous real-world uses, such as tagging information or items with categories to improve browsing or to identify related content on a website. Text classification, also known as text tagging or text categorization, is the practice of sorting text into organized groups. Using Natural Language Processing (NLP), text classifiers can automatically analyze text and assign a set of predefined tags or categories based on its content. In this study, we try to build a model that detects anomalies in a log data set by combining NLP features with other methods. We treat this as a text classification problem. We therefore evaluate two of the most well-known techniques while also using extra features extracted from the data set: we contrast the bag-of-words technique with the embedding technique. Unlike bag of words, embedding tries to preserve the meaning of the sentence, which can aid text classification.
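As a rough illustration of the bag-of-words side of this comparison, the sketch below computes TF-IDF weights for a few log lines and appends two hand-crafted features to each vector. It is not the authors' implementation: the sample log lines, the two extra features (token count and digit count), the smoothed IDF formula, and every function name are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(line):
    """Lowercase a log line and split it on whitespace."""
    return line.lower().split()


def tfidf_vectors(lines):
    """Return (vocab, vectors) with one TF-IDF vector per log line."""
    docs = [tokenize(line) for line in lines]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of lines that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty lines
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def extra_features(line):
    """Hypothetical extra features: token count and number of digit characters."""
    return [float(len(tokenize(line))), float(sum(ch.isdigit() for ch in line))]


def combined_features(lines):
    """Concatenate each TF-IDF vector with the extra features of its line."""
    vocab, tfidf = tfidf_vectors(lines)
    return vocab, [vec + extra_features(line) for vec, line in zip(tfidf, lines)]


# Made-up log lines, used only to exercise the functions above.
logs = [
    "INFO connection established from host 10",
    "INFO connection established from host 11",
    "ERROR disk failure on device sda1",
]
vocab, X = combined_features(logs)
```

Each row of `X` could then be passed to any classifier. An embedding-based variant would replace `tfidf_vectors` with sentence or word embeddings while keeping the same concatenation step.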

History

Email Address of Submitting Author

asandhu9@lakeheadu.ca

Submitting Author's Institution

Lakehead University, Thunder Bay, Ontario

Submitting Author's Country

Canada