Detecting Anomalies in Logs by Combining NLP features with Embedding or TF-IDF

Arpanjeet Sandhu; Sabah Mohammed

doi:10.36227/techrxiv.19498769.v1

Abstract

Following image classification, attention is now shifting to text categorization. Text classification, also known as text tagging or text categorization, is the practice of assigning text to organized groups. It has numerous real-world uses, such as tagging content or items with categories to improve browsing or to identify related content on a website. Using Natural Language Processing (NLP), text classifiers can automatically analyze text and assign a set of predefined tags or categories based on its content. In this study, we attempt to build a model that detects anomalies in a log data set by combining NLP features with other methods. We treat this as a text classification problem, and we therefore evaluate two of the most well-known techniques while also using extra features extracted from the data set. We compare the bag-of-words technique with the embedding technique. Unlike bag of words, embedding tries to preserve the meaning of the sentence, which can aid text classification.
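To make the idea of combining a bag-of-words representation with extra features concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the sample log lines, the helper names (`tokenize`, `tfidf_vectors`, `extra_features`, `combine`), the smoothed IDF formula, and the three hand-picked features (token count, digit ratio, uppercase ratio) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(line):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return line.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors), where each vector holds TF-IDF weights per vocab term."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on empty lines
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def extra_features(line):
    """Hypothetical hand-crafted features: token count, digit ratio, uppercase ratio."""
    n_chars = len(line) or 1
    return [
        float(len(tokenize(line))),
        sum(c.isdigit() for c in line) / n_chars,
        sum(c.isupper() for c in line) / n_chars,
    ]


def combine(docs):
    """Concatenate each TF-IDF vector with the extra features for the same line."""
    vocab, tfidf = tfidf_vectors(docs)
    return vocab, [vec + extra_features(doc) for vec, doc in zip(tfidf, docs)]


# Made-up log lines, used only to exercise the functions above.
logs = [
    "INFO connection established to node 12",
    "INFO connection closed by node 12",
    "ERROR kernel panic detected on node 7",
]
vocab, X = combine(logs)
```

Each row of `X` can then be passed to any standard classifier. The embedding variant would replace the TF-IDF part of each row with a dense sentence vector while keeping the extra features unchanged.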