SVM transformations for Multi-labeled Topics
  • Amr Elsayed

Abstract

The rapid growth in the number of research papers makes it increasingly difficult to manually curate, interpret, and categorize articles into one or more related topics. The Support Vector Machine (SVM) model has been applied to this multi-labeled topic classification problem on different datasets; however, the way the multi-labeled dataset is transformed into single-labeled ones, as SVM modeling requires, plays a vital role in performance. Different transformations of the problem lead to different SVM behavior and accuracy across datasets.
This paper applies the SVM model to a multi-labeled dataset of research articles. It examines different kinds of transformations and measures their behavior. It also proposes the Least Class Classifier (LCC) technique, which addresses the problem of imbalanced datasets by giving the minor classes an equal chance. The results show that the label powerset transformation achieved the best average accuracy score across all topic classifications. Both label powerset and binary relevance reached $\approx 90\%$ on the Hamming loss measurement, which is based on the fraction of topics that are incorrectly assigned. However, binary relevance showed the best balance between recall and precision in the per-class classification measurements. Moreover, the proposed LCC technique showed promising results in increasing recall for the minor class in the imbalanced dataset.
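
To illustrate the two transformations and the Hamming loss named in the abstract, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy topics, the toy articles, and the function names are all assumptions made for illustration. Binary relevance turns the problem into one binary target per topic, while label powerset treats each distinct combination of topics as a single class; either result could then be passed to a single-label classifier such as an SVM.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical toy data: each article is tagged with one or more topics.
TOPICS: List[str] = ["cs", "math", "physics"]
ARTICLES: List[Set[str]] = [{"cs"}, {"cs", "math"}, {"physics"}, {"cs", "math"}]


def binary_relevance(
    label_sets: Sequence[Set[str]], topics: Sequence[str]
) -> Dict[str, List[int]]:
    """Build one binary target vector per topic (1 if the article has it)."""
    return {t: [int(t in labels) for labels in label_sets] for t in topics}


def label_powerset(
    label_sets: Sequence[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct topic combination to a single class id."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids


def hamming_loss(
    true_sets: Sequence[Set[str]],
    pred_sets: Sequence[Set[str]],
    topics: Sequence[str],
) -> float:
    """Fraction of (article, topic) pairs whose assignment is wrong.

    Assumes every label in both inputs belongs to `topics`.
    """
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(topics))


if __name__ == "__main__":
    print(binary_relevance(ARTICLES, TOPICS))
    print(label_powerset(ARTICLES)[0])
    # Hypothetical predictions that miss "math" on the second article.
    predicted = [{"cs"}, {"cs"}, {"physics"}, {"cs", "math"}]
    print(hamming_loss(ARTICLES, predicted, TOPICS))
```

In this toy case, binary relevance yields three binary problems (one per topic), whereas label powerset yields one three-class problem. Label powerset keeps topic co-occurrence information, but the number of classes can grow with every new combination of topics seen in the data.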