TechRxiv

Transparent Dimension Reduction by Feature Construction with Genetic Algorithm

preprint
posted on 2023-01-19, 03:32, authored by Nikita Radeev

In some domains, such as medicine and finance, every transformation of the data must be transparent and interpretable. Dimension reduction is an important part of a preprocessing pipeline, but current algorithms for it are not transparent. In this work, we present a genetic algorithm for transparent dimension reduction of numerical data. The algorithm constructs features in the form of expression trees built from a subset of the numerical features in the source data and common arithmetic operations. It is designed to maximize quality in binary classification tasks and to generate features that a human can explain, which is achieved by using human-interpretable operations in feature construction. Data transformed by the algorithm can also be used in visual analysis: the algorithm builds features that make the space linearly separable, using distance criteria in the fitness function to move the classes as far apart as possible without loss of classification quality. A multicriteria dynamic fitness function is provided to build features with high diversity.
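To make the idea of an expression-tree feature concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the preprint: the operator set (`+`, `-`, `*`), the nested-tuple tree encoding, the function names, and the toy distance criterion (the gap between class means) are all hypothetical, and plain random search stands in for the genetic operators and the multicriteria fitness, which the abstract does not specify.

```python
import operator
import random

# Assumed operator set; the abstract only says "common arithmetic operations".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(n_features, depth, rng):
    """Build a random expression tree; a leaf is the index of a source feature."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    left = random_tree(n_features, depth - 1, rng)
    right = random_tree(n_features, depth - 1, rng)
    return (op, left, right)


def evaluate(tree, row):
    """Compute the constructed feature for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def to_string(tree, names):
    """Render the tree as a formula a human can read."""
    if isinstance(tree, int):
        return names[tree]
    op, left, right = tree
    return f"({to_string(left, names)} {op} {to_string(right, names)})"


def class_distance(tree, rows, labels):
    """Toy distance criterion (an assumption): gap between the two class means."""
    values = [evaluate(tree, row) for row in rows]
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def best_of_random(rows, labels, n_candidates=50, depth=2, seed=0):
    """Random search as a stand-in for the genetic loop (no crossover or mutation)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    candidates = [random_tree(n_features, depth, rng) for _ in range(n_candidates)]
    return max(candidates, key=lambda t: class_distance(t, rows, labels))


if __name__ == "__main__":
    rows = [[5.0, 1.0], [6.0, 2.0], [1.0, 5.0], [2.0, 6.0]]
    labels = [1, 1, 0, 0]
    best = best_of_random(rows, labels)
    print(to_string(best, ["a", "b"]), class_distance(best, rows, labels))
```

Because each constructed feature is a small formula over named source features, it can be printed and inspected directly, which is the sense in which such a transformation stays transparent.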

Email Address of Submitting Author

rdvnkt@yanex.ru

ORCID of Submitting Author

https://orcid.org/0000-0002-4334-5725

Submitting Author's Institution

Novosibirsk State University

Submitting Author's Country

  • Russian Federation