Transparent Dimension Reduction by Feature Construction with Genetic Algorithm
There are domains (medicine and finance, for example) where every transformation of the data must be transparent and interpretable. Dimension reduction is an important part of a preprocessing pipeline, but the algorithms currently available for it are not transparent. In this work, we propose a genetic algorithm for transparent dimension reduction of numerical data. The algorithm constructs features in the form of expression trees built from a subset of the numerical features of the source data and common arithmetic operations. It is designed to maximize quality in binary classification tasks and to generate features that a human can explain, which is achieved by using only human-interpretable operations during feature construction. Data transformed by the algorithm can also be used in visual analysis: the fitness function includes distance criteria that push the classes as far apart as possible without loss of classification quality, so the constructed features make the feature space linearly separable. A multicriteria dynamic fitness function is used to build features with high diversity.
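The two ingredients the abstract names, features represented as expression trees over source columns and a fitness that rewards class separation without sacrificing accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `OPS`, `Leaf`, `Node`, `evaluate`, `to_formula`, `stump_accuracy`, `separation` and `fitness` are hypothetical; the operation set (`+ - * /` with protected division) is assumed; the Fisher-style ratio stands in for the paper's distance criteria; and the static lexicographic two-criterion fitness is a simplified stand-in for the multicriteria dynamic fitness function. The genetic loop itself (selection, crossover, mutation) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple, Union

# Assumed operation set: the abstract only says "common arithmetical operations".
# Division is protected (returns 0.0 for a near-zero denominator) so every tree is valid.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 0.0,
}


@dataclass
class Leaf:
    index: int  # column index of a source numerical feature


@dataclass
class Node:
    op: str  # key into OPS
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, row: Sequence[float]) -> float:
    """Compute the constructed feature for one sample."""
    if isinstance(tree, Leaf):
        return row[tree.index]
    return OPS[tree.op](evaluate(tree.left, row), evaluate(tree.right, row))


def to_formula(tree: Tree, names: Sequence[str]) -> str:
    """Render the tree as a formula a human can read."""
    if isinstance(tree, Leaf):
        return names[tree.index]
    return f"({to_formula(tree.left, names)} {tree.op} {to_formula(tree.right, names)})"


def stump_accuracy(values: List[float], labels: List[int]) -> float:
    """Best accuracy of a single threshold on the 1-D feature (1-D linear separability)."""
    pairs = sorted(zip(values, labels))
    n, pos = len(pairs), sum(labels)
    best = max(pos, n - pos) / n  # baseline: always predict the majority class
    left_pos = 0
    for i in range(n - 1):
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold fits between equal values
        left_n = i + 1
        # Predict 0 left of the threshold and 1 right of it, or the reverse.
        correct = (left_n - left_pos) + (pos - left_pos)
        best = max(best, correct / n, (n - correct) / n)
    return best


def separation(values: List[float], labels: List[int]) -> float:
    """Fisher-style ratio: squared gap between class means over summed class variances.
    Assumes both classes (0 and 1) are present."""
    groups = [[v for v, y in zip(values, labels) if y == c] for c in (0, 1)]
    means = [sum(g) / len(g) for g in groups]
    variances = [sum((v - m) ** 2 for v in g) / len(g) for g, m in zip(groups, means)]
    return (means[0] - means[1]) ** 2 / (sum(variances) + 1e-12)


def fitness(tree: Tree, X: Sequence[Sequence[float]], y: List[int]) -> Tuple[float, float]:
    """Lexicographic fitness: accuracy first, separation only breaks ties,
    so pushing the classes apart never costs classification quality."""
    values = [evaluate(tree, row) for row in X]
    return stump_accuracy(values, y), separation(values, y)


# Toy usage with hypothetical data: x0 * x1 separates the classes more widely than x0 alone.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 6.0], [4.0, 8.0]]
y = [0, 0, 1, 1]
product = Node("*", Leaf(0), Leaf(1))
print(to_formula(product, ["x0", "x1"]))  # (x0 * x1)
print(fitness(product, X, y) > fitness(Leaf(0), X, y))  # True
```

Returning a tuple lets Python's built-in tuple ordering act as the lexicographic comparison, which is one simple way to read "as far as possible without loss of classification quality"; a weighted sum would instead allow trading accuracy for distance.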
Email Address of Submitting Author: rdvnkt@yanex.ru
ORCID of Submitting Author: https://orcid.org/0000-0002-4334-5725
Submitting Author's Institution: Novosibirsk State University
Submitting Author's Country: Russian Federation