Analyzing Biomedical Datasets with Symbolic Tree Adaptive Resonance Theory

Sasha Petrenko; Daniel Hier; Tayo Obafemi-Ajayi; Mary Bone; Erik Timpson; Michael Speight

doi:10.36227/techrxiv.24542782.v2

Abstract

Biomedical datasets distill many mechanisms of human diseases, linking diseases to genes and phenotypes (signs and symptoms of disease), genetic mutations to altered protein structures, and altered proteins to changes in molecular functions and biological processes. It is desirable to gain new insights from these data, especially by uncovering hierarchical structures that relate disease variants. However, such analysis has proven difficult because of the complexity of the connections between multicategorical symbolic data. This article proposes Symbolic Tree Adaptive Resonance Theory (START), with additional supervised, Dual-Vigilance (DV-START), and Distributed Dual-Vigilance (DDV-START) formulations, for clustering multicategorical symbolic data from biomedical datasets. It demonstrates the method's utility by clustering variants of Charcot-Marie-Tooth disease using genomic, phenotypic, and proteomic data.

AUTHORS' NOTE: This article outlines the Symbolic Tree Adaptive Resonance Theory (START) machine learning algorithm, which is unrelated to the similarly named Spectral Timing Adaptive Resonance Theory (START) explanatory neural network model.