Abstract
Biomedical datasets distill many mechanisms of human diseases, linking
diseases to genes and phenotypes (signs and symptoms of disease),
genetic mutations to altered protein structures, and altered proteins to
changes in molecular functions and biological processes. New insights
from these data are desirable, especially regarding the hierarchical
structures that relate disease variants.
However, analysis to this end has proven difficult due to the complexity
of the connections between multicategorical symbolic data. This article
proposes Symbolic Tree Adaptive Resonance Theory (START), along with
supervised, Dual-Vigilance (DV-START), and Distributed Dual-Vigilance
(DDV-START) formulations, for clustering multicategorical symbolic data
from biomedical datasets. It demonstrates START's utility by clustering
variants of Charcot-Marie-Tooth disease using genomic, phenotypic, and
proteomic data.
AUTHORS' NOTE: This article outlines the Symbolic Tree Adaptive Resonance
Theory (START) machine learning algorithm, which is unrelated to the
similarly named Spectral Timing Adaptive Resonance Theory (START)
explanatory neural network model.