Abstract
Studying genetic variation underlying phenotypes is an important topic
in genomics. In plant genomic research, for example, scientists analyze
the variation between cultivars and wild types to develop crops with
improved resistance to diseases. This analysis is commonly based on
comparison to a single reference genome. To keep pace with the rapidly
growing number of genomes and to avoid bias towards a single reference
genome, the field is shifting towards the use of pangenomes, i.e., abstract
representations of multiple genomes in a species or population. While
pangenomes allow for a more complete picture of the genetic variation,
their large size and complex data structure hinder analysis. To deal
with this, genome scientists need visual analytics tools that support
interactive and exploratory analysis of pangenomes to identify relevant
information for variant analysis. A major challenge is to handle
multiple references while providing adequate context from
heterogeneous (meta)data, such as annotations, evolutionary
relationships, and phenotypes. To address this challenge, we developed
PanVA, a visual analytics design for variant analysis in pangenomes.
PanVA supports a novel strategy for pangenomic variant analysis that was
designed with the active participation of genomics researchers. PanVA
uniquely allows researchers to get a complete picture of the variation
within genes in a large set of genomes and to identify associations with
phenotypes. The design combines tailored visual representations with
interactions such as sorting, grouping, and aggregation, allowing the
user to navigate and explore different perspectives. The PanVA design
is realized through PanTools. Through a user evaluation
in the context of plant and pathogen research, we demonstrate that
PanVA helps researchers explore regions of interest and generate
hypotheses about genetic variants and their role in phenotypic
variation.