TechRxiv

Efficient analysis of overdispersed data using an accurate computation of the Dirichlet multinomial distribution

preprint
posted on 2023-09-07, 17:12 authored by Sherenaz Al-Haj Baddar, Alessandro Languasco, Mauro Migliardi

Modeling count data with suitable statistical distributions has been instrumental in analyzing the patterns such data convey. However, failing to address critical aspects such as overdispersion jeopardizes the effectiveness of such an analysis. In this paper, overdispersed count data are modeled using the Dirichlet multinomial (DM) distribution, whose parameters are estimated by maximizing its likelihood with a fixed-point iteration algorithm. Evaluating the DM log-likelihood poses well-known computational difficulties, and two procedures that address them are compared: the recent Languasco-Migliardi (LM) procedure and the Yu-Shaw (YS) procedure. Experiments were conducted on multiple datasets from different domains, spanning polls, images, and IoT network traffic. All of them showed the superiority of the LM procedure: it estimated the DM parameters to the designated level of accuracy in every experiment, while the YS procedure failed to produce sufficiently accurate results (or any results at all) in several experiments. Moreover, the LM procedure achieved speedups ranging from 2-fold to 20-fold over YS.
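To make the estimation setting concrete, here is a minimal Python sketch. It is not the LM or YS procedure: it evaluates the DM log-likelihood naively with `math.lgamma` differences (the kind of evaluation whose numerical difficulties those procedures address) and uses a Minka-style fixed-point update, which we assume is representative of, but may differ from, the paper's algorithm. The function names `dm_log_likelihood`, `psi_diff`, and `fit_dm_fixed_point`, as well as the toy data, are hypothetical. The digamma differences needed by the update are computed exactly through the identity psi(a + n) - psi(a) = sum_{j=0}^{n-1} 1/(a + j), valid for integer counts n.

```python
import math
from typing import List, Sequence


def dm_log_likelihood(counts: Sequence[Sequence[int]], alpha: Sequence[float]) -> float:
    """Naive DM log-likelihood (hypothetical helper); assumes every alpha_k > 0.

    For one observation x with total n and A = sum(alpha):
      log n! - sum_k log x_k! + lgamma(A) - lgamma(n + A)
        + sum_k [lgamma(x_k + alpha_k) - lgamma(alpha_k)]
    """
    A = sum(alpha)
    total = 0.0
    for row in counts:
        n = sum(row)
        # Log of the multinomial coefficient n! / prod_k x_k!
        total += math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in row)
        # Gamma-ratio terms: differences of large lgamma values can lose precision
        total += math.lgamma(A) - math.lgamma(n + A)
        total += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(row, alpha))
    return total


def psi_diff(a: float, n: int) -> float:
    """Return psi(a + n) - psi(a) for integer n >= 0 via the exact finite sum."""
    return math.fsum(1.0 / (a + j) for j in range(n))


def fit_dm_fixed_point(counts, alpha0, tol=1e-10, max_iter=10_000) -> List[float]:
    """Minka-style fixed-point update (assumption, not the paper's exact method):
    alpha_k <- alpha_k * sum_i [psi(x_ik + alpha_k) - psi(alpha_k)]
                       / sum_i [psi(n_i + A) - psi(A)]
    """
    alpha = [float(a) for a in alpha0]
    for _ in range(max_iter):
        A = sum(alpha)
        denom = sum(psi_diff(A, sum(row)) for row in counts)
        new_alpha = [
            a * sum(psi_diff(a, row[k]) for row in counts) / denom
            for k, a in enumerate(alpha)
        ]
        # Stop when no component moves by more than tol
        converged = max(abs(x - y) for x, y in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


if __name__ == "__main__":
    # Toy overdispersed counts (hypothetical): rows concentrate on few categories
    data = [[10, 0, 0], [0, 10, 0], [0, 0, 10], [5, 5, 0], [0, 5, 5]]
    alpha_hat = fit_dm_fixed_point(data, [1.0, 1.0, 1.0])
    print(alpha_hat, dm_log_likelihood(data, alpha_hat))
```

The sketch assumes every category has at least one nonzero count; otherwise the corresponding alpha_k is driven to zero and `math.lgamma(0)` fails. The finite-sum identity costs O(n) per term, so it becomes slow for large counts; this kind of accuracy/speed trade-off is what the procedures compared in the paper are designed to handle.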

Funding

none


Email Address of Submitting Author

alessandro.languasco@unipd.it

ORCID of Submitting Author

0000-0003-2723-554X

Submitting Author's Institution

Università di Padova

Submitting Author's Country

  • Italy