
Efficient analysis of overdispersed data using an accurate computation of the Dirichlet multinomial distribution
  • Sherenaz Al-Haj Baddar,
  • Alessandro Languasco,
  • Mauro Migliardi
Sherenaz Al-Haj Baddar
Alessandro Languasco
Università di Padova

Corresponding Author: [email protected]

Mauro Migliardi

Abstract

Modeling count data with suitable statistical distributions is instrumental in analyzing the patterns the data convey. However, failing to address critical aspects such as overdispersion jeopardizes the effectiveness of the analysis. In this paper, overdispersed count data are modeled with the Dirichlet Multinomial (DM) distribution, whose parameters are estimated by maximizing the likelihood with a fixed-point iteration algorithm. Within this estimation, the recent Languasco-Migliardi (LM) procedure is compared with the Yu-Shaw (YS) procedure; both address the well-known computational difficulties of evaluating the DM log-likelihood. Experiments were conducted on multiple datasets from different domains, spanning polls, images, and IoT network traffic. All of them showed the superiority of the LM procedure: it estimated the DM parameters at the designated level of accuracy in every experiment, whereas the YS procedure failed to produce sufficiently accurate results (or any results at all) in several experiments. Moreover, the LM procedure achieved a speedup over YS ranging from 2-fold to 20-fold.
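To make the estimation step concrete, the following is a minimal sketch of a fixed-point iteration for the DM maximum-likelihood parameters. It assumes the classical Minka-style update, which may differ from the exact iteration used in the paper, and it evaluates the log-likelihood naively with `math.lgamma`, which is neither the LM nor the YS procedure. The function names (`digamma_diff`, `dm_log_likelihood`, `fit_dm_fixed_point`) and the toy counts are hypothetical and chosen only for illustration.

```python
import math


def digamma_diff(a, n):
    """Return psi(a + n) - psi(a) for an integer n >= 0.

    Uses the exact finite-sum identity sum_{j=0}^{n-1} 1 / (a + j),
    so no digamma implementation is required.
    """
    return sum(1.0 / (a + j) for j in range(n))


def dm_log_likelihood(counts, alpha):
    """DM log-likelihood, omitting the alpha-independent multinomial term.

    Naive lgamma-based evaluation (assumption: adequate for this toy
    example only; it is not the LM or YS procedure).
    """
    a_sum = sum(alpha)
    total = 0.0
    for row in counts:
        total += math.lgamma(a_sum) - math.lgamma(a_sum + sum(row))
        for n, a in zip(row, alpha):
            total += math.lgamma(a + n) - math.lgamma(a)
    return total


def fit_dm_fixed_point(counts, alpha0=None, tol=1e-10, max_iter=10000):
    """Estimate DM parameters with a Minka-style fixed-point update.

    Update rule (assumed, not taken from the paper):
        alpha_k <- alpha_k * sum_i [psi(n_ik + alpha_k) - psi(alpha_k)]
                           / sum_i [psi(n_i + A) - psi(A)],
    where A = sum_k alpha_k and n_i is the total count of row i.
    """
    k = len(counts[0])
    alpha = list(alpha0) if alpha0 else [1.0] * k
    for _ in range(max_iter):
        a_sum = sum(alpha)
        denom = sum(digamma_diff(a_sum, sum(row)) for row in counts)
        new_alpha = [
            a * sum(digamma_diff(a, row[j]) for row in counts) / denom
            for j, a in enumerate(alpha)
        ]
        # Stop once the largest parameter change falls below the tolerance.
        if max(abs(x - y) for x, y in zip(new_alpha, alpha)) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha


# Hypothetical overdispersed counts: 5 observations over 3 categories.
toy_counts = [[8, 1, 1], [1, 7, 2], [2, 2, 6], [9, 0, 1], [0, 5, 5]]
alpha_hat = fit_dm_fixed_point(toy_counts)
print(alpha_hat, dm_log_likelihood(toy_counts, alpha_hat))
```

The naive `lgamma` differences in `dm_log_likelihood` are the kind of evaluation that becomes numerically delicate in some parameter regimes, which is the difficulty the LM and YS procedures are designed to address.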