Issues in GLMM family distribution fitting

martina_bovo

Hello everyone! I’m a PhD student working with eye-tracking data for the first time. First of all, I would like to thank Dev_start for the amazing tutorials provided on their website, through which I learned a lot about eye-tracking data processing and analysis.

I would like to share with the forum an issue I’m encountering when checking my model’s assumptions, particularly whether the predicted data distribution corresponds well to the observed data distribution. My dependent variable is saccadic latency (ms), and the distribution of my data is right-skewed and clearly non-normal.

I therefore decided to implement a GLMM using lme4 in R, choosing the Gamma family. However, the correspondence between the estimated and observed data distributions was poor, and the residuals were not uniformly distributed, possibly violating model assumptions. Subsequently, I chose a lognormal distribution (using the glmmTMB package). There was a slight improvement in the distribution correspondence, although the model still underestimates the median peak of my data; however, the residual distribution appears to have improved.

I am unsure at what point I should stop searching for a distribution family that better fits my data. Do you have any suggestions for other distributions that are suitable for reaction times or latencies? How can I determine when the correspondence between predicted and observed distributions is acceptable?

Thank you again for your help, and I hope this discussion may be useful to others who are approaching GLMMs for the first time.

Best regards,
Martina

Tommaso

Hi Martina,

This is a very common issue with reaction time / latency data, so you’re definitely not alone.

What you’re seeing makes sense: the families available in lme4, and also the ones you tried in glmmTMB, are not really optimal for reaction times. They are approximations that can sometimes be sufficient…

If you want distributions that usually work well with reaction times and latencies, I would strongly suggest looking into ex-Gaussian and lognormal models, which can be easily run using brms.

I know Bayesian modeling can seem a bit intimidating at first, but in practice the model specification is very similar to what you already did.
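To show how close the syntax is, here is a minimal sketch rather than a definitive analysis. The data are simulated toy values, and the names `dat`, `latency`, `condition` and `subject` are hypothetical stand-ins for your own data frame and columns. Default priors are used only to keep the example short; for a real analysis you would want to think about them.

```r
library(brms)

# Toy data standing in for the real data set; all names and values are made up.
set.seed(1)
n_subj  <- 20
n_trial <- 30
dat <- data.frame(
  subject   = factor(rep(seq_len(n_subj), each = n_trial)),
  condition = factor(rep(c("a", "b"), times = n_subj * n_trial / 2))
)
subj_shift  <- rnorm(n_subj, mean = 0, sd = 0.1)
dat$latency <- exp(
  5.3 + 0.1 * (dat$condition == "b") +
    subj_shift[as.integer(dat$subject)] +
    rnorm(nrow(dat), mean = 0, sd = 0.3)
)

# Same formula syntax as in lme4 / glmmTMB: fixed effect of condition,
# random intercept per subject.
f <- latency ~ condition + (1 | subject)

# Lognormal model.
fit_lnorm <- brm(f, data = dat, family = lognormal(),
                 chains = 4, cores = 4, seed = 1)

# Ex-Gaussian model: only the family argument changes.
fit_exg <- brm(f, data = dat, family = exgaussian(),
               chains = 4, cores = 4, seed = 1)

summary(fit_lnorm)
summary(fit_exg)
```

Each model needs to compile and sample, so expect this to take noticeably longer than an lme4 fit.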

As for when to stop searching for a better distribution: there’s no perfect solution. In my opinion, the goal is not to match the observed distribution exactly, but to reach a model where (1) the main features of the data are captured (especially skew and tails), (2) residual diagnostics look reasonable, and (3) the model is theoretically appropriate for the process you’re studying.
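For point (1), posterior predictive checks are one practical way to look at this. The sketch below assumes `fit` is a model already fitted with brms; `check_fit` is a hypothetical helper name, and the choice of the median and the 95th percentile as summary statistics is only an illustration of "peak" and "tail", not a fixed rule.

```r
library(brms)

# `fit` is assumed to be a fitted brms model (a "brmsfit" object).
# Returns a list of ggplot objects comparing observed and simulated data.
check_fit <- function(fit, ndraws = 100) {
  stopifnot(inherits(fit, "brmsfit"))
  list(
    # Observed density overlaid with densities of simulated data sets.
    density = pp_check(fit, type = "dens_overlay", ndraws = ndraws),
    # Does the model reproduce the centre of the distribution?
    median = pp_check(fit, type = "stat", stat = "median"),
    # Does the model reproduce the right tail?
    upper_tail = pp_check(
      fit, type = "stat",
      stat = function(y) unname(quantile(y, probs = 0.95))
    )
  )
}

# Usage (with a fitted model called `fit`):
# plots <- check_fit(fit)
# plots$density
# plots$median
```

If the observed median and upper quantile sit well inside the simulated distributions, the main features are arguably captured, even if the density curves do not overlap perfectly.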

An alternative sometimes used is log-transforming the data and fitting a Gaussian model, but this is generally something we tend to discourage (see e.g. https://osf.io/preprints/psyarxiv/9ksa6_v1).

martina_bovo

Thank you very much Tommaso, your suggestions are really helpful and clear to me! I will give Bayesian modeling a try and look further into my model’s assumption checks for the features you indicated.
Thanks again!!!
Best,

Martina