Clustering country-level all-cause mortality data: a review

Published 2 days ago
Version 1
arXiv:2512.04831

Authors

Pedro Menezes de Araujo, Isobel Claire Gormley, Thomas Brendan Murphy

Categories

stat.AP

Abstract

Mortality data are relevant to demography, public health, and actuarial science. Whilst clustering is increasingly used to explore patterns in such data, no study has reviewed its application to country-level all-cause mortality. This review therefore summarises recent work and addresses key questions: why clustering is used, which mortality data are analysed, which methods are most common, and what main findings emerge. To address these questions, we examine studies applying clustering to country-level all-cause mortality, focusing on mortality indices, data sources, and methodological choices, and we replicate some approaches using Human Mortality Database (HMD) data. Our analysis reveals that clustering is mainly motivated by forecasting and by studying convergence and inequality. Most studies use HMD data from developed countries and rely on k-means, hierarchical, or functional clustering. Main findings include a persistent East-West European division across applications, with clustering generally improving forecast accuracy over single-country models. Overall, this review highlights the methodological range in the literature, summarises clustering results, and identifies gaps, such as the limited evaluation of clustering quality and the underuse of data from countries outside the high-income world.

arXiv ID: 2512.04831
Published Dec 4, 2025
