A semi-parametric density estimation with application in clustering

Authors

Salehi, Mahdi
Bekker, Andriette, 1958-
Arashi, Mohammad

Publisher

Springer

Abstract

The idea behind density-based clustering is to associate groups with the connected components of the level sets of the data density, which is estimated by a nonparametric method. This approach claims some advantages over both distance- and model-based clustering. Some researchers have developed this technique by proposing a graph theory–based method for identifying local modes of the underlying density, estimated by the well-known kernel density estimation (KDE) with normal and t kernels. The present work proposes a semi-parametric KDE with a more flexible family of kernels including skew-normal (SN) and skew-t (ST). We show that the proposed estimator not only reduces boundary bias but is also closer to the actual density than the usual estimator employing the Gaussian kernel. Finding the optimal bandwidth for the one-dimensional and multidimensional cases under the mentioned asymmetric kernels is another main result of this paper, where we shrink the bandwidth more than the one obtained under the normal assumption. Finally, through a comprehensive numerical study, we illustrate the application of the proposed semi-parametric KDE to density-based clustering using some simulated and real data sets.
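
As a rough illustration of the idea of swapping the Gaussian kernel for a skew-normal one, the following Python sketch computes a one-dimensional KDE with both kernels. It is not the estimator from the paper: the helper `kde`, the bandwidth `h`, the shape value `shape`, and the simulated exponential sample are all assumptions made for illustration, and the skew-normal kernel is used uncentred, so it shifts the estimate. No bandwidth selection is attempted.

```python
import numpy as np
from scipy.stats import norm, skewnorm

def kde(x_grid, data, bandwidth, kernel_pdf):
    """Plain kernel density estimate: average of scaled kernels placed at each data point.

    kernel_pdf is any function that maps standardized distances to density values.
    (Hypothetical helper, not taken from the paper.)
    """
    u = (x_grid[:, None] - data[None, :]) / bandwidth  # shape: (grid points, samples)
    return kernel_pdf(u).mean(axis=1) / bandwidth

# Simulated sample with a boundary at zero (assumed for illustration only).
rng = np.random.default_rng(0)
data = rng.exponential(size=200)

grid = np.linspace(-5.0, 20.0, 5001)
dx = grid[1] - grid[0]

h = 0.3      # assumed bandwidth, not an optimal one
shape = 2.0  # assumed skewness parameter of the skew-normal kernel

# Usual estimator with the Gaussian kernel.
f_gauss = kde(grid, data, h, norm.pdf)

# Same estimator with a skew-normal kernel; shape = 0 would recover the Gaussian case.
f_sn = kde(grid, data, h, lambda u: skewnorm.pdf(u, shape))

# Both estimates should integrate to roughly one over the grid.
area_gauss = float(f_gauss.sum() * dx)
area_sn = float(f_sn.sum() * dx)
```

Passing the kernel as a function keeps the estimator itself unchanged, so other kernel families can be tried by swapping the `kernel_pdf` argument.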

Description

DATA AVAILABILITY : The geyser and the olive oil data sets are available in the R packages sm and pdfCluster, respectively.

Keywords

Density-based clustering, Clustering, Optimum bandwidth, Asymmetric kernels, Boundary bias, Density-based Silhouette, Kernel density estimation

Citation

Salehi, M., Bekker, A. & Arashi, M. A Semi-parametric Density Estimation with Application in Clustering. Journal of Classification 40, 52–78 (2023). https://doi.org/10.1007/s00357-022-09425-9.