A semi-parametric density estimation with application in clustering
Authors
Salehi, Mahdi
Bekker, Andriette, 1958-
Arashi, Mohammad
Publisher
Springer
Abstract
The idea behind density-based clustering is to associate groups with the connected components of the level sets of the density of the data, which is estimated by a nonparametric method. This approach claims some advantages over both distance- and model-based clustering. Some researchers developed this technique by proposing a graph theory–based method for identifying local modes of the underlying density, estimated by the well-known kernel density estimation (KDE) with normal and t kernels. The present work proposes a semi-parametric KDE with a more flexible family of kernels, including skew-normal (SN) and skew-t (ST). We show that the proposed estimator not only reduces boundary bias but is also closer to the actual density than the usual estimator employing the Gaussian kernel. Finding the optimal bandwidth for the one-dimensional and multidimensional cases under these asymmetric kernels is another main result of this paper, where the bandwidth is shrunk further than the one obtained under the normal assumption. Finally, through a comprehensive numerical study, we illustrate the application of the proposed semi-parametric KDE to density-based clustering using some simulated and real data sets.
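As a rough illustration of what a KDE with an asymmetric kernel looks like, the following minimal sketch evaluates a one-dimensional estimate using the standard skew-normal density as the kernel. It is not the authors' estimator: the function names `skew_normal_kernel` and `kde`, the shape parameter `alpha`, the bandwidth `h`, and the simulated exponential sample are all assumptions chosen for illustration, and no bandwidth selection is attempted. Setting `alpha = 0` recovers the usual Gaussian-kernel KDE.

```python
import math
import numpy as np

# Vectorized error function, so the sketch needs only NumPy and the stdlib.
_erf = np.vectorize(math.erf, otypes=[float])


def skew_normal_kernel(z, alpha=0.0):
    """Standard skew-normal density 2*phi(z)*Phi(alpha*z).

    alpha is a hypothetical shape parameter; alpha = 0 gives the Gaussian kernel.
    """
    z = np.asarray(z, dtype=float)
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)    # normal pdf
    Phi = 0.5 * (1.0 + _erf(alpha * z / math.sqrt(2.0)))    # normal cdf at alpha*z
    return 2.0 * phi * Phi


def kde(x_grid, data, h, alpha=0.0):
    """Evaluate (1 / (n*h)) * sum_i K((x - X_i) / h) at each point of x_grid."""
    x_grid = np.asarray(x_grid, dtype=float)
    data = np.asarray(data, dtype=float)
    z = (x_grid[:, None] - data[None, :]) / h               # shape (grid, n)
    return skew_normal_kernel(z, alpha).mean(axis=1) / h


# Illustrative data only: a right-skewed sample supported on [0, inf).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=200)
grid = np.linspace(-3.0, 12.0, 1501)

f_gauss = kde(grid, sample, h=0.3, alpha=0.0)   # symmetric Gaussian kernel
f_skew = kde(grid, sample, h=0.3, alpha=3.0)    # right-skewed kernel (assumed alpha)

# Trapezoidal check that the skewed estimate still integrates to about 1.
area_skew = float(np.sum(0.5 * (f_skew[1:] + f_skew[:-1]) * np.diff(grid)))
```

Because the skew-normal kernel is itself a probability density, the resulting estimate remains a valid density for any `alpha`; the values `alpha = 3.0` and `h = 0.3` here are hand-picked and say nothing about the optimal choices derived in the paper.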
Description
DATA AVAILABILITY: The geyser and the olive oil data sets are available in the R packages sm and pdfCluster, respectively.
Keywords
Density-based clustering, Clustering, Optimum bandwidth, Asymmetric kernels, Boundary bias, Density-based Silhouette, Kernel density estimation
Sustainable Development Goals
None
Citation
Salehi, M., Bekker, A. & Arashi, M. A Semi-parametric Density Estimation with Application in Clustering. Journal of Classification 40, 52–78 (2023). https://doi.org/10.1007/s00357-022-09425-9.
