A semi-parametric density estimation with application in clustering

Salehi, Mahdi; Bekker, Andriette, 1958-; Arashi, Mohammad

Files

Salehi_SemiParametric_2023.pdf (7.83 MB)

Date

2023-04

Authors

Salehi, Mahdi

Bekker, Andriette, 1958-

Arashi, Mohammad

Publisher

Springer

Abstract

The idea behind density-based clustering is to associate groups with the connected components of the level sets of the density of the data, which is estimated by a nonparametric method. This approach claims some advantages over both distance- and model-based clustering. Some researchers have developed this technique by proposing a graph theory–based method for identifying local modes of the underlying density, which is estimated by the well-known kernel density estimation (KDE) with normal and t kernels. The present work proposes a semi-parametric KDE with a more flexible family of kernels, including the skew-normal (SN) and skew-t (ST). We show that the proposed estimator not only reduces boundary bias but is also closer to the actual density than the usual estimator employing the Gaussian kernel. Finding the optimal bandwidth for the one-dimensional and multidimensional cases under the mentioned asymmetric kernels is another main result of this paper, where we shrink the bandwidth more than the one obtained under the normal assumption. Finally, through a comprehensive numerical study, we illustrate the application of the proposed semi-parametric KDE to density-based clustering using simulated and real data sets.
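
For readers unfamiliar with asymmetric kernels, the following is a minimal sketch of a one-dimensional KDE in which the Gaussian kernel is replaced by a standard skew-normal density. It is not the authors' estimator: the function names, the shape parameter `alpha`, the bandwidth `h`, and the sample values are all illustrative assumptions, and the paper's optimal-bandwidth rule and any centering of the kernel are not reproduced here.

```python
import math

def skew_normal_pdf(z, alpha):
    """Standard skew-normal density: 2 * phi(z) * Phi(alpha * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # normal pdf
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))   # normal cdf at alpha*z
    return 2.0 * phi * Phi

def sn_kde(x, data, h, alpha=0.0):
    """Kernel density estimate at x: (1 / (n*h)) * sum_i K((x - x_i) / h).

    With alpha = 0 the kernel is the standard normal density, so this
    reduces to the usual Gaussian KDE.
    """
    n = len(data)
    return sum(skew_normal_pdf((x - xi) / h, alpha) for xi in data) / (n * h)

# Hypothetical positive-valued sample and hand-picked settings (assumptions).
sample = [0.5, 1.0, 2.0]
bandwidth = 0.5

for x in (0.0, 1.0, 2.0):
    gaussian = sn_kde(x, sample, bandwidth, alpha=0.0)
    skewed = sn_kde(x, sample, bandwidth, alpha=3.0)
    print(f"x={x:.1f}  gaussian={gaussian:.4f}  skew-normal={skewed:.4f}")
```

A positive `alpha` moves each kernel's mass to the right of its data point, so less mass falls below a lower boundary such as zero. Because the skew-normal kernel does not have mean zero, this naive substitution also shifts the estimate, which is one reason the sketch should not be read as the method proposed in the paper.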

Description

DATA AVAILABILITY: The geyser and the olive oil data sets are available in the R packages sm and pdfCluster, respectively.

Keywords

Density-based clustering, Clustering, Optimum bandwidth, Asymmetric kernels, Boundary bias, Density-based Silhouette, Kernel density estimation

Sustainable Development Goals

None

Citation

Salehi, M., Bekker, A. & Arashi, M. A Semi-parametric Density Estimation with Application in Clustering. Journal of Classification 40, 52–78 (2023). https://doi.org/10.1007/s00357-022-09425-9.

URI

http://hdl.handle.net/2263/95232

Collections

Research Articles (Statistics)
Research Articles (University of Pretoria)
