A semi-parametric density estimation with application in clustering

dc.contributor.author: Salehi, Mahdi
dc.contributor.author: Bekker, Andriette, 1958-
dc.contributor.author: Arashi, Mohammad
dc.date.accessioned: 2024-03-15T12:21:11Z
dc.date.available: 2024-03-15T12:21:11Z
dc.date.issued: 2023-04
dc.description: DATA AVAILABILITY: The geyser and the olive oil data sets are available in the R packages sm and pdfCluster, respectively. [en_US]
dc.description.abstract: The idea behind density-based clustering is to associate groups to the connected components of the level sets of the density of the data to be estimated by a nonparametric method. This approach claims some advantages over both distance- and model-based clustering. Some researchers developed this technique by proposing a graph theory–based method for identifying local modes of the underlying density being estimated by the well-known kernel density estimation (KDE) with normal and t kernels. The present work proposes a semi-parametric KDE with a more flexible family of kernels including skew-normal (SN) and skew-t (ST). We show that the proposed estimator not only reduces boundary bias but it is also closer to the actual density compared to that of the usual estimator employing the Gaussian kernel. Finding optimal bandwidth for one-dimensional and multidimensional cases under the mentioned asymmetric kernels is another main result of this paper where we shrink the bandwidth more than the one obtained under the normal assumption. Finally, through a comprehensive numerical study, we will illustrate the application of the proposed semi-parametric KDE on the density-based clustering using some simulated and real data sets. [en_US]
dc.description.department: Statistics [en_US]
dc.description.librarian: hj2024 [en_US]
dc.description.sdg: None [en_US]
dc.description.sponsorship: The South African National Research Foundation SARChI Research Chair in Computational and Methodological Statistics, STATOMET at the Department of Statistics at the University of Pretoria. [en_US]
dc.description.uri: https://link.springer.com/journal/357 [en_US]
dc.identifier.citation: Salehi, M., Bekker, A. & Arashi, M. A Semi-parametric Density Estimation with Application in Clustering. Journal of Classification 40, 52–78 (2023). https://doi.org/10.1007/s00357-022-09425-9. [en_US]
dc.identifier.issn: 0176-4268 (print)
dc.identifier.issn: 1432-1343 (online)
dc.identifier.other: 10.1007/s00357-022-09425-9
dc.identifier.uri: http://hdl.handle.net/2263/95232
dc.language.iso: en [en_US]
dc.publisher: Springer [en_US]
dc.rights: © The Author(s) under exclusive licence to The Classification Society 2022. The original publication is available at https://link.springer.com/journal/357. [en_US]
dc.subject: Density-based clustering [en_US]
dc.subject: Clustering [en_US]
dc.subject: Optimum bandwidth [en_US]
dc.subject: Asymmetric kernels [en_US]
dc.subject: Boundary bias [en_US]
dc.subject: Density-based Silhouette [en_US]
dc.subject: Kernel density estimation [en_US]
dc.title: A semi-parametric density estimation with application in clustering [en_US]
dc.type: Postprint Article [en_US]
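
The abstract contrasts a Gaussian-kernel KDE with one built on asymmetric kernels such as the skew-normal. The sketch below illustrates only that general idea in Python and is not the authors' estimator: the shape parameter `alpha=3.0`, the mean-centering of the kernel, the simulated exponential sample, and the rule-of-thumb bandwidth `h` are all assumptions made for illustration, and the paper's optimal bandwidth is not reproduced here.

```python
import numpy as np
from scipy.stats import norm, skewnorm


def kde(x_grid, data, h, kernel_pdf):
    """Plain KDE: average of bandwidth-scaled kernels evaluated on x_grid."""
    u = (x_grid[:, None] - data[None, :]) / h
    return kernel_pdf(u).mean(axis=1) / h


def centered_skewnorm_pdf(alpha):
    """Skew-normal kernel shifted to have mean zero (alpha is an assumed shape)."""
    mean = skewnorm.mean(alpha)
    return lambda u: skewnorm.pdf(u + mean, alpha)


# Hypothetical sample with a boundary at zero (not one of the paper's data sets).
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=500)
grid = np.linspace(-5.0, 20.0, 2501)

# Normal-reference rule of thumb, used here only as a placeholder bandwidth.
h = 1.06 * data.std(ddof=1) * data.size ** (-0.2)

f_gauss = kde(grid, data, h, norm.pdf)
f_sn = kde(grid, data, h, centered_skewnorm_pdf(alpha=3.0))

# Estimated probability mass placed below the boundary at zero by each kernel.
dx = grid[1] - grid[0]
leak_gauss = f_gauss[grid < 0].sum() * dx
leak_sn = f_sn[grid < 0].sum() * dx
print(f"mass below 0: Gaussian={leak_gauss:.4f}, skew-normal={leak_sn:.4f}")
```

Comparing `leak_gauss` and `leak_sn` gives a rough, informal view of how much density each kernel spills past the boundary for this particular sample; it is not a substitute for the bias analysis in the article.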

Files

Original bundle

Name: Salehi_SemiParametric_2023.pdf
Size: 7.83 MB
Format: Adobe Portable Document Format
Description: Postprint Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission