Abstract:
The setting where an unknown number m of the largest data
are missing from a sample of a Pareto-type distribution is considered. Solutions
are provided for estimating the extreme value index, the number
of missing data and extreme quantiles. For the case where all missing
data are beyond the observed data, asymptotic results for the parameter
estimators are derived and an adaptive selection method for the number of
top data used in the estimation is proposed. An estimator of the number of missing
extremes spread over the largest observed data is also proposed. To this
end, a key component is a likelihood solution based on exponential
representations of spacings between the largest observations. An effective
and fast optimization procedure is established using regularization, and
simulation experiments are provided. The methodology is illustrated with
a dataset from the diamond mining industry, where large-carat diamonds
are expected to be missing.