Research Articles (Statistics)
Permanent URI for this collection: http://hdl.handle.net/2263/1835
A collection containing some of the full-text peer-reviewed/refereed articles published by researchers from the Department of Statistics
Recent Submissions
Growth, neurodevelopmental outcomes and micronutrient intake in 18-month-old children with exposure to maternal human immunodeficiency virus and placental insufficiency : the UmbiGodisa cross-sectional study (Elsevier, 2025-08)
Nyofane, Mothusi; Hoffman, Marinel; Mulol, Helen, Percival; Botha, Tanita, Gladys; Pattinson, Robert Clive; Feucht, Ute Dagmar; mothusi.nyofane@tuks.co.za
BACKGROUND AND AIM : Maternal human immunodeficiency virus (HIV) and intrauterine growth restriction (IUGR) are both associated with suboptimal childhood growth and neurodevelopment. This study assessed growth and neurodevelopmental outcomes and micronutrient intakes in children who are HIV-exposed-uninfected (CHEU), compared to HIV-unexposed-uninfected children (CHUU), stratified based on evidence of placental insufficiency. METHODS : Placental insufficiency, as proxy for IUGR, was identified using abnormal umbilical artery resistance indices (UmA-RI) on pregnancy Doppler ultrasound. At 18 months postpartum, 264 mother–child pairs were evaluated and categorized into four subgroups: CHUU with normal UmA-RI (control group), CHEU with normal UmA-RI (HIV exposure only), CHUU with abnormal UmA-RI (placental insufficiency only) and CHEU with abnormal UmA-RI (double-exposure). Dietary intake was assessed using a single 24-h dietary recall, and dietary intake of iron, zinc, and iodine was quantified by meal analysis on FoodFinder™ 3.0. Anthropometric data were collected and converted into z-scores. The Bayley Scales of Infant and Toddler Development (Bayley-III) assessed cognitive, language, and motor function. Statistical comparisons used t-tests or Mann–Whitney U tests; associations were analyzed with Spearman's correlation.
RESULTS : Children with dual exposure (CHEU/AbN-RI) had significantly lower z-scores compared to the control group, including length-for-age z-score (1.4 ± 1.4 vs 0.0 ± 1.3; p = 0.001), weight-for-age z-score (0.6 ± 1.0 vs 0.0 ± 1.2; p = 0.024) and head circumference-for-age z-score (0.4 ± 0.7 vs 0.9 ± 1.2; p = 0.035). Mean cognitive scores were also lower in this group (93.9 ± 12.9 vs 100.1 ± 10.8; p = 0.042). Language composite scores were low across all groups. Higher zinc intake was positively associated with language scores (r = 0.10; p = 0.042) and weight-for-age z-scores were associated with motor outcomes (r = 0.10; p = 0.028). Among CHEU, better growth parameters were positively associated with cognitive and motor developmental domains.

Comparisons of Cox semi-parametric and parametric shared frailty models : application for under-five children survival in sub-Saharan Africa (BioMed Central, 2025-08-22)
Fenta, Haile Mekonnen; Chen, Ding-Geng (Din); Zewotir, Temesgen T.; Rad, Najmeh Nakhaei; Belay, Denekew Bitew; Yilema, Seyifemickael Amare
BACKGROUND : Under-five child mortality in sub-Saharan African (sSA) countries is a persistent problem, with limited effort being made to explore the determinants of disparities across countries and their lower administrative districts. A child’s survival may depend on several known and unknown covariates and vary across the study areas. The main objective of this study is to assess the time to death of under-five children and its associated risk factors by comparing the performance of semiparametric and parametric frailty models across sSA regions. METHODS : We used a dataset from the Demographic and Health Survey (DHS) across 33 sSA countries. The semiparametric and parametric models with different frailty distributions were used to model the under-five survival time of children across the administrative districts of 33 sSA countries.
RESULTS : A total of 330,373 under-five children were included in the study, of whom 19,893 (6.02%) died before reaching their 5th birthday. Unobserved country-level variance (0.421) and district-level variance (0.183) effects considerably impacted the survival time of under-five children in sSA countries. Under-five children born to mothers aged 25–29 and 30–49 were 16% and 20% less likely to die compared to children born to mothers younger than 24 years. Moreover, children born in rural areas were 8.3% more likely to die than those who were born in urban areas. Children born to mothers with better access to improved water sources and clean fuel were 9% and 11% less likely to die than their counterparts, respectively. CONCLUSIONS : The exponential shared frailty hazard model with lognormal frailty distribution demonstrated better performance compared to the Cox semiparametric model for identifying risk factors for under-five mortality across sSA countries. Place of residence, wealth index, media exposure, birth order, birth interval, access to improved water, and use of clean fuels for cooking were the significant risk factors for time to death of under-five children in sSA.

Computationally efficient Bayesian inference for semi-parametric joint models of competing risks survival and skewed longitudinal data using integrated nested Laplace approximation (BioMed Central, 2025-09)
Ferede, Melkamu Molla; Nakhaei Rad, Najmeh; Chen, Ding-Geng (Din)
BACKGROUND : Joint modeling is widely used in medical research to properly analyze longitudinal biomarkers and survival outcomes simultaneously and to guide appropriate interventions in public health. However, such models become increasingly complex and computationally intensive when accounting for multiple features of these outcomes.
The need for computationally efficient methods in joint modeling of competing risks survival outcomes and longitudinal biomarkers is particularly critical in clinical and epidemiological settings, where prompt decision-making is essential. Moreover, there is very little literature on joint modeling of competing risks survival and skewed longitudinal data using Integrated Nested Laplace Approximations (INLA), despite its growing popularity in Bayesian inference. This paper presents a computationally efficient inference approach for modeling competing risks survival and skewed longitudinal data using INLA. METHODS : We propose cause-specific competing risks joint models with a semi-parametric mixed-effects longitudinal submodel and second-order random walk baseline hazards. The proposed models are reformulated as latent Gaussian models to enable efficient Bayesian inference using INLA. The INLA approach and its R packages are also presented. Various smoothing spline functions, distributions, and association structures were evaluated for both approaches. The INLAjoint and R2WinBUGS R packages were employed for the INLA and Markov-Chain Monte-Carlo (MCMC) approaches, respectively, to approximate the posterior marginals of the proposed joint models. Model comparisons and performance evaluations were performed using the deviance information criterion, relative bias, coverage probability, and root mean squared error. RESULTS : We evaluated the computational efficiency and estimation performance of the INLA and MCMC approaches using real-world chronic kidney disease (CKD) follow-up data and an extensive confirmatory simulation study. We also conducted several model comparisons by considering different specifications related to smoothing spline approximations, non-Gaussian (skewed) distributions, and association structures to identify the best-fitting models for the CKD data and ensure robust statistical inference. 
CONCLUSION : The application and simulation results revealed that both approaches provide accurate statistical estimation and inference. However, INLA significantly reduces the computational burden of the proposed joint models.

Pseudo-observation approach for length-biased Cox proportional hazards model (Wiley, 2025-12)
Akbari, Mahboubeh; Rad, Najmeh Nakhaei; Chen, Ding-Geng (Din); mahboubeh.akbarilakeh@up.ac.za
Pseudo-observations are used to estimate the expectation of a function of interest in a population when survival data are incomplete due to censoring or truncation. Length-biased sampling is a special case of a left-truncation model, in which the truncation variable follows a uniform distribution. This phenomenon is commonly encountered in various fields such as survival analysis and epidemiology, where the event of interest is related to the length or duration of an underlying process. In such settings, the probability of observing a data point is higher for longer lengths, leading to biased sampling. The goal of this paper is to apply pseudo-observations to estimate the regression coefficients in the Cox proportional hazards model under length-biased right-censored (LBRC) data. We assess the accuracy and efficiency of two approaches that differ in their generation of pseudo-observations, comparing them with two prominent standard methods in the presence of LBRC data. The results demonstrate that the two proposed pseudo-observation methods are comparable to the standard methods in terms of standard error, with advantages in providing confidence intervals that are closer to the nominal level in large sample sizes and specific scenarios. Additionally, although length-biased data are a special case of left-truncated data, they must be addressed separately by utilizing the information that the left-truncation variable follows a uniform distribution, as the simulation results show.
We also establish the consistency and asymptotic normality of one of the proposed estimators. Finally, we applied the method to analyze a real LBRC dataset.

Mapping the covariate-adjusted spatial effects of childhood anemia in Ethiopia using a semi-parametric additive model (Frontiers Media, 2025-08-21)
Yilema, Seyifemickael Amare; Shiferaw, Yegnanew A.; Nakhaeirad, Najmeh; Chen, Ding-Geng (Din)
BACKGROUND : Globally, anemia poses a serious health challenge for children under the age of five, and Ethiopia is one of the countries significantly affected by this issue. The 2016 Ethiopian Demographic and Health Survey (DHS) data sets were employed to evaluate anemia risk among children aged 6–59 months. Limited research has been conducted on spatial disparities in childhood anemia at the Ethiopian zonal level, although such research is essential for developing zonal-level interventions and informing policy recommendations. METHODS : This study examined the geospatial disparities in anemia prevalence among children aged 6–59 months. We used a semi-parametric additive model with spatial smoothing to assess zone-level variation in anemia risk while adjusting for key covariates. Each predictor variable was spatially adjusted using non-parametric smoothing techniques based on geolocation parameters, and corresponding maps were produced for each predictor. RESULTS : A regularized random forest technique was employed to identify the most influential predictors of childhood anemia and enhance the model's predictive performance. Our findings revealed that the regional states of Somalia, Afar, and Dire Dawa exhibit the highest risk levels for childhood anemia. Furthermore, the risk of anemia in children varies spatially across different zones in Ethiopia. The most prominent hotspots for childhood anemia were in the country's Northeastern, Eastern, and Southeastern regions. In contrast, the areas with the lowest risk were in the Northwestern, Western, and Southwestern zones of Ethiopia.
CONCLUSION : There are significant spatial disparities in anemia risk across the administrative zones of Ethiopia, indicating that the distribution of each predictor variable is not uniform. These findings provide valuable insights for policymakers, enabling the development of geographically targeted interventions to mitigate anemia risk at the zonal level.

The outcomes of a multifaceted educational intervention to reduce moral distress among critical care nurses (Wiley, 2025-11)
Aljabery, Mohannad; Coetzee-Prinsloo, Isabel M.; Van der Wath, Anna Elizabeth; Al-Awabdeh, Eman; Masenge, Andries
AIM : To measure the outcome of the implementation of a multifaceted educational intervention on the impact of moral distress among critical care nurses. BACKGROUND : The complex nature of critical care settings exacerbates different morally distressing situations that require ongoing development of interventions to mitigate the impact of moral distress. Despite the availability of research that has addressed moral distress among nurses in the literature, there is a debate about the effectiveness of the applied interventions in reducing moral distress. DESIGN : A quasi-experimental pretest-posttest control group study design. METHODS : Critical care nurses in two public hospitals in the Emirate of Abu Dhabi, UAE enrolled in a study that extended over 6 months. Hospital A was assigned as an experimental group (n = 76) and received four educational sessions and three booster sessions. Hospital B was assigned as a control group (n = 82) and did not receive any moral distress-related education. The Measure of Moral Distress for Health Care Professionals questionnaire and the Moral Distress Thermometer were utilised to measure the participants' moral distress frequency, intensity, and composite scores pre- and post-intervention and identify the outcomes.
RESULTS : The multifaceted educational intervention exhibited statistically significant reductions in the experimental group's frequency, intensity, and composite moral distress scores post-test. Conversely, moral distress scores increased in the control group. Moreover, the intervention significantly reduced the number of nurses who intended to leave their positions from 58 nurses to 47 nurses in the experimental group. CONCLUSION : The multifaceted educational intervention exerts positive outcomes in reducing moral distress across all the dimensions and improving the nurses' retention. RELEVANCE TO CLINICAL PRACTICE : The intervention provides materials that could enhance the nurses' moral knowledge and skills. It provides different tools, techniques, and strategies to help the nurses address and manage their moral distress.
SUMMARY
What does this paper contribute to the wider global clinical community?
○ The effectiveness of the multifaceted educational intervention in mitigating the moral distress of critical care nurses in a diverse setting like the United Arab Emirates makes it suitable to be adopted and implemented in other countries with diverse healthcare settings.
○ The developed intervention could be adopted by hospitals to be a part of their continuous education to enhance the moral knowledge and skills among nurses in other disciplines.
○ The developed moral distress self-reflection form provides an alternative method to act against a morally distressing situation. The form can be adopted by healthcare institutions and added to their portal to facilitate anonymous reporting and solving of morally distressing situations.
○ The developed self-screening Moral Distress Pathway guides the nurses in the field step-by-step to promptly recognise their moral distress, take proactive measures to seek proper support, and determine the appropriate action to take.

A contaminated regression model for count health data (Sage, 2025-02)
Otto, Arnoldus F.; Ferreira, Johannes Theodorus; Tomarchio, Salvatore Daniele; Bekker, Andriette, 1958-; Punzo, Antonio; arno.otto@up.ac.za
In medical and health research, investigators are often interested in countable quantities such as hospital length of stay (e.g., in days) or the number of doctor visits. Poisson regression is commonly used to model such count data, but this approach cannot accommodate overdispersion, that is, a variance that exceeds the mean. To address this issue, the negative binomial (NB) distribution (NB-D) and, by extension, NB regression provide a well-documented alternative. However, real-data applications present additional challenges that must be considered. Two such challenges are (i) the presence of (mild) outliers that can influence the performance of the NB-D and (ii) the availability of covariates that can enhance inference about the mean of the count variable of interest. To jointly address these issues, we propose the contaminated NB (cNB) distribution that exhibits the necessary flexibility to accommodate mild outliers. This model is shown to be simple and intuitive in interpretation. In addition to the parameters of the NB-D, our proposed model has a parameter describing the proportion of mild outliers and one specifying the degree of contamination. To allow available covariates to improve the estimation of the mean of the cNB distribution, we propose the cNB regression model. An expectation-maximization algorithm is outlined for parameter estimation, and its performance is evaluated through a parameter recovery study.
The effectiveness of our model is demonstrated via a sensitivity analysis and on two health datasets, where it outperforms well-known count models. The methodology proposed is implemented in an R package which is available at https://github.com/arnootto/cNB.

Leave-group-out cross-validation for latent Gaussian models (Institut d'Estadistica de Catalunya, 2025-07-04)
Liu, Zhedong; Van Niekerk, Janet; Rue, Håvard
Evaluating the predictive performance of a statistical model is commonly done using cross-validation. Among the various methods, leave-one-out cross-validation (LOOCV) is frequently used. Originally designed for exchangeable observations, LOOCV has since been extended to other cases such as hierarchical models. However, it focuses primarily on short-range prediction and may not fully capture long-range prediction scenarios. For structured hierarchical models, particularly those involving multiple random effects, the concepts of short- and long-range predictions become less clear, which can complicate the interpretation of LOOCV results. In this paper, we propose a complementary cross-validation framework specifically tailored for longer-range prediction in latent Gaussian models, including those with structured random effects. Our approach differs from LOOCV by excluding a carefully constructed set from the training set, which better emulates longer-range prediction conditions. Furthermore, we achieve computational efficiency by adjusting the full joint posterior for this modified cross-validation, thus eliminating the need for model refitting.
This method is implemented in the R-INLA package (www.r-inla.org) and can be adapted to a variety of inferential frameworks.

Testing exponentiality based on Gini-index characterization (Springer, 2025-10)
Akbari, Mahboubeh; Akbari, Masoumeh; Chen, Ding-Geng (Din); mahboubeh.akbarilakeh@up.ac.za
The exponential distribution possesses several important properties that make it valuable in statistical inference and applications, such as reliability analysis, queueing theory, and survival analysis. Based on a Gini-index characterization for the exponential distribution, we propose different statistics for testing exponentiality under both complete data and right-censored data. Asymptotic results of the proposed test statistics are studied and a large Monte-Carlo simulation study is designed and performed to evaluate the performance of these statistics and to compare them against the best existing tests. Simulation studies indicate that the proposed tests are comparable to the best existing methods for complete data, while offering simple implementation and robust performance across various alternatives (including IFR, DFR, and UFR) and showing particular effectiveness for small sample sizes and under IFR and UFR alternatives in right-censored data. Finally, three real data sets are used to demonstrate the applicability of the proposed tests.

A generalized homogeneously weighted moving average monitoring scheme for monitoring the process mean (Universal Wiser Publisher, 2025-07)
Thanwane, Maonatlala; Malela-Majika, Jean-Claude; Kanfer, Frans H.J.; Chatterjee, Kashinath; malela.mjc@up.ac.za
Please read abstract in the article.

A framework for analysing point patterns on nonconvex domains using visibility graphs and multidimensional scaling (Elsevier, 2025-12)
Mahloromela, Kabelo; Fabris-Rotelli, Inger Nicolette; kabelo.mahloromela@up.ac.za
A point pattern is typically analysed to understand the first- and second-order properties of the underlying point process.
These properties are usually inferred using estimation procedures that depend on interpoint distance and are thus sensitive to the choice of distance metric. Euclidean distance is conventionally used to quantify proximity between points, but it does not accurately reflect spatial relationships when points are constrained within irregular, nonconvex spatial domains. Herein, we propose a strategy to embed visibility graph distances into Euclidean metric space using multidimensional scaling. The aim is to simplify analyses, leverage well-developed methods based on Euclidean distance, and retain, as far as possible, the true proximity relationships on a nonconvex spatial domain. The kernel smoothed intensity estimate and the K-function are computed in this new spatial context and used to validate the effectiveness of the embedding strategy.

Soft computing for the posterior of a matrix t graphical network (Elsevier, 2025-05)
Pillay, Jason; Bekker, Andriette, 1958-; Ferreira, Johannes Theodorus; Arashi, Mohammad; andriette.bekker@up.ac.za
Modeling noisy data in a network context remains an unavoidable obstacle; fortunately, random matrix theory may comprehensively describe network environments. Noisy data necessitates the probabilistic characterization of these networks using matrix variate models. Denoising network data using a Bayesian approach is not common in surveyed literature. Therefore, this paper adopts the Bayesian viewpoint and introduces a new version of the matrix variate t graphical network. This model's prior beliefs rely on the matrix variate gamma distribution to handle the noise process flexibly; from a statistical learning viewpoint, such a theoretical consideration benefits the comprehension of structures and processes that cause network-based noise in data as part of machine learning and offers real-world interpretation.
A proposed Gibbs algorithm is provided for computing and approximating the resulting posterior probability distribution of interest to assess the considered model's network centrality measures. Experiments with synthetic and real-world stock price data are performed to validate the proposed algorithm's capabilities and show that this model has wider flexibility than the model proposed by [13].
HIGHLIGHTS
• Expanding the framework for denoising financial data inside the realm of graphical network theory, where the assumption of normality in the model is inadequate to account for the variation.
• Introduction of the matrix variate gamma and inverse matrix variate gamma as priors for the covariance matrices; the univariate scale parameter β may be fixed or subject to a prior.
• Following Bayesian inference with more flexible priors, there is an improvement based on relevant accuracy measures.
• Experimental results indicate that our proposed framework and results outperform those of [13].

Trend of malaria parasites infection in Ethiopia along an international border : a Bayesian spatio-temporal study (BioMed Central, 2025-07)
Chol, Changkuoth Jock; Belay, Denekew Bitew; Fenta, Haile Mekonnen; Chen, Ding-Geng (Din)
BACKGROUND : Malaria is a major health concern that affects many individuals worldwide. P. falciparum is Africa’s main malaria cause. However, P. vivax accounts for a larger share of cases in Ethiopia than in any other African country, followed by its closest neighbors. This research aims to examine the spatiotemporal trends in the risk of malaria caused by P. falciparum and P. vivax in Ethiopia and other countries that share borders between 2011 and 2020. METHODS : This study was carried out in seven East African countries in 115 administrative level 1 (region) settings. We used secondary data on two plasmodium parasites, P. falciparum and P. vivax, between 2011 and 2020 from the Malaria Atlas Project.
This study used a Bayesian setup with an integrated nested Laplace approximation to fit spatiotemporal models. RESULTS : We analyzed P. falciparum and P. vivax malaria incidence data from 2011 to 2020 in 115 regions. Between 2011 and 2020, all of South Sudan's areas, Ethiopia's Gambella region, and Kenya’s Homa Bay, Siaya, Busia, Kakamega, and Vihita regions were at a higher risk of P. falciparum malaria than their neighbors in seven East African nations. However, the Southern Nations, Nationalities, and Peoples' region, as well as the Oromia, Harari, Afar, and Amhara areas in Ethiopia, and the Blue Nile in Sudan, are the regions with a higher risk of P. vivax malaria than their bordering regions. For both P. falciparum and P. vivax, the spatially coordinated main effect and the unstructured spatial effect show minimal fluctuation across and within 115 regions during the study period. Through a random walk across 115 regions, the time-structured effect of P. falciparum malaria risk shows linear increases, whereas the temporally structured effect of P. vivax shows increases from 2011 to 2014 and decreases from 2017 to 2020. CONCLUSIONS : The global malaria control and eradication effort should concentrate particularly on the South Sudan and Ethiopia regions to provide more intervention control to lower the risk of malaria incidence in East African countries, as both countries have high levels of P. falciparum and P. vivax, respectively.

Bayesian geo-additive modeling of zonal level crop production in Ethiopia (Elsevier, 2025-09)
Mare, Yidnekachew; Zewotir, Temesgen; Belay, Denekew Bitew
Crop production plays an important role in global food security, economic stability, and sustainable development, so it is important to identify covariates that linearly and nonlinearly affect it to ensure sustainable food security and economic stability.
In this study, we used a Bayesian geo-additive mixed model to analyze the spatially structured agricultural sample survey data of eight years (Meher seasons from 2012/13 to 2019/20) collected annually by the Central Statistics Agency of Ethiopia (the current Ethiopian Statistical Service). The posterior estimates of the linear fixed effects showed that the proportion of farmers preventing soil erosion, the proportion of educated farmers, the percentage of crop damage, and the number of oxen all have a significant negative effect, while the proportion of farmers who practice pure agriculture and the area used have a significant positive effect on log crop production per household in the zone. The posterior estimates of the non-linear fixed effects showed that year, the proportion of female farmers, the proportion of farmers who practice other agriculture, the proportion of farmers who used broadcast sowing, household age, farmer association crop production, and UREA fertilizer used have significant non-linear effects on log crop production. Pure agricultural farming, cluster farming, farmers’ associations, and UREA fertilizer usage are recommended to increase crop production at the zone level. To attain the main objective of this study, we considered only the spatial structure or dependency of the sample survey data.

Predicting precipitation using dynamic distributed lag models in arid and sub-humid regions of South Africa (Elsevier, 2025-09)
Chaka, Lyson; Abd Elbasit, Mohamed A.M.; Jombo, Simbarashe; lyson.chaka@up.ac.za
Ocean characteristics have contributed to a series of unusual rainfall patterns and floods, leading to severe land degradation, loss of life and infrastructure in various regions. Modelling and prediction of precipitation using in-situ data and oceanographic variables is possible. There are limited studies to substantiate this approach in less-developed countries.
This study aims to model and predict precipitation in the arid, semi-arid and sub-humid regions of South Africa using dynamic linear regression (DLR) models, with sea surface temperature (SST) anomalies, evaporation-precipitation differences, longwave radiation (lwRad), net surface heat flux and relative humidity as input variables. The prediction accuracy of the autoregressive integrated moving average model with exogenous variables (ARIMAX) and dynamic distributed lag (DDL) models was compared on the mean monthly rainfall data for the period 2008 to 2022. The results highlighted that the DDL models predict better than the other ARIMAX models, with SST anomalies and lwRad having a significant contribution (p-values < 0.05). These models had the smallest root mean squared error (RMSE) values for the arid (8.27 mm), semi-arid (19.15 mm) and the sub-humid (26.77 mm) regions, indicating that DDL models are suitable tools for the prediction of precipitation in these regions. However, additional oceanographic predictors such as sea surface salinity, ocean heat content, and upper-ocean current patterns may further enhance precipitation prediction accuracy, particularly in regions with strong ocean-atmosphere coupling, such as coastal or monsoon-influenced areas.
HIGHLIGHTS
• Rainfall distribution shows clear gradients across the arid, semi-arid and sub-humid zones.
• Semi-arid and sub-humid regions share seasonal patterns with occasional heavy rainfall.
• Oceanographic predictors, especially sea surface temperature (SST) anomalies and longwave radiation, drive rainfall variability.
• Dynamic regression models improved rainfall prediction in multiple climatic regions.
• Rainfall seasonal forecast will complement the decision support system for climatological, agricultural and hydrological management across the Southern African region.

Population affinity estimation in forensic anthropology : a South African perspective (Springer, 2025-09)
Mbonani, Thandolwethu Mbali; L'Abbe, Ericka Noelle; Chen, Ding-Geng (Din); Ridel, Alison Fany; u18059385@tuks.co.za
Forensic anthropologists face the complex task of estimating population affinity from skeletal remains, a process that involves inferring culturally constructed “social race” from biological tissues, a challenge further complicated by the nuanced distinction between population affinity and “race”. The difficulty in making these estimations arises from the complex interplay between social constructs of race, skeletal morphology, and geographic origin. These factors are further influenced by elements such as assortative mating and institutional racism in regions such as South Africa and the United States. The interaction between cultural factors and biological traits raises the question of whether the challenges in estimating population affinity are inevitable or due to a limited understanding of human variation. To address this knowledge gap, this paper presents a review of population affinity estimation in forensic anthropology, with a focus on the South African context. It provides foundational background and historical insights, explores the medico-legal significance of population affinity, and critically evaluates both traditional and emerging estimation methods.
By highlighting regional challenges and recent advancements, this review aims to enhance understanding and contribute to ongoing debates in the field.

Forecasting South African grain prices and assessing the non-linear impact of inflation and rainfall using a dynamic Bayesian generalized additive model (Frontiers Media, 2025-07)
Antwi, Albert; Kammies, Emelia Thembile; Chaka, Lyson; Arasomwan, Martins Akugbe
INTRODUCTION : Accurate price forecasts and the evaluation of some of the factors that affect the prices of grains are crucial for proper planning and food security. Various methods have been designed to model and forecast grain prices and other time-stamped data. However, due to some inherent limitations, some of these models do not produce accurate forecasts or are not easily interpretable. Although dynamic Bayesian generalized additive models (GAMs) offer potential to overcome some of these problems, they do not explicitly model local trends. This may lead to biased fixed effects estimates and forecasts, thus highlighting a significant gap in literature. METHODS : To address this, we propose the use of random intercepts to capture localized trends within the dynamic Bayesian GAM framework to forecast South African wheat and maize prices. Furthermore, we examine the complex underlying relationships of the prices with inflation and rainfall. RESULTS : Evidence from the study suggests that the proposed method is able to adequately capture the dynamic localized trends consistent with the underlying local trends in the prices. It was observed that the estimated localized variations are significant, which led to improved and efficient fixed-effect parameter estimates. This led to better posterior predictions and forecasts.
A comparison to the static trend Bayesian GAMs and the autoregressive integrated moving average (ARIMA) models indicates a general superiority of the proposed approach for the posterior predictions and long-term posterior forecasts and has potential for short-term forecasts. The static trend Bayesian GAMs were found to generally outperform the ARIMA models in long-term posterior forecasts and also have potential for short-term forecasts. However, for 1-step ahead posterior forecasts, the ARIMA models consistently outperformed all the Bayesian models. The study also unveiled a significant direct nonlinear impact of inflation on wheat and maize prices. Although the impacts of rainfall on wheat and maize prices are indirect and nonlinear, only the impact on maize prices is significant. DISCUSSION : The improved efficiency and forecasts of our proposed method suggest that researchers and practitioners may consider the approach when modelling and forecasting long-term prices of grains, other agricultural commodities, speculative assets and general single-subject time series data exhibiting non-stationarity.
Item Optimal design of risk-based average charts for autocorrelated measurements (Elsevier, 2025-12) Saghir, Aamir; Khan, Zahid Younas; Malela-Majika, Jean-Claude; Kosztyán, Zsolt Tibor
Please read abstract in the article.
HIGHLIGHTS
• Developed two Risk-Based (RB) average charts for monitoring autoregressive processes
• Design improves RB chart cost-efficiency under autocorrelated conditions
• TCharts validated via real-world data and autocorrelated simulations
• Sensitivity analysis conducted to support practical implementation
Item Attribute based spatial segmentation for optimising POI placement (Elsevier, 2025-08) De Klerk, Michelle; Fabris-Rotelli, Inger Nicolette
Effective spatial planning and resource optimisation require precise demarcation of potential spatial accessible areas and optimal placement of points of interest (POIs).
Our approach introduces a novel attribute based spatial segmentation methodology that utilises an iterative clustering approach to create unique macro-regions, each associated with key structural and attribute specific properties. By integrating a probabilistic attribute based structure with k-means clustering, we adaptively segment spatial regions to balance area based attributes and topological characteristics. The full geographical network is segmented into attribute based macro-regions for all spatially accessible and spatially disjoint regions. Attribute based spatial segmentation offers insights into why certain areas may be spatially disjoint and, for areas identified as potentially spatially accessible, helps determine which POIs can be placed to maximise accessibility. This approach transforms city planning and resource allocation by aligning POI placement with regional needs and characteristics.
Item A new look at the Dirichlet distribution : robustness, clustering, and both together (Springer, 2025-03) Tomarchio, Salvatore D.; Punzo, Antonio; Ferreira, Johannes Theodorus; Bekker, Andriette, 1958-
Compositional data have peculiar characteristics that pose significant challenges to traditional statistical methods and models. Within this framework, we use a convenient mode-parametrized Dirichlet distribution across multiple fields of statistics. In particular, we propose finite mixtures of unimodal Dirichlet (UD) distributions for model-based clustering and classification. Then, we introduce the contaminated UD (CUD) distribution, a heavy-tailed generalization of the UD distribution that allows for a more flexible tail behavior in the presence of atypical observations. Thirdly, we propose finite mixtures of CUD distributions to jointly account for the presence of clusters and atypical points in the data. Parameter estimation is carried out by directly maximizing the likelihood or by using an expectation-maximization (EM) algorithm.
Two analyses are conducted on simulated data to illustrate the effects of atypical observations on parameter estimation and data classification, and how our proposals address both aspects. Furthermore, two real datasets are investigated and the results obtained via our models are discussed.
