Discriminatory Gleason grade group signatures of prostate cancer : an application of machine learning methods

dc.contributor.author: Mokoatle, Mpho
dc.contributor.author: Mapiye, Darlington
dc.contributor.author: Marivate, Vukosi
dc.contributor.author: Hayes, Vanessa M.
dc.contributor.author: Bornman, Maria S. (Riana)
dc.date.accessioned: 2022-11-03T13:05:34Z
dc.date.available: 2022-11-03T13:05:34Z
dc.date.issued: 2022-06-09
dc.description.abstract: One of the most precise methods to detect prostate cancer is evaluation of a stained biopsy by a pathologist under a microscope. Regions of the tissue are assessed and graded according to the observed histological pattern. However, this is not only laborious, but also relies on the experience of the pathologist and tends to suffer from a lack of reproducibility of biopsy outcomes across pathologists. As a result, computational approaches are being sought, and machine learning has been gaining momentum in the prediction of the Gleason grade group. To date, the machine learning literature has addressed this problem by using features from magnetic resonance images, whole slide images, tissue microarrays, gene expression data, and clinical features. However, there is a gap with regard to predicting the Gleason grade group using DNA sequences as the only input source to the machine learning models. In this work, using whole genome sequence data from South African prostate cancer patients, machine learning and biological experiments were combined to understand the challenges associated with the prediction of the Gleason grade group. A series of machine learning binary classifiers (XGBoost, LSTM, GRU, LR, RF) were created relying only on DNA sequences as input features. None of the models was able to adequately discriminate between the DNA sequences of the studied Gleason grade groups (Gleason grade groups 1 and 5). The models were then further evaluated in the prediction of tumor DNA sequences from matched-normal DNA sequences, again given DNA sequences as the only input source. In this new problem, the models performed better than before, with the XGBoost model achieving the highest accuracy of 74 ± 01, F1 score of 79 ± 01, recall of 99 ± 0.0, and precision of 66 ± 0.1. [en_US]
dc.description.department: Computer Science [en_US]
dc.description.department: School of Health Systems and Public Health (SHSPH) [en_US]
dc.description.librarian: dm2022 [en_US]
dc.description.sponsorship: The South African Medical Research Council (SAMRC) through its Division of Research Capacity Development under the Internship Scholarship Program from funding received from the South African National Treasury. [en_US]
dc.description.uri: http://www.plosone.org [en_US]
dc.identifier.citation: Mokoatle, M., Mapiye, D., Marivate, V., Hayes, V.M. & Bornman, R. (2022) Discriminatory Gleason grade group signatures of prostate cancer: An application of machine learning methods. PLoS One 17(6): e0267714. https://doi.org/10.1371/journal.pone.0267714. [en_US]
dc.identifier.issn: 1932-6203 (online)
dc.identifier.other: 10.1371/journal.pone.0267714
dc.identifier.uri: https://repository.up.ac.za/handle/2263/88134
dc.language.iso: en [en_US]
dc.publisher: Public Library of Science [en_US]
dc.rights: © 2022 Mokoatle et al. This is an open access article distributed under the terms of the Creative Commons Attribution License. [en_US]
dc.subject: Prostate cancer [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Gleason grade group [en_US]
dc.subject: DNA sequences [en_US]
dc.title: Discriminatory Gleason grade group signatures of prostate cancer : an application of machine learning methods [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name:
Mokoatle_Discriminatory_2022.pdf
Size:
1.3 MB
Format:
Adobe Portable Document Format
Description:
Article

License bundle

Name:
license.txt
Size:
1.75 KB
Format:
Item-specific license agreed upon to submission