Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics

dc.contributor.author Baichoo, Shakuntala
dc.contributor.author Souilmi, Yassine
dc.contributor.author Panji, Sumir
dc.contributor.author Botha, Gerrit
dc.contributor.author Meintjes, Ayton
dc.contributor.author Hazelhurst, Scott
dc.contributor.author Bendou, Hocine
dc.contributor.author De Beste, Eugene
dc.contributor.author Mpangase, Phelelani T.
dc.contributor.author Souiai, Oussema
dc.contributor.author Alghali, Mustafa
dc.contributor.author Yi, Long
dc.contributor.author O’Connor, Brian D.
dc.contributor.author Crusoe, Michael
dc.contributor.author Armstrong, Don
dc.contributor.author Aron, Shaun
dc.contributor.author Joubert, Fourie
dc.contributor.author Ahmed, Azza E.
dc.contributor.author Mbiyavanga, Mamana
dc.contributor.author Van Heusden, Peter
dc.contributor.author Magosi, Lerato E.
dc.contributor.author Zermeno, Jennie
dc.contributor.author Mainzer, Liudmila Sergeevna
dc.contributor.author Fadlelmola, Faisal M.
dc.contributor.author Jongeneel, C. Victor
dc.contributor.author Mulder, Nicola
dc.date.accessioned 2019-10-11T08:20:21Z
dc.date.available 2019-10-11T08:20:21Z
dc.date.issued 2018-11-29
dc.description.abstract BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language (CWL) framework and two using Nextflow. All the workflows are containerized using Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community.
CONCLUSION: The H3ABioNet workflows have been implemented with a view to offering ease of use for the end user and high levels of reproducibility and portability, all while following current state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will serve the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt to their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity, assist genomics research within Africa, and increase the scientific output of H3Africa and its Pan-African Bioinformatics Network. en_ZA
dc.description.department Biochemistry en_ZA
dc.description.department Genetics en_ZA
dc.description.department Microbiology and Plant Pathology en_ZA
dc.description.librarian am2019 en_ZA
dc.description.sponsorship National Human Genome Research Institute (NHGRI) and the Office Of The Director (OD), National Institutes of Health under award number U41HG006941. en_ZA
dc.description.uri https://bmcbioinformatics.biomedcentral.com en_ZA
dc.identifier.citation Baichoo, S., Souilmi, Y., Panji, S. et al. 2018, 'Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics', BMC Bioinformatics, vol. 19, art. 457, pp. 1-13. en_ZA
dc.identifier.issn 1471-2105 (online)
dc.identifier.other 10.1186/s12859-018-2446-1
dc.identifier.uri http://hdl.handle.net/2263/71795
dc.language.iso en en_ZA
dc.publisher BioMed Central en_ZA
dc.rights © The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License. en_ZA
dc.subject Workflows en_ZA
dc.subject Pipeline en_ZA
dc.subject Bioinformatics en_ZA
dc.subject Africa en_ZA
dc.subject Genomics en_ZA
dc.subject Docker en_ZA
dc.subject Reproducibility en_ZA
dc.title Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics en_ZA
dc.type Article en_ZA

