Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod


dc.contributor.author Donhauser, Jonathan
dc.contributor.author Domenech-Pascual, Anna
dc.contributor.author Han, Xingguo
dc.contributor.author Jordaan, Karen
dc.contributor.author Ramond, Jean-Baptiste
dc.contributor.author Frossard, Aline
dc.contributor.author Romani, Anna M.
dc.contributor.author Prieme, Anders
dc.date.accessioned 2024-11-07T10:58:14Z
dc.date.available 2024-11-07T10:58:14Z
dc.date.issued 2024-11
dc.description DATA AVAILABILITY : ampliconTraits trait sequence databases and files for database construction are available at https://erda.ku.dk/archives/f5d4b1d41f74ba3d6f73b212dbb11591/published-archive.html. Code to create databases and documentation for ampliconTraits are hosted at https://github.com/jdonhauser/ampliconTraits. The R package MicEnvMod is available at https://github.com/jdonhauser/MicEnvMod. A markdown for all analyses in this manuscript is available in the supplementary information. Raw sequences were deposited in the NCBI Sequence Read Archive under the accession number PRJNA1073882. en_US
dc.description SUPPLEMENTARY MATERIAL 1 : FIGURE S1: Overview of sites in Europe, Greenland and South Africa as well as distribution of climatic, vegetation and soil parameters across the dataset. MAT = mean annual temperature, aw = water activity, MAP = mean annual precipitation, BIO5 = maximum temperature of the warmest month, BIO7 = annual temperature range, BIO15 = precipitation seasonality, WHC = water holding capacity, SOM = soil organic matter. FIGURE S2: Bootstrap values as a function of the sequence identity with the top hit in the reference database, shown as a scatterplot (top) and as violin plots for 10 intervals of sequence identity (bottom). Intervals: [54.2,58.8], (58.8,63.4], (63.4,67.9], (67.9,72.5], (72.5,77.1], (77.1,81.7], (81.7,86.3], (86.3,90.8], (90.8,95.4], (95.4,100]. SUPPLEMENTARY METHODS. en_US
dc.description SUPPLEMENTARY MATERIAL 2 : Code for cross validation of database. en_US
dc.description SUPPLEMENTARY MATERIAL 3 : Code for analyses with environmental sequences. en_US
dc.description.abstract We present a comprehensive, customizable workflow for inferring prokaryotic phenotypic traits from marker gene sequences and modelling the relationships between these traits and environmental factors, thus overcoming the limited ecological interpretability of marker gene sequencing data. We created the trait sequence database ampliconTraits, constructed by cross-mapping species from a phenotypic trait database to the SILVA sequence database and formatted to enable seamless classification of environmental sequences using the SINAPS algorithm. The R package MicEnvMod enables modelling of trait-environment relationships, combining the strengths of different model types and integrating an approach to evaluate the models' predictive performance in a single framework. Traits could be accurately predicted even for sequences with low sequence identity (80%) to the reference sequences, indicating that our approach is suitable for classifying a wide range of environmental sequences. Validating our approach in a large trans-continental soil dataset, we showed that trait distributions were robust to classification settings such as the bootstrap cutoff for classification and the number of discrete intervals for continuous traits. Using functions from MicEnvMod, we revealed precipitation seasonality and land cover as the most important predictors of genome size. We found Pearson correlation coefficients between observed and predicted values of up to 0.70 using repeated split sampling cross validation, corroborating the predictive ability of our models beyond the training data. Predicting genome size across the Iberian Peninsula, we found the largest genomes in the northern part. Potential limitations of our trait inference approach include dependence on the phylogenetic conservation of traits and limited database coverage of environmental prokaryotes. Overall, our approach enables robust inference of ecologically interpretable traits combined with environmental modelling, allowing traits to be harnessed as bioindicators of soil ecosystem functioning. en_US
dc.description.department Biochemistry, Genetics and Microbiology (BGM) en_US
dc.description.librarian hj2024 en_US
dc.description.sdg SDG-15:Life on land en_US
dc.description.sponsorship The Swiss National Science Foundation; the Spanish State Research Agency; the Innovation Fund Denmark; and the Department of Science and Innovation of the Republic of South Africa. en_US
dc.description.uri https://www.elsevier.com/locate/ecolinf en_US
dc.identifier.citation Donhauser, J., Doménech-Pascual, A., Han, X. et al. 2024, 'Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod', Ecological Informatics, vol. 83, art. 102817, pp. 1-13, doi: 10.1016/j.ecoinf.2024.102817. en_US
dc.identifier.issn 1574-9541 (print)
dc.identifier.issn 1878-0512 (online)
dc.identifier.other 10.1016/j.ecoinf.2024.102817
dc.identifier.uri http://hdl.handle.net/2263/98972
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.rights © 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). en_US
dc.subject Trait sequence database en_US
dc.subject Deoxyribonucleic acid (DNA) en_US
dc.subject DNA sequencing en_US
dc.subject Microbial community en_US
dc.subject Cross validation en_US
dc.subject Weighted ensemble model en_US
dc.subject SDG-15: Life on land en_US
dc.title Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod en_US
dc.type Article en_US

