Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod

dc.contributor.author: Donhauser, Jonathan
dc.contributor.author: Domenech-Pascual, Anna
dc.contributor.author: Han, Xingguo
dc.contributor.author: Jordaan, Karen
dc.contributor.author: Ramond, Jean-Baptiste
dc.contributor.author: Frossard, Aline
dc.contributor.author: Romani, Anna M.
dc.contributor.author: Prieme, Anders
dc.date.accessioned: 2024-11-07T10:58:14Z
dc.date.available: 2024-11-07T10:58:14Z
dc.date.issued: 2024-11
dc.description: DATA AVAILABILITY: ampliconTraits trait sequence databases and files for database construction are available at https://erda.ku.dk/archives/f5d4b1d41f74ba3d6f73b212dbb11591/published-archive.html . Code to create the databases and documentation for ampliconTraits are hosted at https://github.com/jdonhauser/ampliconTraits. The R package MicEnvMod is available at https://github.com/jdonhauser/MicEnvMod. A markdown document covering all analyses in this manuscript is available in the supplementary information. Raw sequences were deposited in the NCBI Sequence Read Archive under the accession number PRJNA1073882.
dc.description: SUPPLEMENTARY MATERIAL 1: FIGURE S1: Overview of sites in Europe, Greenland and South Africa as well as distribution of climatic, vegetation and soil parameters across the dataset. MAT = mean annual temperature, aw = water activity, MAP = mean annual precipitation, BIO5 = maximum temperature warmest month, BIO7 = annual temperature range, BIO15 = precipitation seasonality, WHC = water holding capacity, SOM = soil organic matter. FIGURE S2: Bootstrap values as a function of the sequence identity with the top hit in the reference database as scatterplot (top) and as violin plot for 10 intervals of sequence identity (bottom). Intervals: [54.2,58.8], (58.8,63.4], (63.4,67.9], (67.9,72.5], (72.5,77.1], (77.1,81.7], (81.7,86.3], (86.3,90.8], (90.8,95.4], (95.4,100]. SUPPLEMENTARY METHODS.
dc.description: SUPPLEMENTARY MATERIAL 2: Code for cross validation of database.
dc.description: SUPPLEMENTARY MATERIAL 3: Code for analyses with environmental sequences.
dc.description.abstract: We present a comprehensive, customizable workflow for inferring prokaryotic phenotypic traits from marker gene sequences and modelling the relationships between these traits and environmental factors, thus overcoming the limited ecological interpretability of marker gene sequencing data. We created the trait sequence database ampliconTraits, constructed by cross-mapping species from a phenotypic trait database to the SILVA sequence database and formatted to enable seamless classification of environmental sequences using the SINAPS algorithm. The R package MicEnvMod enables modelling of trait-environment relationships, combining the strengths of different model types and integrating an approach to evaluate the models' predictive performance in a single framework. Traits could be accurately predicted even for sequences with low sequence identity (80%) with the reference sequences, indicating that our approach is suitable for classifying a wide range of environmental sequences. Validating our approach in a large trans-continental soil dataset, we showed that trait distributions were robust to classification settings such as the bootstrap cutoff for classification and the number of discrete intervals for continuous traits. Using functions from MicEnvMod, we identified precipitation seasonality and land cover as the most important predictors of genome size. We found Pearson correlation coefficients of up to 0.70 between observed and predicted values using repeated split sampling cross validation, corroborating the predictive ability of our models beyond the training data. Predicting genome size across the Iberian Peninsula, we found the largest genomes in the northern part. Potential limitations of our trait inference approach include dependence on the phylogenetic conservation of traits and limited database coverage of environmental prokaryotes. Overall, our approach enables robust inference of ecologically interpretable traits combined with environmental modelling, making it possible to harness traits as bioindicators of soil ecosystem functioning.
dc.description.department: Biochemistry, Genetics and Microbiology (BGM)
dc.description.librarian: hj2024
dc.description.sdg: SDG-15: Life on land
dc.description.sponsorship: The Swiss National Science Foundation; the Spanish State Research Agency; the Innovation Fund Denmark; and the Department of Science and Innovation of the Republic of South Africa.
dc.description.uri: https://www.elsevier.com/locate/ecolinf
dc.identifier.citation: Donhauser, J., Doménech-Pascual, A., Han, X. et al. 2024, 'Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod', Ecological Informatics, vol. 83, art. 102817, pp. 1-13, doi: 10.1016/j.ecoinf.2024.102817.
dc.identifier.issn: 1574-9541 (print)
dc.identifier.issn: 1878-0512 (online)
dc.identifier.other: 10.1016/j.ecoinf.2024.102817
dc.identifier.uri: http://hdl.handle.net/2263/98972
dc.language.iso: en
dc.publisher: Elsevier
dc.rights: © 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
dc.subject: Trait sequence database
dc.subject: Deoxyribonucleic acid (DNA)
dc.subject: DNA sequencing
dc.subject: Microbial community
dc.subject: Cross validation
dc.subject: Weighted ensemble model
dc.subject: SDG-15: Life on land
dc.title: Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod
dc.type: Article

Files

Original bundle

Name: Donhauser_Modelling_2024.pdf
Size: 8.84 MB
Format: Adobe Portable Document Format
Description: Article

Name: Donhauser_ModellingSuppl1_2024.pdf
Size: 757.24 KB
Format: Adobe Portable Document Format
Description: Supplementary Material 1

Name: Donhauser_ModellingSuppl2_2024.html
Size: 2.99 MB
Format: Hypertext Markup Language
Description: Supplementary Material 2

Name: Donhauser_ModellingSuppl3_2024.html
Size: 10.31 MB
Format: Hypertext Markup Language
Description: Supplementary Material 3

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission