dc.contributor.author |
Donhauser, Jonathan
|
|
dc.contributor.author |
Domenech-Pascual, Anna
|
|
dc.contributor.author |
Han, Xingguo
|
|
dc.contributor.author |
Jordaan, Karen
|
|
dc.contributor.author |
Ramond, Jean-Baptiste
|
|
dc.contributor.author |
Frossard, Aline
|
|
dc.contributor.author |
Romani, Anna M.
|
|
dc.contributor.author |
Prieme, Anders
|
|
dc.date.accessioned |
2024-11-07T10:58:14Z |
|
dc.date.available |
2024-11-07T10:58:14Z |
|
dc.date.issued |
2024-11 |
|
dc.description |
DATA AVAILABILITY : ampliconTraits trait sequence databases and files for database construction are available at https://erda.ku.dk/archives/f5d4b1d41f74ba3d6f73b212dbb11591/published-archive.html. Code to create databases and documentation for ampliconTraits are hosted at https://github.com/jdonhauser/ampliconTraits. The R package MicEnvMod is available at https://github.com/jdonhauser/MicEnvMod. A markdown file for all analyses in this manuscript is available in the supplementary information. Raw sequences were deposited in the NCBI Sequence Read Archive under the accession number PRJNA1073882. |
en_US |
dc.description |
SUPPLEMENTARY MATERIAL 1 : FIGURE S1: Overview of sites in Europe, Greenland and South Africa as well as distribution of climatic, vegetation and soil parameters across the dataset. MAT = mean annual temperature, aw = water activity, MAP = mean annual precipitation, BIO5 = maximum temperature of the warmest month, BIO7 = annual temperature range, BIO15 = precipitation seasonality, WHC = water holding capacity, SOM = soil organic matter. FIGURE S2: Bootstrap values as a function of the sequence identity with the top hit in the reference database, shown as a scatterplot (top) and as violin plots for 10 intervals of sequence identity (bottom). Intervals: [54.2,58.8], (58.8,63.4], (63.4,67.9], (67.9,72.5], (72.5,77.1], (77.1,81.7], (81.7,86.3], (86.3,90.8], (90.8,95.4], (95.4,100]. SUPPLEMENTARY METHODS. |
en_US |
dc.description |
SUPPLEMENTARY MATERIAL 2 : Code for cross validation of database. |
en_US |
dc.description |
SUPPLEMENTARY MATERIAL 3 : Code for analyses with environmental sequences. |
en_US |
dc.description.abstract |
We present a comprehensive, customizable workflow for inferring prokaryotic phenotypic traits from marker gene sequences and modelling the relationships between these traits and environmental factors, thus overcoming the limited ecological interpretability of marker gene sequencing data. We created the trait sequence database ampliconTraits, constructed by cross-mapping species from a phenotypic trait database to the SILVA sequence database and formatted to enable seamless classification of environmental sequences using the SINAPS algorithm. The R package MicEnvMod enables modelling of trait-environment relationships, combining the strengths of different model types and integrating an approach to evaluate the models' predictive performance in a single framework. Traits could be accurately predicted even for sequences with low sequence identity (80%) with the reference sequences, indicating that our approach is suitable to classify a wide range of environmental sequences. Validating our approach in a large trans-continental soil dataset, we showed that trait distributions were robust to classification settings such as the bootstrap cutoff for classification and the number of discrete intervals for continuous traits. Using functions from MicEnvMod, we revealed precipitation seasonality and land cover as the most important predictors of genome size. We found Pearson correlation coefficients between observed and predicted values up to 0.70 using repeated split sampling cross validation, corroborating the predictive ability of our models beyond the training data. Predicting genome size across the Iberian Peninsula, we found the largest genomes in the northern part. Potential limitations of our trait inference approach include dependence on the phylogenetic conservation of traits and limited database coverage of environmental prokaryotes. Overall, our approach enables robust inference of ecologically interpretable traits combined with environmental modelling, making it possible to harness traits as bioindicators of soil ecosystem functioning. |
en_US |
dc.description.department |
Biochemistry, Genetics and Microbiology (BGM) |
en_US |
dc.description.librarian |
hj2024 |
en_US |
dc.description.sdg |
SDG-15: Life on land |
en_US |
dc.description.sponsorship |
The Swiss National Science Foundation; the Spanish State Research Agency; the Innovation Fund Denmark; and the Department of Science and Innovation of the Republic of South Africa. |
en_US |
dc.description.uri |
https://www.elsevier.com/locate/ecolinf |
en_US |
dc.identifier.citation |
Donhauser, J., Doménech-Pascual, A., Han, X. et al. 2024, 'Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod', Ecological Informatics, vol. 83, art. 102817, pp. 1-13, doi: 10.1016/j.ecoinf.2024.102817. |
en_US |
dc.identifier.issn |
1574-9541 (print) |
|
dc.identifier.issn |
1878-0512 (online) |
|
dc.identifier.other |
10.1016/j.ecoinf.2024.102817 |
|
dc.identifier.uri |
http://hdl.handle.net/2263/98972 |
|
dc.language.iso |
en |
en_US |
dc.publisher |
Elsevier |
en_US |
dc.rights |
© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
en_US |
dc.subject |
Trait sequence database |
en_US |
dc.subject |
Deoxyribonucleic acid (DNA) |
en_US |
dc.subject |
DNA sequencing |
en_US |
dc.subject |
Microbial community |
en_US |
dc.subject |
Cross validation |
en_US |
dc.subject |
Weighted ensemble model |
en_US |
dc.subject |
SDG-15: Life on land |
en_US |
dc.title |
Modelling soil prokaryotic traits across environments with the trait sequence database ampliconTraits and the R package MicEnvMod |
en_US |
dc.type |
Article |
en_US |