Authors: Mokoatle, Mpho; Marivate, Vukosi; Mapiye, Darlington; Bornman, Maria S. (Riana); Hayes, Vanessa M.
Date available: 2025-11-14
Date issued: 2025-10
Citation: Mokoatle, M., Marivate, V., Mapiye, D. et al. Fine-tuning a sentence transformer for DNA. BMC Bioinformatics 26, 267: 1-13 (2025). https://doi.org/10.1186/s12859-025-06291-1.
ISSN: 1471-2105 (online)
DOI: 10.1186/s12859-025-06291-1
URI: http://hdl.handle.net/2263/105294

DATA AVAILABILITY: The benchmark datasets are available from [23, 24]. For the other tasks (T1 and T2), the data are available from the host database, the European Genome-phenome Archive at the European Bioinformatics Institute (accession number EGAD00001004582). The DNA-based model is shared on Hugging Face [36].

BACKGROUND: Sentence-transformers is a library that provides simple methods for generating embeddings for sentences, paragraphs, and images. Embedding texts in a vector space where similar texts lie close to one another enables applications such as sentiment analysis, retrieval, and clustering. This study fine-tunes a sentence transformer model designed for natural language on DNA text and then evaluates it on eight benchmark tasks. The objective is to assess its efficacy against domain-specific DNA transformers such as DNABERT and the Nucleotide Transformer.

RESULTS: The fine-tuned model generated DNA embeddings that outperformed DNABERT on multiple tasks. However, it did not surpass the Nucleotide Transformer in raw classification accuracy. The Nucleotide Transformer performed best on most tasks, but this advantage came at a significant computational cost, making it impractical in resource-constrained settings such as low- and middle-income countries (LMICs). The Nucleotide Transformer also performed worse on retrieval tasks and took longer to extract embeddings.
Consequently, the proposed model presents a viable option that balances performance and accuracy.

Language: en
Rights: © The Author(s) 2025. Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Keywords: Low- and middle-income countries (LMICs); Sentence transformers; BERT; DNABERT; SimCSE; Nucleotide transformer
Title: Fine-tuning a sentence transformer for DNA
Type: Article
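The premise in the BACKGROUND, that similar texts lie close together in an embedding space so that retrieval reduces to finding nearest neighbours, can be illustrated with a minimal sketch. It assumes DNA is turned into "text" by splitting it into overlapping k-mers (an assumption; the record does not state the tokenization used), and it uses a hypothetical toy embedding, a sparse k-mer count vector, as a stand-in for the fine-tuned sentence transformer. The function names `kmer_sentence`, `toy_embed`, `cosine`, and `retrieve` are illustrative only and are not part of the sentence-transformers library or the authors' code.

```python
from collections import Counter
import math


def kmer_sentence(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, treated as 'words' (assumed scheme)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embed(seq: str, k: int = 3) -> Counter:
    """Hypothetical stand-in for a learned embedding: a sparse k-mer count vector."""
    return Counter(kmer_sentence(seq, k))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], top_k: int = 2, k: int = 3) -> list[tuple[str, float]]:
    """Rank corpus sequences by cosine similarity to the query and return the top_k."""
    q = toy_embed(query, k)
    scored = [(doc, cosine(q, toy_embed(doc, k))) for doc in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = ["ACGTACGTACGT", "TTTTGGGGCCCC", "ACGTACGAACGT"]
    for doc, score in retrieve("ACGTACGTACGA", corpus):
        print(f"{doc}\t{score:.3f}")
```

With a real model, `toy_embed` would be replaced by the model's encoder; the ranking step stays the same, which is why embedding quality and embedding extraction time both matter for retrieval tasks.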