Optimizing theranostics chatbots with context-augmented large language models

Koller, Pia; Clement, Christoph; Van Eijk, Albert; Seifert, Robert; Zhang, Jingjing; Prenosil, George; Sathekge, Mike Machaba; Herrmann, Ken; Baum, Richard; Weber, Wolfgang A.; Rominger, Axel; Shi, Kuangyu

dc.contributor.author	Koller, Pia
dc.contributor.author	Clement, Christoph
dc.contributor.author	Van Eijk, Albert
dc.contributor.author	Seifert, Robert
dc.contributor.author	Zhang, Jingjing
dc.contributor.author	Prenosil, George
dc.contributor.author	Sathekge, Mike Machaba
dc.contributor.author	Herrmann, Ken
dc.contributor.author	Baum, Richard
dc.contributor.author	Weber, Wolfgang A.
dc.contributor.author	Rominger, Axel
dc.contributor.author	Shi, Kuangyu
dc.date.accessioned	2025-07-11T07:20:30Z
dc.date.available	2025-07-11T07:20:30Z
dc.date.issued	2025-04
dc.description.abstract	INTRODUCTION : Nuclear medicine theranostics is rapidly emerging as an interdisciplinary therapy option with multi-dimensional considerations. Healthcare professionals do not have the time to do in-depth research on every therapy option. Personalized chatbots might help to educate them. Chatbots using Large Language Models (LLMs), such as ChatGPT, are gaining interest as a way to address these challenges. However, chatbot performance often falls short in specific domains, which is critical in healthcare applications. METHODS : This study develops a framework to examine the use of contextual augmentation to improve the performance of medical theranostic chatbots and to create the first theranostic chatbot. Contextual augmentation involves providing additional relevant information to LLMs to improve their responses. We evaluate five state-of-the-art LLMs on questions translated into English and German. We compare answers generated with and without contextual augmentation, where the LLMs access pre-selected research papers via Retrieval Augmented Generation (RAG). We use two RAG techniques: Naive RAG and Advanced RAG. RESULTS : A user study and an LLM-based evaluation assess answer quality across different metrics. Results show that Advanced RAG techniques considerably enhance LLM performance. Among the models, the best-performing variants are CLAUDE 3 OPUS and GPT-4O. These models consistently achieve the highest scores, indicating robust integration and utilization of contextual information. The most notable improvements between Naive RAG and Advanced RAG are observed in the GEMINI 1.5 and COMMAND R+ variants. CONCLUSION : This study demonstrates that contextual augmentation addresses the complexities inherent in theranostics. Despite promising results, key limitations include the biased selection of questions, which focus primarily on PRRT, and the need for comprehensive context documents. Future research should include a broader range of theranostics questions, explore additional RAG methods, and aim to compare human and LLM evaluations more directly to enhance LLM performance further.
dc.description.department	Nuclear Medicine
dc.description.librarian	hj2025
dc.description.sdg	SDG-03: Good health and well-being
dc.description.sdg	SDG-09: Industry, innovation and infrastructure
dc.description.sponsorship	Supported by an ITM Radiopharma grant and by the Swiss National Science Foundation (SNSF).
dc.description.uri	https://www.thno.org/
dc.identifier.citation	Koller, P., Clement, C., Van Eijk, A. et al. 2025, 'Optimizing theranostics chatbots with context-augmented large language models', Theranostics, vol. 15, no. 12, pp. 5693-5704, doi : 10.7150/thno.107757.
dc.identifier.issn	1838-7640 (online)
dc.identifier.other	10.7150/thno.107757
dc.identifier.uri	http://hdl.handle.net/2263/103308
dc.language.iso	en
dc.publisher	Ivyspring International Publisher
dc.rights	© The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
dc.subject	Large language model (LLM)
dc.subject	Contextual augmentation
dc.subject	Retrieval augmented generation (RAG)
dc.subject	Nuclear medicine
dc.subject	Theranostics
dc.subject	Artificial intelligence (AI)
dc.subject	Health care professional (HCP)
dc.title	Optimizing theranostics chatbots with context-augmented large language models
dc.type	Article