Improving Accuracy and Source Transparency in Responses to Soft Tissue Sarcoma Queries Using GPT-4o Enhanced with German Evidence-Based Guidelines.
Published In: Oncology Research & Treatment, 2025, v. 48, n. 6, p. 351
Database: Academic Search Ultimate
Authored By: Li, Cheng-Peng; Jia, Wei-Wei; Chu, Yuan; Menge, Franka; Speer, Tobias; Reißfelder, Christoph; Hohenberger, Peter; Jakob, Jens; Yang, Cui
Abstract
Introduction: This study aimed to evaluate the effectiveness of GPT-4o, with and without retrieval-augmented generation (RAG), in responding to soft tissue sarcoma (STS)-related queries. Methods: The study used a 20-question dataset derived from clinical scenarios related to adult STS. The responses were generated by GPT-4o with and without the RAG approach. The RAG system incorporated the English version of German evidence-based S3 guidelines through an embedding-based retrieval system. Two sarcoma experts evaluated the responses for accuracy, comprehensiveness, and safety using a Likert scale. Statistical analyses were conducted to compare the performances. Results: GPT-4o with RAG outperformed the model without RAG across all evaluated areas (p < 0.05). GPT-4o without RAG had a 40% error rate, which was reduced to 10% by the RAG approach. In 90% of the questions, the pages with the relevant information that addressed the questions were correctly cited using the retrieval system. Conclusion: The RAG approach significantly enhanced the performance of GPT-4o in answering STS-related questions. However, the model still produced incorrect responses in certain complex scenarios. GPT-4o, even with RAG, should be used cautiously in clinical settings, particularly for rare diseases like sarcoma. Human expertise remains irreplaceable in medical decision-making.
Plain Language Summary: We evaluated how well the artificial intelligence (AI) model GPT-4o performed when responding to questions on soft tissue sarcoma (STS), a rare form of cancer. We developed 20 questions based on actual medical scenarios involving STS and tested the model's capacity to deliver thorough and accurate answers both with and without using a retrieval-augmented generation (RAG) system, which uses German guidelines for STS to help the model find relevant information. The correctness, thoroughness, and safety of the model's replies were assessed by two sarcoma specialists.
The outcomes demonstrated that GPT-4o's performance was enhanced by the RAG system. The AI committed mistakes on 40% of the questions without RAG, but with RAG, the error rate decreased to 10%. In 90% of cases, the RAG system correctly identified the information needed to answer the questions. Although the RAG system improved the model's accuracy, it still struggled with some complex cases. The study suggests that while GPT-4o with RAG can assist in medical decision-making, it cannot replace human expertise, especially for rare diseases like sarcoma. [ABSTRACT FROM AUTHOR]
Additional Information
- Source: Oncology Research & Treatment. 2025/06, Vol. 48, Issue 6, p351
- Document Type: Article
- Subject Area: Health and Medicine
- Publication Date: 2025
- ISSN: 2296-5270
- DOI: 10.1159/000544978
- Accession Number: 185905875
- Copyright Statement: Copyright of Oncology Research & Treatment is the property of Karger AG and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)