Background: Large language models (LLMs), such as ChatGPT, hold significant promise for diabetes self-management and information assessment. However, their accuracy and reliability vary: LLMs rely on training data of mixed quality and on self-supervised learning, and they may lack domain-specific knowledge, which risks inaccurate responses. People with diabetes need to manage diabetes distress, yet the mental health services available for this are limited, so they commonly seek information online and could benefit from using LLMs.
Objective: This study aims to develop and evaluate a retrieval-augmented generation (RAG) tool to improve the performance of LLMs in providing accurate and safe responses to queries from people with diabetes distress.
Methods: A mixed-methods study design will be employed, and a systematic review of LLMs and RAG in diabetes management will be conducted. Medical databases will be searched for scientific studies and clinical guidelines on diabetes distress, and data from the D-Stress programme (https://www.kcl.ac.uk/research/d-stress-study) will be gathered, to create a local vector database of domain-specific knowledge. Next, a RAG framework will be developed using an encoder-decoder architecture and transformer model techniques to query the database and extract the most relevant knowledge. Finally, the system will be validated and improved with input from a diverse group of targeted users and domain experts. Focus groups with participants in the D-Stress programme will help generate questions commonly asked by people experiencing diabetes distress. These questions will be posed to three base language models (GPT5, Claude 2, and Google Bard) as well as to their respective RAG-enhanced versions. Model responses will be evaluated in a blinded, randomized manner using clinician assessment (focusing on accuracy, safety and comprehensiveness) and patient evaluation (focusing on understandability). Statistical analysis will be performed using SPSS to compare the accuracy, safety, comprehensiveness and understandability scores with and without RAG.
Outcomes: This approach using RAG-augmented LLMs is expected to enhance the accuracy, safety, comprehensiveness and understandability of the responses of the Chat D-Stress chatbot to diabetes distress-related inquiries. These improvements could enhance mental health support for people experiencing diabetes distress and improve diabetes self-management.
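
Illustrative sketch: the retrieval step outlined in the Methods (querying a local knowledge base and passing the most relevant passages to a language model) might look roughly like the Python below. This is a minimal sketch under stated assumptions, not the study's implementation. The passages are invented placeholders, the bag-of-words "embedding" stands in for a trained transformer encoder, and the names `PASSAGES`, `embed`, `cosine`, `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; a real knowledge base would hold
# guideline excerpts and D-Stress programme material.
PASSAGES = [
    "Diabetes distress is the emotional burden of living with and managing diabetes.",
    "Regular physical activity helps to improve blood glucose control.",
    "Talking to your diabetes care team can help when self-management feels overwhelming.",
]


def embed(text):
    """Toy embedding: lowercase word counts (assumption; stands in for a trained encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What is diabetes distress?", PASSAGES))
```

The augmented prompt would then be sent to each base model, so that the same question can be scored with and without the retrieved context.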

