Background
Inflammatory bowel disease (IBD) is a chronic, relapsing condition that affects thousands of patients across the UK. Because disease course and severity vary greatly between individuals, treatment is increasingly personalised. Numerous new medications are now available, and many more are being evaluated in clinical trials. However, identifying suitable patients for these trials is challenging. Eligibility criteria are often complex, and assessing them requires drawing on multiple data sources within the electronic health record (EHR), such as endoscopy, histopathology, and clinical correspondence. As a result, potentially eligible patients are frequently overlooked.
Novelty and Importance
Natural language processing (NLP) has shown promise in extracting clinical information from unstructured text within EHRs, but its application to identifying IBD trial candidates remains limited. Existing studies have largely focused on identifying single disease features at a single time point. In contrast, this project will address the much more complex task of integrating data across multiple time points and sources to automatically identify patients who meet trial inclusion and exclusion criteria. Improving clinical trial recruitment in IBD has the potential to accelerate the development of novel therapies and improve patient outcomes.
Aims and Objectives
This PhD will develop and validate an NLP-driven system to automate the identification of IBD patients eligible for pharmacotherapy trials. The first aim is to build algorithms capable of extracting longitudinal clinical features from EHR text. The second is to design a clinician-facing interface that enables real-time identification of potential participants for any IBD trial. The project will be developed in collaboration with the gastroenterology and scientific computing teams at Guy’s and St Thomas’ NHS Foundation Trust.

