Addressing missing data in medical time series is a critical step in designing useful applications that draw on this rich data resource. Current deep learning methods for clinical time series imputation show great promise in moving beyond simple imputation methods, but have yet to account for the structured, informative nature of missing data in Electronic Health Records (EHRs). Instead of treating missingness as random noise, our project will develop a novel framework that models these patterns as a source of valuable clinical information. This research will address three critical gaps: the inadequate representation of structured missingness, the lack of robust uncertainty quantification in leading models, and the disconnect between computational methods and clinical practice.
This project aims to create a structured missingness taxonomy that goes beyond traditional statistical classifications (missing completely at random, missing at random, and missing not at random) to reflect the systematic, protocol-driven nature of clinical data. Building on this taxonomy, we will design a domain-informed deep learning architecture that integrates this understanding directly into the learning process. The model will treat missing data patterns as informative features and will be integrated into the open-source PyPOTS ecosystem (https://pypots.com/), the only available Python package to encompass most existing deep learning imputation models for time series. All developed approaches will be validated on real-world EHR datasets, with a focus not only on statistical accuracy but also on clinical plausibility and utility for decision support. Expected outcomes include a new theoretical framework for understanding EHR missingness, a practical deep learning system that provides clinically meaningful uncertainty estimates, and a demonstrable improvement in downstream clinical prediction tasks.
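To make the idea of treating missing data patterns as informative features concrete, the sketch below derives two simple features from a time series with gaps: an observation mask and the number of steps since each variable was last observed. This is only an illustration under stated assumptions, not the proposed architecture and not part of the PyPOTS API: the function name `missingness_features` is hypothetical, missing values are assumed to be encoded as NaN, and time steps are assumed to be evenly spaced.

```python
import numpy as np


def missingness_features(x):
    """Derive missingness features from one series (hypothetical helper).

    x: array of shape (T, D), with NaN marking missing values (assumption).
    Returns:
        mask:  (T, D) array, 1.0 where a value was observed, 0.0 where missing.
        delta: (T, D) array, steps elapsed since the previous step at which the
               variable was observed (counted from the start if never observed).
    """
    x = np.asarray(x, dtype=float)
    mask = (~np.isnan(x)).astype(float)
    delta = np.zeros_like(x)
    for t in range(1, x.shape[0]):
        # If the previous step was observed the gap resets to 1;
        # otherwise the gap keeps accumulating.
        delta[t] = 1.0 + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta


# Toy example: two variables measured on different schedules.
series = np.array([
    [1.0, np.nan],
    [np.nan, np.nan],
    [np.nan, 3.0],
    [4.0, np.nan],
])
mask, delta = missingness_features(series)
```

In a setting like the one described above, `mask` and `delta` could be concatenated with the observed values as model inputs, so that the measurement schedule itself becomes a signal rather than noise to be discarded.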

