Machine learning and artificial intelligence (AI) are exploding in popularity in fields ranging from art to science and everything in between—medicine and bioengineering included. While these tools have the potential to bring about significant improvements in health care, the systems aren’t perfect. How can we identify when machine learning and artificial intelligence are suggesting solutions that aren’t effective in the real world?
Carle Illinois College of Medicine (CI MED) faculty member and bioengineering professor Yogatheesan Varatharajah is working towards answering that question through his body of research, which is geared towards understanding when and how specific AI-generated models will fail. Varatharajah and his team recently presented a paper on the subject, "Evaluating Latent Space Robustness and Uncertainty of EEG-ML Models under Realistic Distribution Shifts," at the prestigious Conference on Neural Information Processing Systems (NeurIPS).
“Every domain in health care is using machine learning in one way or another, and so they’re becoming the mainstay of computational diagnostics and prognostics in healthcare,” Varatharajah said. “The problem is that when we do studies based on machine learning—to develop a diagnostic tool, for example—we run the models, and then we say, okay, the model performs well in a limited test setting and therefore it's good to go. But when we actually deploy it in the real world to make clinical decisions in real time, many of these approaches don't work as expected.”
Varatharajah explained that one of the most common reasons for this disconnect between models and the real world is the natural variability between the data used to build a model and the data collected after the model is deployed. That variability might come from the hardware or protocol used to collect the data, or simply from differences between the patients whose data trained the model and the patients it is later applied to. These small differences can add up to significant changes in model predictions and, potentially, a model that fails to help patients.
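As a rough illustration of that failure mode, here is a minimal, hypothetical sketch (not taken from the paper): a simple classifier is trained on data from one recording setup and then evaluated on data whose gain and offset have shifted, as might happen with different hardware or acquisition protocols. All data, parameters, and numbers below are synthetic assumptions used only for demonstration.

```python
# Toy illustration (not from the paper): a distribution shift between
# training and deployment data degrades a model that looked fine in testing.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# "Training-site" data: two classes of simulated EEG-like feature vectors.
n, d = 500, 16
X_train = np.vstack([rng.normal(0.0, 1.0, (n, d)),
                     rng.normal(0.8, 1.0, (n, d))])
y_train = np.array([0] * n + [1] * n)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# "Deployment-site" data: the same underlying classes, but recorded with
# different hardware/protocol, modeled here as a gain change plus an offset.
gain, offset = 1.5, 0.6
X_deploy = np.vstack([rng.normal(0.0, 1.0, (n, d)),
                      rng.normal(0.8, 1.0, (n, d))]) * gain + offset
y_deploy = np.array([0] * n + [1] * n)

print("in-distribution accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("shifted-data accuracy:   ", accuracy_score(y_deploy, model.predict(X_deploy)))
```

The drop in accuracy on the shifted data is exactly the kind of gap that only becomes visible once a model leaves its original, limited test setting.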
“If we can identify those differences ahead of time, then we may be able to develop some additional tools to prevent those failures or at least know that these models are going to fail in certain scenarios,” Varatharajah said. “And that is the goal of this paper.”
To do this, Varatharajah and his students focused their efforts on machine-learning models built on electrophysiological data, specifically EEG recordings collected from patients with neurological diseases. From there, the team analyzed clinically relevant applications, such as distinguishing normal EEGs from abnormal ones, to determine whether the two could be reliably differentiated.
“We looked at what kind of variability can occur in the real world, especially those variabilities which could cause problems to machine learning models,” said Varatharajah. “And then we modeled those variabilities and developed some ‘diagnostic’ measures to diagnose the models themselves, to know when and how they are going to fail. As a result, we can be aware of these errors and take steps to mitigate them ahead of time, so the models are actually able to help clinicians with clinical decision making.”
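One way to make that idea concrete, purely as an illustration rather than the paper's actual measure, is to flag deployment-time inputs that sit unusually far from the training data in the model's latent (feature) space. The sketch below uses a simple Mahalanobis-distance check; the latent features, threshold, and synthetic data are all assumptions chosen for demonstration.

```python
# Illustrative sketch (not the paper's exact method): "diagnose" a model by
# flagging deployment inputs that fall far from the training distribution
# in the model's latent/feature space, using a Mahalanobis distance.
import numpy as np

def fit_latent_stats(Z_train):
    """Estimate the mean and (regularized) inverse covariance of training latents."""
    mu = Z_train.mean(axis=0)
    cov = np.cov(Z_train, rowvar=False) + 1e-6 * np.eye(Z_train.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(Z, mu, cov_inv):
    """Per-sample Mahalanobis distance from the training distribution."""
    diff = Z - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Z_train / Z_deploy would normally be latent features extracted by the trained
# model; here they are synthetic stand-ins with a deliberate shift.
rng = np.random.default_rng(1)
Z_train = rng.normal(0.0, 1.0, (1000, 8))
Z_deploy = rng.normal(0.5, 1.3, (200, 8))   # shifted "new-site" latents

mu, cov_inv = fit_latent_stats(Z_train)
threshold = np.quantile(mahalanobis(Z_train, mu, cov_inv), 0.99)
flagged = (mahalanobis(Z_deploy, mu, cov_inv) > threshold).mean()
print(f"fraction of deployment samples flagged as out-of-distribution: {flagged:.2f}")
```

A high fraction of flagged samples would warn a deployment team to investigate, recalibrate, or retrain before trusting the model's predictions, mirroring the goal of catching failures ahead of time.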
Paper co-author and CI MED student Sam Rawal says this study can help clinicians make better decisions about patient care by bridging the gaps between large-scale study findings and factors that pertain to local populations. "The significance of this work lies in identifying the disconnect between data that AI models are trained on, compared to the real-world scenarios that they interact with when they are deployed in hospitals," Rawal said. "Being able to identify such scenarios in the real world, where models may fail or perform unexpectedly, can help guide their deployment and ensure they are being utilized in a safe and effective manner."
Presenting the team's research at NeurIPS, one of the premier machine learning conferences in the world, was particularly significant. “It’s quite an achievement to have a publication accepted at this venue—it gives us a name in this community,” Varatharajah said. “This will also give us the opportunity to further develop this tool into something that can be used in the real world.” Bioengineering PhD student Neeraj Wagh presented the work at the conference.
Contributors to the work included co-authors Sam Rawal from CI MED and Neeraj Wagh, Jionghao Wei, and Brent Berry from bioengineering. Varatharajah also credited the partnership between Illinois bioengineering and the Mayo Clinic’s Department of Neurology; the project was facilitated through the Mayo Clinic and supported by the National Science Foundation.
Editor's notes: The original version of this article by Bethan Owen of the UIUC Department of Bioengineering can be found here.