Data and its analysis are invaluable in the healthcare sector for confident diagnosis, treatment planning, and social policy determination, especially in the face of new healthcare challenges such as those posed by the recent COVID-19 pandemic. However, the development of computational analysis techniques, such as machine learning models, often follows the trends of the computational field rather than prioritizing the needs and challenges of the application domain (e.g., healthcare). Based on our previous work, done in association with various medical collaborators, we observe the following:
Computationally optimal solutions are not necessarily medically optimal solutions. A current and prominent example of this computational-medical mismatch is the use of complex, “black-box” models (e.g., deep neural networks) for analyzing healthcare data: while computational researchers focus on using large, complex models to solve difficult problems, what the healthcare field needs and cares about are explanations for the decisions an ML model makes.
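The contrast above can be made concrete with a minimal sketch: an inherently interpretable model (a plain logistic regression, fit here with gradient descent in NumPy) whose weights double as the explanation a clinician can read. The feature names and data below are synthetic placeholders, not clinical values.

```python
import numpy as np

rng = np.random.default_rng(0)
features = ["age", "blood_pressure", "glucose"]  # hypothetical names
X = rng.normal(size=(200, 3))
# Synthetic label driven mainly by the third feature ("glucose").
y = (X[:, 2] + 0.1 * rng.normal(size=200) > 0).astype(float)

# Fit a plain logistic regression by gradient descent: an
# interpretable model whose weights ARE the explanation.
w, b = np.zeros(3), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Each weight is a direct, clinician-readable statement of how a
# feature pushes the predicted risk up or down; a deep network
# offers no comparably simple readout.
for name, weight in zip(features, w):
    print(f"{name}: {weight:+.2f}")
```

A black-box model might match or beat this accuracy, but it cannot hand back three numbers that summarize its reasoning the way the weight vector does.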
The current state-of-the-art machine learning models that are pushing the frontiers of AI are mostly classification models. While these models perform extremely well for disease diagnosis, there is no well-developed model, or even computational framework, for scientifically handling and studying disease progression over time. Disease progression is not only important for improving patient care through better treatment strategies and disease management, but is also critical as a life-saving instrument for diseases that require early detection, such as cancer, sepsis, and COVID-19. Closely related is the concept of patient profiling.
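To illustrate why progression differs from one-shot classification, here is a toy discrete-time Markov chain over hypothetical disease stages. The stages and transition probabilities are illustrative assumptions only, not clinical estimates; the point is that the object of study is a trajectory over time, not a single label.

```python
import numpy as np

# Hypothetical disease stages; probabilities are illustrative only.
stages = ["healthy", "early", "advanced"]
# Row i gives the probability of moving to each stage one step later.
P = np.array([
    [0.90, 0.08, 0.02],
    [0.00, 0.70, 0.30],
    [0.00, 0.00, 1.00],  # "advanced" is absorbing in this toy model
])

# Start a whole cohort in the "healthy" stage and step it forward;
# the evolving distribution is the progression a classifier never sees.
state = np.array([1.0, 0.0, 0.0])
for t in range(5):
    state = state @ P
    print(f"t={t + 1}:", dict(zip(stages, state.round(3))))
```

Early detection corresponds to intervening while most probability mass still sits in the "early" row, which is exactly the kind of temporal question a static classifier is not designed to answer.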
Data scarcity and class imbalance are common in healthcare datasets and adversely affect the classification performance of machine learning models. While there are methods for handling class imbalance, these methods are not always suitable, or even compatible, with the complex, multidimensional, and correlated nature of healthcare data. Similarly, while there are data augmentation methods for handling data scarcity, they have been shown to work poorly on many healthcare datasets. What healthcare data needs is a synthesizer that can i) synthesize data from small datasets while generating quality synthetic data of high practical use, ii) account for the level of medical expertise present in research teams, and iii) minimize the number of tunable hyperparameters in the synthesizer model.
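As a concrete baseline for the imbalance problem, the sketch below implements naive random oversampling on a synthetic 95-vs-5 dataset. It balances the class counts but only duplicates minority rows, adding no new information about correlated, multidimensional records, which is precisely why the text argues for a dedicated synthesizer.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic imbalanced dataset: 95 "healthy" vs 5 "disease" samples.
X = rng.normal(size=(100, 4))
y = np.array([0] * 95 + [1] * 5)

def random_oversample(X, y, rng):
    """Naive random oversampling of the minority class.

    Duplicates minority rows until all classes match the majority
    count; no new samples are synthesized.
    """
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts_X, parts_y = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=n_max - n, replace=True)
        keep = np.concatenate([idx, extra])
        parts_X.append(X[keep])
        parts_y.append(y[keep])
    return np.concatenate(parts_X), np.concatenate(parts_y)

X_bal, y_bal = random_oversample(X, y, rng)
print(np.bincount(y_bal))  # both classes now have 95 samples
```

The balanced counts hide the fact that the 90 added "disease" rows are copies of the same 5 patients, so a model can still overfit those few profiles.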
As ML models have evolved, so have the methods for breaching data security and privacy. It is now well established that few techniques and design decisions reliably keep ML models secure and robust, and attacks on services that use ML can compromise the security and privacy of millions of customers using those services. This is particularly detrimental for healthcare patients, since healthcare data confidentiality is critical, and it consequently prevents the use of computational methods to further healthcare goals and improvements.
The current state-of-the-art ML models are complex, requiring over 10,000 parameters to define the model. These models are therefore computation-intensive, often running on expensive GPUs, and not likely to be widely accessible, especially in rural parts of the country. What healthcare needs, then, is a technology that emphasizes low-complexity models that are accessible to groups with limited computing power and that reduce the need for a healthcare organization to invest heavily in IT infrastructure to support complex, data-hungry models.
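A quick back-of-the-envelope calculation makes the complexity gap concrete. The sketch below counts parameters for a linear model versus a modest fully connected network over 20 input features; the layer sizes are illustrative assumptions, and even this small network exceeds the 10,000-parameter mark by a wide margin.

```python
# Rough parameter-count comparison for 20 input features.
n_features = 20

# Logistic regression: one weight per feature plus a bias.
linear_params = n_features + 1

def dense_params(n_in, n_out):
    """Parameters in one fully connected layer: weights + biases."""
    return n_in * n_out + n_out

# A small MLP: two hidden layers of 256 units and a single output.
mlp_params = (
    dense_params(n_features, 256)
    + dense_params(256, 256)
    + dense_params(256, 1)
)

print(linear_params)  # 21
print(mlp_params)     # 71425
```

A 21-parameter model fits comfortably on commodity hardware, while the roughly 71,000-parameter network, still tiny by deep learning standards, already hints at the infrastructure costs the text describes.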