Results to date

Publications

Book Chapters

Lung cancer is the most common cause of cancer deaths in the UK, emphasizing the critical need for early diagnosis. Survival rates vary significantly according to the stage at diagnosis. This study aims to develop machine learning models to distinguish lung cancer from non-lung cancer cases using data from the Clinical Practice Research Datalink (CPRD), which includes UK primary care records. Both interpretable and post hoc explainable approaches are explored, including RuleFit, a rule-based method; decision tree, an inherently interpretable model; and random forest and eXtreme Gradient Boosting, tree-based ensemble models. Model performance is assessed using accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and specificity. The models performed similarly across all measures. Additionally, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are employed to enhance model interpretability. These insights contribute to a better understanding of the leading risk factors for lung cancer. SHAP shows that age and smoking status play a crucial role in lung cancer prediction for all tree-based models. LIME is then used to evaluate individual-level explanations and to identify discrepancies between the explanations of different models. Our study combines robust evaluation with prominent interpretability techniques to gain valuable insights into lung cancer prediction.

https://doi.org/10.1007/978-3-031-91379-2_6
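
The threshold-based metrics and AUC named in the abstract above can be computed from predicted risk scores and binary labels. The sketch below is illustrative and is not the study's evaluation code: it assumes labels coded 1 for lung cancer and 0 otherwise, and a hypothetical decision threshold of 0.5. AUC is computed from its probabilistic definition (the chance that a randomly chosen case is scored above a randomly chosen non-case, with ties counted as one half), which equals the area under the ROC curve.

```python
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); label 1 = lung cancer case, 0 = non-case."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_case = s >= threshold  # hypothetical operating threshold
        if predicted_case and y == 1:
            tp += 1
        elif predicted_case:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(labels, scores):
    """AUC = P(random case scored above random non-case); ties count one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            wins += 1.0 if c > n else 0.5 if c == n else 0.0
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0, 0]
    s = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    print(threshold_metrics(y, s), roc_auc(y, s))
```

The pairwise AUC loop is quadratic in the number of records; for cohorts the size of a CPRD extract, a rank-based implementation such as `sklearn.metrics.roc_auc_score` would be the practical choice.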

Journal Publications

Accurate and automated analysis of chest Computed Tomography (CT) scans is critical for early detection and risk stratification of lung cancer, the leading cause of cancer-related mortality worldwide. However, the development of robust deep learning models for lung nodule analysis is hindered by the limited availability of large, diverse, and well-annotated 3D CT datasets. This work presents an anatomically guided latent diffusion framework for synthesizing high-quality three-dimensional chest CT volumes. The proposed approach, termed LAND (Lung and Nodule Diffusion), conditions the generative process on 3D anatomical masks of the lungs and pulmonary nodules to ensure accurate spatial localization and realistic anatomical structure. A dedicated variational autoencoder (VAE) encodes anatomical masks into a latent representation that preserves fine-grained nodule morphology. In addition, conditional texture modeling within masked nodule regions enables controlled variation in lesion appearance. Compared with existing 3D diffusion-based methods, LAND substantially reduces computational requirements and generates 256 × 256 × 256 volumes at 1 mm isotropic resolution using 10–16 GB of GPU memory during training and less than 8 GB during inference. Experimental results demonstrate high visual fidelity and anatomical realism, further supported by improved performance in downstream lung nodule segmentation and classification tasks. These findings indicate that LAND provides a practical and efficient framework for anatomically guided 3D medical image synthesis and data augmentation.

https://doi.org/10.1038/s41598-026-51634-4

Introduction: The need for eXplainable Artificial Intelligence (XAI) in healthcare is more critical than ever, especially as regulatory frameworks such as the European Union Artificial Intelligence (EU AI) Act mandate transparency in clinical decision support systems. Post hoc XAI techniques such as Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDPs) are widely used to interpret Machine Learning (ML) models for disease risk prediction, particularly in tabular Electronic Health Record (EHR) data. However, their reliability under real-world scenarios is not fully understood. Class imbalance is a common challenge in many real-world datasets, but it is rarely accounted for when evaluating the reliability and consistency of XAI techniques.

Methods: In this study, we design a comparative evaluation framework to assess the impact of class imbalance on the consistency of model explanations generated by LIME, SHAP, and PDPs. Using UK primary care data from the Clinical Practice Research Datalink (CPRD), we train three ML models: XGBoost (XGB), Random Forest (RF), and Multi-layer Perceptron (MLP), to predict lung cancer risk and evaluate how interpretability is affected under class imbalance when compared against a balanced dataset. To our knowledge, this is the first study to evaluate explanation consistency under class imbalance across multiple models and interpretation methods using real-world clinical data.

Results: Our main finding is that class imbalance in the training data can significantly affect the reliability and consistency of LIME and SHAP explanations when evaluated against models trained on balanced data. To explain these empirical findings, we also present a theoretical analysis of LIME and SHAP to understand why explanations change under different class distributions. It is also found that PDPs exhibit noticeable variation between models trained on imbalanced and balanced datasets with respect to clinically relevant features for predicting lung cancer risk.

Discussion: These findings highlight a critical vulnerability in current XAI techniques: their explanations are significantly affected by skewed class distributions, which are common in medical data. This emphasises the importance of consistent model explanations for trustworthy ML deployment in healthcare.

https://doi.org/10.3389/frai.2025.1682919
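
One general way the training data enters SHAP explanations is through the background (reference) data against which features are treated as absent. The sketch below illustrates that mechanism only; it is not the paper's theoretical analysis. It computes exact Shapley values by enumerating every coalition of features, replacing absent features with a single reference point. The model, features and numbers are hypothetical. For a linear model, the attribution of feature i reduces to w_i (x_i - r_i), so moving the reference, for instance to the feature means of a differently balanced training set, changes every attribution.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float],
                  reference: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take reference values."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else reference[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear_risk_score(z: List[float]) -> float:
    # Hypothetical linear model; features and weights are made up for illustration.
    return 0.5 * z[0] + 2.0 * z[1] - 1.0 * z[2]


if __name__ == "__main__":
    x = [60.0, 1.0, 3.0]
    for ref in ([50.0, 0.5, 2.0], [55.0, 0.2, 2.0]):  # two hypothetical background means
        print(ref, exact_shapley(linear_risk_score, x, ref))
```

Enumeration costs 2^(n-1) coalitions per feature, so it is only practical for a handful of features; SHAP's model-specific and sampling-based estimators exist to avoid this enumeration on realistic models.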

Background: Federated learning (FL) is a rapidly advancing technique that enables collaborative model training while preserving data privacy. This approach is particularly relevant in healthcare, where privacy concerns and regulatory restrictions often prevent centralized data sharing. FL has shown promise in tasks such as disease detection, achieving performance levels comparable to centralized systems. However, its practical usability in real-world applications remains underexplored.
Methods: We evaluate the practical effectiveness of FL in predicting whether patients suspected of prostate cancer require invasive biopsy procedures. The study uses 14 publicly available prostate cancer datasets from 10 countries. We propose and benchmark a novel FL evaluation strategy, Leave-Silo-Out (LSO), which quantifies the performance gap between federated training and free-riding (utilizing the federated model without contributing data). Additionally, we investigate whether locally trained models can outperform multi-hospital FL models. The results are assessed with a focus on improving the diagnosis of local patients.
Results: Our findings reveal that the benefits of FL vary with the amount of locally available annotated data. Hospitals with very small datasets see negligible improvements from FL compared to free-riding. Institutions with moderate datasets may achieve some gains through FL training. However, hospitals with extensive datasets often experience little to no advantage from FL and, in some cases, observe reduced performance compared to local training.
Conclusion: Federated learning shows potential in scenarios with limited data availability. However, its practical applicability is highly context-dependent, influenced by factors such as data availability and specific task requirements.

https://doi.org/10.1016/j.ijmedinf.2025.106046
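
The Leave-Silo-Out comparison can be expressed independently of any particular FL framework. The sketch below is a hedged reading of the abstract, not the paper's protocol: for each silo, it compares a model trained with that silo's participation against one trained on the remaining silos only (the free-riding case), both scored on the silo's own test data. `train` and `evaluate` are hypothetical callables standing in for a federated training run and a local metric; the toy versions below simply pool data into a mean predictor and score it by negative squared error, whereas a real FL run would aggregate locally trained updates instead.

```python
from typing import Callable, Dict, List, TypeVar

Data = TypeVar("Data")
Model = TypeVar("Model")


def leave_silo_out_gaps(
    silos: Dict[str, Data],
    local_test: Dict[str, Data],
    train: Callable[[List[Data]], Model],
    evaluate: Callable[[Model, Data], float],
) -> Dict[str, float]:
    """Per silo: local score of the model trained WITH the silo minus the free-rider model's score."""
    federated_model = train(list(silos.values()))
    gaps = {}
    for name in silos:
        # Free-riding: the silo uses a model trained only on the other silos' data.
        free_rider_model = train([d for other, d in silos.items() if other != name])
        test = local_test[name]
        gaps[name] = evaluate(federated_model, test) - evaluate(free_rider_model, test)
    return gaps


# Toy placeholders (hypothetical): a "model" is the pooled mean of all training values,
# and the score is negative mean squared error on a silo's local test values.
def pooled_mean(datasets: List[List[float]]) -> float:
    values = [v for d in datasets for v in d]
    return sum(values) / len(values)


def neg_mse(model: float, test: List[float]) -> float:
    return -sum((v - model) ** 2 for v in test) / len(test)


if __name__ == "__main__":
    silos = {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.5], "C": [3.0]}
    local_test = {"A": [1.0], "B": [5.0], "C": [3.0]}
    print(leave_silo_out_gaps(silos, local_test, pooled_mean, neg_mse))
```

A positive gap means the silo gained by contributing rather than free-riding; comparing the federated score with a purely local model (train on the silo's data alone) would add the second comparison described in the abstract.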

We thank you for the opportunity to respond to the commentary letter by Dehaene et al on our recent article, “Does differentially private synthetic data lead to synthetic discoveries?”[1] published in Methods of Information in Medicine. We appreciate the commentators’ interest in our work and their contribution to an important and ongoing discussion on the utility of synthetic data and its implications for statistical inference.

The letter from Dehaene et al raises a concern about two possible interpretations of the results in our article, namely that the risk of unacceptably high false-positive findings from synthetic data can be countered simply by sufficiently increasing the amount of original data, or by moving away from differentially private (DP) synthetization methods. Referring to simulation results in Decruyenaere et al,[2] they note that even for non-DP methods and large original sample sizes, this risk can remain high, especially when using deep learning-based generation methods. We find that Dehaene et al raise an important point, and their observations are also compatible with our results. While reducing the amount of DP noise and increasing the original sample size are positively correlated with the utility of the generated synthetic data, these alone are not enough if the generator is a misspecified parametric model or suffers from what Decruyenaere et al[2] refer to as regularization bias.

As the authors note, citing Chen et al: “synthetic data are artificial data that (attempt to) mimic the original data in terms of statistical properties, without revealing individual records.”[3] Obviously, if privacy were not a concern and reliable prior information on the true data distribution were absent, this would be achieved simply by using the original data. Indeed, some DP data release methods reconstruct the original data in the limit as epsilon approaches infinity. In our experiments, the DP perturbed and DP smoothed histograms have this property. Accordingly, these methods demonstrate a clear trade-off between similarity to the original data, privacy level, and the amount of original data, with the inferential utility of the synthetic data typically increasing with the original sample size and as the privacy level decreases. On the other hand, the synthetic data generated by the Multiplicative Weights Exponential Mechanism (MWEM) and Private-PGM (Private Probabilistic Graphical Model) may diverge from the distribution of the original data in the limit, because these methods approximate higher-dimensional data with low-dimensional marginals. Hence, the trade-off may be less clear if the statistical property of interest changes not only due to the privacy level but also due to approximation. In some of our results, this is reflected in utility increasing as the privacy level decreases only up to a certain limit, without reaching the utility of the original data. A similar effect can occur if the synthetization methods make incorrect parametric assumptions. At the other extreme of this continuum of methods are synthesizers with regularization bias, designed for purposes other than reproducing the original data. For example, in our experiments, the DP GAN method behaved very differently from the other methods, and its risk of false discoveries even increased as the privacy level decreased.

Accordingly, we agree with the main message of Dehaene et al that the inferential utility of the original data is not necessarily achieved simply by decreasing the privacy level or by using larger amounts of original data, but is highly method-dependent. Hence, caution is always warranted when performing statistical inference on synthetic data, as different methods have different trade-offs and some exhibit systematic biases that are not easy to counter.

https://doi.org/10.1055/a-2540-8284

Explainable artificial intelligence (XAI) has gained much interest in recent years for its ability to explain the complex decision-making processes of machine learning (ML) and deep learning (DL) models. The Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) frameworks have become popular interpretive tools for ML and DL models. This article provides a systematic review of the application of LIME and SHAP in interpreting the detection of Alzheimer’s disease (AD). Adhering to PRISMA and Kitchenham’s guidelines, we identified 23 relevant articles and investigated these frameworks’ prospective capabilities, benefits, and challenges in depth. The results emphasise XAI’s crucial role in strengthening the trustworthiness of AI-based AD predictions. This review aims to outline the fundamental capabilities of the LIME and SHAP XAI frameworks in enhancing fidelity within clinical decision support systems for AD prognosis.

https://doi.org/10.1186/s40708-024-00222-1

Differentially private (DP) synthetic data has emerged as a potential solution for sharing sensitive individual-level biomedical data. DP generative models offer a promising approach for generating realistic synthetic data that aims to maintain the original data’s central statistical properties while ensuring privacy by limiting the risk of disclosing sensitive information about individuals. However, how to assess the expected real-world prediction performance of machine learning models trained on synthetic data remains an open question. In this study, we experimentally evaluate two different model evaluation protocols for classifiers trained on synthetic data. The first protocol employs solely synthetic data for downstream model evaluation, whereas the second protocol assumes limited DP access to a private test set of real data managed by a data curator. We also propose a metric for assessing how well the evaluation results of the proposed protocols match the real-world prediction performance of the models. The metric measures both a systematic error component, indicating how optimistic or pessimistic the protocol is on average, and a random error component, indicating the variability of the protocol’s error. Our results suggest that employing the second protocol is advantageous, particularly in biomedical health studies where the precision of the research is of utmost importance. Our comprehensive empirical study offers new insights into the practical feasibility and usefulness of different evaluation protocols for classifiers trained on DP-synthetic data.

https://doi.org/10.1109/ACCESS.2024.3446913
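
The split of protocol error into a systematic and a random component can be illustrated as the mean and the standard deviation of the differences between estimated and real-world performance. The sketch below assumes that reading; the paper's exact metric may differ. It also shows one standard way a curator could answer an accuracy query under epsilon-DP, the Laplace mechanism applied to the count of correct predictions, as a hypothetical stand-in for the limited DP access of the second protocol. `dp_accuracy` and `protocol_error` are illustrative names, not the paper's code.

```python
import random
import statistics
from typing import Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_accuracy(correct: Sequence[bool], epsilon: float, rng: random.Random) -> float:
    """Hypothetical curator query: test-set accuracy released with epsilon-DP.

    Replacing one test record changes the number of correct predictions by at most 1
    (the test-set size is treated as public), so Laplace noise of scale 1/epsilon is
    added to that count. Clipping to [0, 1] is post-processing and preserves DP.
    """
    noisy_count = sum(correct) + laplace_noise(1.0 / epsilon, rng)
    return min(1.0, max(0.0, noisy_count / len(correct)))


def protocol_error(estimated: Sequence[float], real: Sequence[float]) -> Tuple[float, float]:
    """(systematic, random) error: mean and sample std. dev. of estimate - real (needs n >= 2)."""
    errors = [e - r for e, r in zip(estimated, real)]
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    correct = [True] * 80 + [False] * 20  # real-world accuracy 0.80 in this toy setup
    estimates = [dp_accuracy(correct, epsilon=1.0, rng=rng) for _ in range(1000)]
    print(protocol_error(estimates, [0.80] * 1000))
```

In this toy setup the DP query is unbiased, so any systematic error reported for a protocol would come from elsewhere, for example from evaluating on synthetic rather than real test data.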

Background: Synthetic data have been proposed as a solution for sharing anonymized versions of sensitive biomedical datasets. Ideally, synthetic data should preserve the structure and statistical properties of the original data while protecting the privacy of the individual subjects. Differential Privacy (DP) is currently considered the gold standard approach for balancing this trade-off.

Objectives: The aim of this study is to investigate the trustworthiness of group differences discovered by independent sample tests from DP-synthetic data. The evaluation is carried out in terms of the tests’ Type I and Type II errors. The former quantifies the tests’ validity, i.e., whether the probability of false discoveries is indeed below the significance level, and the latter indicates the tests’ power in making real discoveries.

Methods: We evaluate the Mann–Whitney U test, Student’s t-test, chi-squared test, and median test on DP-synthetic data. The private synthetic datasets are generated from real-world data, including a prostate cancer dataset (n = 500) and a cardiovascular dataset (n = 70,000), as well as from bivariate and multivariate simulated data. Five different DP-synthetic data generation methods are evaluated, including two basic DP histogram release methods and the MWEM, Private-PGM, and DP GAN algorithms.

Conclusion: A large portion of the evaluation results showed dramatically inflated Type I errors, especially at levels of ϵ ≤ 1. This result calls for caution when releasing and analyzing DP-synthetic data: low p-values may be obtained in statistical tests simply as a byproduct of the noise added to protect privacy. A DP Smoothed Histogram-based synthetic data generation method was shown to produce valid Type I error for all privacy levels tested but required a large original dataset size and a modest privacy budget (ϵ ≥ 5) to achieve reasonable Type II error levels.

https://doi.org/10.1055/a-2385-1355
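
The inflation of Type I error can be reproduced in a small simulation. The sketch below assumes a simple Laplace-perturbed histogram as a hypothetical stand-in for the DP perturbed histogram method: noise is added to each cell of a 2x2 contingency table, negative counts are clipped to zero and the result is rounded. Adding Laplace noise of scale 1/epsilon to every cell is epsilon-DP under add/remove-one-record neighbouring datasets, since one record changes one cell by one. Both groups are drawn from the same distribution, so every rejection at alpha = 0.05 is a false discovery. The paper's generators, sample sizes and tests differ; the p-value uses the identity P(chi2 with 1 d.f. > x) = erfc(sqrt(x / 2)).

```python
import math
import random
from typing import List


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def chi2_pvalue_2x2(table: List[List[int]]) -> float:
    """Pearson chi-squared test of independence on a 2x2 table (1 d.f., no continuity correction)."""
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    if 0 in rows or 0 in cols:
        return 1.0  # degenerate table: the test is undefined, treated as no rejection
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def type1_error(n_per_group: int, p: float, epsilon: float, runs: int = 2000,
                alpha: float = 0.05, seed: int = 0) -> float:
    """Share of runs rejecting H0 although both groups share the outcome probability p."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        table = []
        for _group in range(2):
            positives = sum(rng.random() < p for _ in range(n_per_group))
            table.append([positives, n_per_group - positives])
        # Hypothetical "DP perturbed histogram": Laplace(1/epsilon) noise per cell,
        # negative counts clipped to zero, then rounded to whole records.
        noisy = [[max(0, round(c + laplace(1.0 / epsilon, rng))) for c in row] for row in table]
        if chi2_pvalue_2x2(noisy) < alpha:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(eps, type1_error(n_per_group=20, p=0.3, epsilon=eps))
```

Running the comparison for small groups shows the false-discovery rate at small epsilon rising well above the nominal alpha, which is the effect the abstract warns about; with very large epsilon the noise is rounded away and the test behaves as it would on the original data.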

Methylation is considered one of the most important post-translational modifications (PTMs) of proteins. Plasticity and cellular dynamics are among the many traits regulated by methylation. Currently, methylation sites are identified using experimental approaches, which are time-consuming and expensive. With computational modelling, methylation sites can be identified quickly and accurately, providing valuable information for further experimentation and investigation. In this study, we propose a new machine learning model, MeSEP, that predicts methylation sites by incorporating both evolutionary and structure-based information. To build this model, we first extract evolutionary and structural features from the PSSM and SPD2 profiles, respectively. We then employ Extreme Gradient Boosting (XGBoost) as the classification model to predict methylation sites. To address imbalanced data and bias towards negative samples, we use the SMOTETomek-based hybrid sampling method. MeSEP was validated on an independent test set (ITS) and with 10-fold cross-validation (TCV) using lysine methylation sites. The method achieved an accuracy of 82.9% in ITS and 84.6% in TCV; precision of 0.92 in ITS and 0.94 in TCV; area under the curve values of 0.90 in ITS and 0.92 in TCV; F1 score of 0.81 in ITS and 0.83 in TCV; and MCC of 0.67 in ITS and 0.70 in TCV. MeSEP significantly outperformed previous methods in the literature. MeSEP is publicly available as a standalone toolkit, together with all its source code, at https://github.com/arafatro/MeSEP.

https://doi.org/10.1007/s12559-024-10268-2
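
SMOTETomek combines SMOTE oversampling of the minority class with the removal of Tomek links. The pure-Python sketch below is illustrative only and suited to small numeric feature vectors; the abstract does not describe MeSEP's resampling beyond the method name, and in practice a maintained implementation such as `SMOTETomek` in imbalanced-learn would typically be used. Here SMOTE creates each synthetic point by interpolating between a minority sample and one of its k nearest minority neighbours, and a Tomek link is a pair of opposite-class points that are each other's nearest neighbour; this sketch removes both points of every link, which is an assumption about the cleaning variant.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def _nearest(i: int, points: Sequence[Point], candidates: Sequence[int], k: int) -> List[int]:
    """Indices of the k candidates closest (Euclidean) to points[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(X: Sequence[Point], y: Sequence[int], minority: int, n_new: int,
          k: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Append n_new synthetic minority samples (needs at least two minority samples)."""
    rng = random.Random(seed)
    X_out, y_out = [list(x) for x in X], list(y)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(_nearest(i, X, minority_idx, k))
        gap = rng.random()  # position along the segment from X[i] towards X[j]
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def remove_tomek_links(X: Sequence[Point], y: Sequence[int]) -> Tuple[List[Point], List[int]]:
    """Drop both members of every opposite-class pair of mutual nearest neighbours."""
    all_idx = list(range(len(X)))
    nn = [_nearest(i, X, all_idx, 1)[0] for i in all_idx]
    drop = {i for i in all_idx if y[i] != y[nn[i]] and nn[nn[i]] == i}
    keep = [i for i in all_idx if i not in drop]
    return [list(X[i]) for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority, n_new, k=5, seed=0):
    """Oversample with SMOTE first, then clean Tomek links from the enlarged set."""
    X_os, y_os = smote(X, y, minority, n_new, k, seed)
    return remove_tomek_links(X_os, y_os)
```

The nearest-neighbour search is brute force and quadratic in the number of samples, which is acceptable for a sketch but not for protein-scale feature sets.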

Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become the new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer’s Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard and soft voting. The proposed model was tested on two popular AD datasets: OASIS and ADNI. The ADNI dataset was used to assess the models’ efficacy under imbalanced and data-scarce conditions. The VT ensemble achieved an improvement of around 2% over the individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated overall accuracy gains of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is publicly available.

https://doi.org/10.1186/s40708-024-00238-7
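
Hard and soft voting can disagree on the same input, which is why ensembles are often reported under both rules. In the sketch below, each row of `probas` holds one hypothetical ViT's predicted class probabilities for a single scan; a two-class case is used for brevity, and the paper's label set may differ. Hard voting takes the majority of the per-model predicted classes; soft voting takes the class with the highest mean probability.

```python
from collections import Counter
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    # max() returns the first maximal index, so ties resolve to the lowest class index.
    return max(range(len(values)), key=lambda c: values[c])


def hard_vote(probas: Sequence[Sequence[float]]) -> int:
    """Majority vote over each model's predicted class (ties -> lowest class index)."""
    votes = Counter(argmax(p) for p in probas)
    top = max(votes.values())
    return min(c for c, count in votes.items() if count == top)


def soft_vote(probas: Sequence[Sequence[float]]) -> int:
    """Class with the highest mean predicted probability across models."""
    n_models, n_classes = len(probas), len(probas[0])
    mean = [sum(p[c] for p in probas) / n_models for c in range(n_classes)]
    return argmax(mean)


if __name__ == "__main__":
    # Hypothetical outputs of three models for one scan: [P(class 0), P(class 1)]
    probas = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
    print(hard_vote(probas), soft_vote(probas))
```

Here two of the three models favour class 1, so hard voting returns 1, while the first model's confident 0.90 pulls the mean towards class 0, so soft voting returns 0.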

In many countries around the world, the healthcare sector is facing difficult problems: the aging population needs more care while the workforce is not growing, the cost of treatments is rising, and increasingly technical medical products pose serious challenges to the expertise of healthcare professionals. At the same time, the field of artificial intelligence (AI) is making big leaps, and AI is naturally also suggested as a remedy for these problems. In this article, we discuss some of the ethical and legal problems facing AI in the healthcare field, with a case study of European Union (EU) regulations and the local laws of one EU member state, Finland. We also look at some of the directions in which AI research in medicine will develop over the next 3–10 years, using Large Language Models (LLMs) and image analysis as examples. The potential of AI is huge and has already become a reality in many fields, but in medicine, obstacles remain. We discuss both technical and regulatory questions related to expanding the use of AI techniques in the clinical environment.

https://doi.org/10.5772/intechopen.1007443

Conference Publications

Artificial Intelligence (AI) and emerging technologies are revolutionising digital human models (DHMs), offering significant opportunities to enhance accessibility and inclusion in healthcare systems. This evolution is further amplified by the concept of the “digital twin”—a virtual representation of a human patient that is dynamically updated with real-world data. This paper explores the potential of explainable AI (XAI) in conjunction with digital twins to create transparent, interpretable, and responsible healthcare solutions, particularly through privacy-preserving techniques like federated learning. By integrating advanced technologies such as computer vision, natural language processing, and machine learning, DHMs can be designed to understand, predict, and simulate the behaviours and requirements of individuals with varying abilities and backgrounds, ultimately creating personalised digital twins for enhanced healthcare.

https://doi.org/10.1007/978-3-032-13022-8_25

Using artificial intelligence (AI) to advance data-driven innovations in preventive health care and clinical decision-making is an expanding field of development. A core feature of such innovation is ensuring that trustworthy and privacy-preserving methods are used. Europe is leading the way in this regard, with agreement on the European Health Data Space and the AI Act coming into force in 2024. PHASE IV AI seeks to advance the current state-of-the-art data synthesis methods, giving AI developers access to larger pools of decentralised, de-identified data through multi-party computation. It will also develop metrics for testing and validation, and protocols that enable synthetic data generation (through multi-party computation). Access to this data market and the data service ecosystem will be through a Health Data Hub in the European Health Data Space. Defining the requirements for the Health Data Hub and the wider system is essential for the success of PHASE IV AI, but it can be challenging when stakeholders have demanding professional vocations. Various methods could be adopted to gather input from the many stakeholders, but where time is valuable, generating user stories from hybrid focus group interviews is anticipated to be an effective and efficient method for capturing the range of interests expressed by multiple groups. The aim was to describe the process and outputs of consultations with medical professionals, software developers, and small and medium-sized enterprise decision makers through online and hybrid group interviews. The engagement of these professionals in the interview sessions, the interview analysis and extraction of user stories, their refinement and prioritisation, and finally their use by the project developers were described.
This process aimed to produce constructive user stories that offer meaningful recommendations to the developers and result in an effective product for use in the trans-European context, with meaningful impact beyond the end of the PHASE IV AI project.

https://doi.org/10.1007/978-3-032-13022-8_27

Machine learning models have been applied to various healthcare tasks. Such models include both inherently interpretable models and black-box models. In most cases, these models are capable of achieving high accuracy, but they should also be well calibrated. Recently, the issue of algorithmic bias in clinical predictive models has attracted attention, because such bias can result in disparities in health care that disadvantage some subgroups of the population. The aim is to detect such disparities and then remove them. From this perspective, the predictors used by a model need to be divided into sensitive variables, such as age and race, and the rest. Among these disparities, the most readily understood are so-called data disparities. A target population usually includes a large number of subgroups, many of which can be quite small. When population data are used to train a predictive model, the resulting predictions will be largely dominated by a few major subgroups. On the other hand, when models are fitted to individual subgroup data, the data in some small subgroups are expected to be insufficient for proper model training, producing disparate predicted outcomes. Most clinical predictive models do not include domain-specific knowledge. Causal inference allows experts’ knowledge of the relations among the predictive variables to be incorporated; the resulting model is referred to as a causal-effect model. This approach can help mitigate disparate outcomes in small subgroups thanks to the inclusion of domain knowledge. A principled approach is to find different but related datasets, which can generally be done within the framework of transfer learning. Apart from re-training approaches, domain adaptation can be used to project a number of source domains jointly onto a target domain, which is expected to have sufficient data even for small subgroups. It has been debated whether or not protected variables/characteristics (such as race and gender) should be used in clinical predictive models.

https://doi.org/10.1007/978-3-032-13022-8_26

Cancer is a leading cause of mortality worldwide, with breast and lung cancers being the most prevalent globally. Early and accurate diagnosis is crucial for successful treatment, and medical imaging techniques play a pivotal role in achieving this. This paper proposes a novel pipeline that leverages generative artificial intelligence to enhance medical images by combining synthetic image generation and super-resolution techniques. The framework is validated in two medical use cases (breast and lung cancers), demonstrating its potential to improve the quality and quantity of medical imaging data, ultimately contributing to more precise and effective cancer diagnosis and treatment. Overall, although some limitations exist, the pipeline achieves satisfactory results at an image size conducive to specialist analysis and further expands the capabilities of this field.

Immune checkpoint inhibitors (ICIs) have improved outcomes in clinical trials for metastatic non-small cell lung cancer (mNSCLC). However, evidence on the uptake, treatment, and effectiveness patterns of ICIs in real-world settings, particularly across diverse healthcare systems and populations, remains scarce. Real-world data (RWD) are vital to complement clinical trial findings. Yet barriers such as lack of data standardization, a lack of networks (especially across borders), and privacy concerns hinder comprehensive global assessments.
Data harmonization to the OMOP common data model (CDM) and federated data analysis frameworks address these challenges by enabling standardized, large-scale analyses while maintaining patient privacy. In March 2025, an mNSCLC study-a-thon held in Helsinki brought together a multidisciplinary team from 21 sites across nine countries, including academic and industry data partners, with the primary objective of characterizing patients with mNSCLC and evaluating shifts in treatment patterns and outcomes after the introduction of ICIs.

https://doi.org/10.1016/j.jtho.2025.09.452

Chest CT scans are essential in diagnosing lung abnormalities, including lung cancer, but their utility for training deep learning models is often limited by data scarcity, high labeling costs, and privacy concerns. To address these challenges, this study explores the use of score-based diffusion models for the conditional generation of lung CT scan slices. Two generation scenarios are explored: one conditioned only on lung segmentation masks and another conditioned on both lung and nodule segmentation maps to guide the synthesis process. The proposed methods use custom U-Net architectures trained to predict the scores of Variance Preserving (VP) and Variance Exploding (VE) Stochastic Differential Equations (SDEs), which form the primary basis for comparison in conditional sample generation. The results demonstrate the superiority of the VP SDE model in generating high-fidelity images, as evidenced by high SSIM (0.894) and PSNR (28.6) values, as well as low domain-specific FID (173.4), MMD (0.0133) and ECS (0.78) scores. The generated images consistently followed the conditional mapping guidance during generation, producing realistic lung and nodule structures and highlighting their potential for data augmentation in medical imaging tasks. While the models achieved notable success in generating accurate 2D lung CT scan slices from simple conditional region maps, future work involves extending these methods to 3D conditional generation and using richer conditional maps to account for broader anatomical variation. Nevertheless, this study shows promise for improving computer-aided systems by supporting deep learning model training for lung disease diagnosis and classification.

https://doi.org/10.1109/EMBC58623.2025.11254813

Poster: https://zenodo.org/records/16234717
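The VP and VE SDEs compared above differ in how they corrupt a clean slice over diffusion time t in [0, 1]. The sketch below assumes the standard score-SDE formulations and their commonly used default hyperparameters (not values reported in this abstract) and shows the closed-form forward perturbation of each, plus conditioning by stacking the segmentation masks as extra input channels. The U-Net score model and the reverse-time sampler are omitted, and all function names are hypothetical.

```python
import numpy as np

# Assumed hyperparameters (common score-SDE defaults), not taken from the paper.
SIGMA_MIN, SIGMA_MAX = 0.01, 50.0   # VE noise scale range
BETA_MIN, BETA_MAX = 0.1, 20.0      # VP linear beta schedule

def ve_perturb(x0, t, rng):
    """VE SDE marginal: x_t = x_0 + sigma(t) * z, with geometric sigma(t)."""
    sigma = SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t
    return x0 + sigma * rng.standard_normal(x0.shape), sigma

def vp_perturb(x0, t, rng):
    """VP SDE marginal: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * z."""
    # log sqrt(abar_t) = -0.5 * integral_0^t beta(s) ds for the linear schedule
    log_mean = -0.25 * t**2 * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN
    std = np.sqrt(1.0 - np.exp(2.0 * log_mean))
    return np.exp(log_mean) * x0 + std * rng.standard_normal(x0.shape), std

def network_input(x_t, masks):
    """Condition the score model by stacking noisy slice and masks as channels."""
    return np.concatenate([x_t, masks], axis=0)   # (1 + n_masks, H, W)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 64, 64))           # stand-in CT slice
masks = (rng.random((2, 64, 64)) > 0.5).astype(float)   # stand-in lung + nodule masks
x_t, std = vp_perturb(x0, 0.5, rng)
print(network_input(x_t, masks).shape)   # (3, 64, 64)
```

The VE process keeps the signal intact and adds ever-larger noise, while the VP process shrinks the signal so that total variance stays near one; this difference in trajectories is what the two U-Net models are trained against.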

Prostate cancer (PCa) diagnosis often relies on biopsies, which can lead to unnecessary procedures and complications. Federated learning (FL) offers a privacy-preserving approach for training predictive models across hospitals without sharing sensitive patient data. In this study, we evaluate the feasibility of FL for PCa risk prediction by benchmarking different training strategies, including local models, federated models, and free-riding (FR) on federated models. Using real-world heterogeneous datasets from 19 hospitals, we analyze the impact of data diversity and consortium size on predictive performance. Our results show that while FL improves model generalizability, local models often perform comparably, making direct participation in FL less beneficial for large hospitals. However, a small consortium of high-data-quality institutions could collaboratively develop robust models for broader clinical use. We discuss the practical implications of FL in healthcare and propose strategies for sustainable deployment in real-world hospital networks.

https://doi.org/10.1109/EMBC58623.2025.11252903

We propose a new framework for Bayesian estimation of differential privacy that incorporates evidence from multiple membership inference attacks (MIAs). Bayesian estimation is carried out via a Markov Chain Monte Carlo (MCMC) algorithm, named MCMC-DP-Est, which estimates the full posterior distribution of the privacy parameter rather than only credible intervals. Critically, the proposed method does not assume that privacy auditing is performed with the most powerful attack on the worst-case (dataset, challenge point) pair, which is typically unrealistic. Instead, MCMC-DP-Est jointly estimates the strengths of the MIAs used and the privacy of the training algorithm, yielding a more cautious privacy analysis. We also present an economical way to generate the MIA performance measurements that the MCMC method uses to estimate privacy. We illustrate the methods with numerical examples using both artificial and real data.

https://doi.org/10.1007/978-3-032-06096-9_23
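For context, the abstract contrasts MCMC-DP-Est with audits that treat a single attack as optimal on the worst-case pair. Such audits typically turn one attack's error rates into a point lower bound on ε using the hypothesis-testing characterisation of (ε, δ)-DP: FPR + e^ε·FNR ≥ 1 − δ and e^ε·FPR + FNR ≥ 1 − δ. The sketch below computes only that standard baseline; it is not the Bayesian method described above, and the function name and example rates are hypothetical.

```python
import math

def epsilon_lower_bound(tpr, fpr, delta=0.0):
    """Point lower bound on epsilon implied by one attack's (TPR, FPR).

    (eps, delta)-DP requires FPR + e^eps * FNR >= 1 - delta and
    e^eps * FPR + FNR >= 1 - delta, so observed rates imply
    eps >= log((TPR - delta) / FPR) and eps >= log((1 - FPR - delta) / FNR).
    """
    fnr = 1.0 - tpr
    candidates = [0.0]   # a lower bound of zero is always valid
    for num, den in ((tpr - delta, fpr), (1.0 - fpr - delta, fnr)):
        if num > 0 and den > 0:
            candidates.append(math.log(num / den))
    return max(candidates)

# Illustrative rates only (not results from the paper).
print(round(epsilon_lower_bound(tpr=0.6, fpr=0.1), 3))   # 1.792
```

This bound is only as tight as the attack is strong; a weak attack yields a small ε even for a leaky algorithm, which is the gap that jointly estimating attack strength and privacy is meant to address.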

Clinical predictive models have played an important role in healthcare. An important task in lung cancer healthcare is to identify, from a selected population, participants at higher lung cancer risk for inclusion in a screening program. Notably, Electronic Healthcare Records (EHRs) acquired from primary care have been used to emulate such a screening program. One such EHR dataset is the Clinical Practice Research Datalink (CPRD), which covers 4.5% of the UK population. In this paper, we provide a worked example of this task, employing the Explainable Boosting Machine (EBM) as the predictive model and the CPRD dataset as the EHRs.

The EBM is a prominent example of an inherently interpretable model (IIM). IIMs provide predictions and model explanations simultaneously. More importantly, EBMs represent a family of non-linear IIMs, a generalisation that significantly extends logistic regression. EBMs were developed at Microsoft Research as an end-to-end system that provides powerful visualisation tools for evaluating both model predictions and explanations. However, EBM users often want more technical detail about the EBM itself. We therefore provide a brief introduction to Generalised Additive Models, Gradient Boosting, Boosted Trees, and Bagging Ensembles. Finally, we provide two EBM-based use cases in the healthcare domain, as well as an illustrative example of lung cancer prediction and explanation.

https://doi.org/10.1007/978-3-032-04657-4_13

Federated learning (FL) is a machine learning technique that allows multiple distributed clients to collaboratively train an ML model without sharing their private data with any of the parties involved. However, ensuring the privacy of client data during the FL process remains an ongoing concern. In this study, we propose a homomorphic-encryption-based privacy-preserving FL protocol for multilayer perceptrons, which is shown to be secure in the presence of colluding honest-but-curious clients. The possibility of client collusion attacks is eliminated by exploiting the inherent permutability of neural networks. Our results indicate that the protocol incurs no considerable loss in accuracy during training. Furthermore, it keeps computation costs minimal by batching homomorphic operations and by using only inexpensive homomorphic addition for aggregation.

https://doi.org/10.1007/978-3-032-04657-4_34

Machine learning models increasingly play an important role in medicine and healthcare, and they can be readily adapted for clinical prognostic tasks. A prominent task in lung cancer healthcare is to select people at higher lung cancer risk from a given population. This task can be undertaken using clinical predictive models along with real-world Electronic Healthcare Records (EHRs). In this paper, we provide a worked example of this task using Logistic Regression as the model and the CPRD dataset, which covers 4.5% of the UK population, as the EHRs [9].

Further, the use of clinical predictive models in cancer care has gone beyond cancer screening programmes: such models can also be employed for a variety of cancer healthcare management tasks. In this paper, we provide six lung-cancer-related use cases to illustrate this task diversity. We also show that each of the six use cases uses its own appropriate set of prognostic predictors to optimally perform its task. Finally, their task performance is critically evaluated.

Domains such as medicine and healthcare require trustworthiness and accountability. To meet this challenge, Explainable Artificial Intelligence (XAI) techniques have been developed. In this paper, we introduce impurity-, permutation-, LIME-, and SHAP-based importance measures and apply these XAI techniques to the six use cases for variable importance analysis. We then use domain-specific knowledge to critically interpret the XAI results. We also briefly review a model-specific XAI application that relies on knowledge-based constraints.

https://doi.org/10.1007/978-981-96-6588-4_11

Machine learning (ML) models are increasingly used in healthcare, but their lack of interpretability makes them unsuitable for use in clinical practice. In the medical field, it is vital to clarify to clinicians and patients the rationale behind a model’s high-probability prediction of a specific disease in an individual patient. This transparency fosters trust, facilitates informed decision-making, and empowers both clinicians and patients to understand the underlying factors driving the model’s output. This paper aims to incorporate explainability into ML models such as Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Multilayer Perceptron (MLP) applied to Clinical Practice Research Datalink (CPRD) data, and to interpret them in terms of feature importance to identify the most important features for distinguishing between lung cancer and non-lung cancer cases. The SHapley Additive exPlanations (SHAP) method is used to interpret the models, both by explaining individual predictions and by interpreting them globally. The feature importance from SHAP is compared with the default feature importance of the models to identify any discrepancies between the results. The experimental findings show that the default feature importance of the tree-based models and SHAP consistently identify the features ‘age’ and ‘smoking status’ as the top features for predicting lung cancer among patients. Additionally, this work shows that feature importance for a single patient may vary with the employed model, leading to different predictions. Finally, the work concludes that individual-level explanation of feature importance is crucial in mission-critical applications like healthcare to better understand personal health and lifestyle factors in the early prediction of diseases that may lead to terminal illness.

https://doi.org/10.1109/IJCNN60899.2024.10650819

In recent times, the Visual Transformer (VT) has emerged as a powerful alternative to conventional Convolutional Neural Networks (CNNs) owing to its superior attention mechanism and pattern recognition abilities. Within a short time, the VT paradigm has given rise to many variants, each showcasing enhanced accuracy and optimized performance for various computer vision applications. Our study introduces a multitransformer pipeline for exploring optimal VT architectures for Alzheimer’s Disease (AD) detection and classification. Through a comparative evaluation of VT variants on the OASIS and ADNI datasets, this study also aims to contribute valuable insights into the applicability of VTs to AD classification. Furthermore, VT performance is systematically compared with that of CNNs to determine the basic capabilities of the models and their limitations in capturing intricate patterns indicative of early AD stages in both data-rich and data-scarce settings. The results indicate that the attention mechanism of VTs is pivotal for achieving superior performance in AD diagnosis. The code used in the study is publicly available.

https://doi.org/10.1109/IJCNN60899.2024.10650975

This study investigates the characterization of synthetic lung nodules generated by latent diffusion models applied to chest CT scans. Our experiments involve guiding the diffusion process by means of a binary mask for localization and various nodule attributes. In particular, the mask indicates the approximate position of the nodule in the shape of a bounding box, while the other scalar attributes are encoded in an embedding vector. The diffusion model operates in 2D, producing a single synthetic CT slice during inference. The architecture comprises a VQ-VAE to convert between the image and latent spaces, and a U-Net responsible for the denoising process. Our primary objective is to assess the quality of synthesized images as a function of the conditional attributes. We discuss possible biases and whether the model adequately positions and characterizes synthetic nodules. Our findings on the capabilities and limitations of the proposed approach may be of interest for downstream tasks involving limited datasets with non-uniform observations, as is often the case in medical imaging.

https://ebooks.iospress.nl/pdf/doi/10.3233/FAIA240408

Conference Proceedings

Despite major treatment innovations, cancer remains a leading cause of mortality worldwide, with breast and lung cancer being the most prevalent. Early and accurate diagnosis through medical imaging is essential for successful treatment. However, the development of robust AI-based diagnostic tools is often hindered by the limited availability of large, high-quality annotated datasets. Acquisition is expensive, time-consuming, and requires input from multiple experts, restricting the training and validation of generalisable deep learning models.
To address these challenges, this work proposes a unified pipeline combining synthetic image generation with single-image super-resolution to produce and enhance medical imaging samples as a scalable solution to data scarcity and quality limitations. The solution is trained and validated on two distinct clinical use cases: breast MRI (Duke Breast Cancer MRI and a private dataset) and lung CT scans (LIDC/IDRI and RIDER datasets). The combined approach is expected to yield image samples of sufficient quality for specialist review and use within the clinical diagnostic framework.

https://zenodo.org/records/19814289

Poster: https://zenodo.org/records/19821508

Lung CT scans are essential for diagnosing lung cancer, but the development of deep learning applications is limited by the scarcity of annotated data, given the high costs of expert labeling and privacy concerns. This work explores the use of conditional score-based diffusion models to generate realistic synthetic lung CT scan slices guided by lung and nodule segmentation maps.
Using a subset of the LIDC-IDRI dataset, the approach optimizes a time-informed U-Net to progressively denoise random noise into coherent lung structures that accurately follow the segmentation maps. The conditional guidance is performed by simply concatenating the segmentation maps to the noised input of the reverse diffusion process. During training, the model is first optimized to generate samples without nodule information. The same model is then fine-tuned with nodule segmentation maps added to the conditional information.
Fig. 1 confirms that the generative process respects the boundaries given by each sample’s segmentation maps. Moreover, the lung content in the synthetic samples resembles that of the originals.
Future work should explore extending the method to 3D conditional generation and validate the synthetic data, for instance by incorporating it into downstream model training and assessing any resulting performance improvements.

https://zenodo.org/records/19813429

Poster: https://zenodo.org/records/19814061

Cancer remains a leading cause of mortality worldwide, with breast and lung cancers accounting for many cases. Accurate and early diagnosis relies on high-quality medical imaging, yet data scarcity and limited resolution constrain robust computational tools. This work presents a pipeline integrating synthetic image generation with super-resolution to improve the quality and availability of medical imaging. Our approach splits the problem into two: anatomical coherence and variability are handled by 3D generative adversarial networks (GANs), while fine structural detail is enhanced via Real-ESRGAN. The framework was evaluated on breast MRI and lung CT datasets. Despite some limitations, results demonstrate that combining generative modelling with super-resolution can expand medical imaging datasets and improve fidelity, contributing to more precise and reliable cancer diagnosis, as confirmed by quantitative metrics, perceptual evaluation, and expert feedback.

https://zenodo.org/records/17305661

Poster: https://zenodo.org/records/17451437

Chest CT scans are vital for diagnosing lung abnormalities, yet their use in Deep Learning is limited by data scarcity, labeling costs, and privacy concerns. This work explores Score-based Diffusion Models for conditional CT slice generation restricted to the lung area. Two cases are considered: conditioning on lung masks, and on both lung and nodule masks. Custom U-Net architectures are trained under Variance Preserving (VP) and Variance Exploding (VE) stochastic differential equations to compare diffusion trajectories. VP SDEs achieve higher fidelity, with better FID, MMD, and SSIM than VE. Generated images follow conditioning closely, producing realistic lung and nodule structures for data augmentation. While current success is limited to 2D slices, future work targets 3D generation and richer anatomical conditioning. In essence, these results highlight the promise of conditional diffusion models for computer-aided diagnosis and Deep Learning in lung disease analysis.

https://zenodo.org/records/17305553

Poster: https://zenodo.org/records/17387900

Although the use of AI models in medicine shows great potential, using medical images to train these models understandably raises ethical and privacy concerns. This study aims to implement a WGAN-GP model that learns from lung CT scans of cancer patients, each segmented into semantic masks, to generate accurate 2D synthetic semantic segmentation masks. To compare the model’s performance, different sample resolutions and hyperparameters were tested. The results demonstrate the model’s capability to correctly map lung anatomy and segment its different components, producing realistic and feasible semantic segmentation masks. While current findings are limited to 2D and are sensitive to sample resolution, future work envisions extending the approach to 3D medical-grade and more complex samples. These results highlight the potential for such architectures to be used in tandem with mask-conditioned generative models for two-step data augmentation.

https://zenodo.org/records/17304869

Poster: https://zenodo.org/records/17451088

Research Preprints

Currently, a central challenge and bottleneck in the deployment and validation of computer-aided diagnosis (CAD) models in medical imaging is data scarcity. For lung cancer, one of the most prevalent cancers worldwide, limited datasets can delay diagnosis and affect patient outcomes. Generative AI offers a promising solution to this issue, but modelling the complex distribution of full Hounsfield Unit (HU) range lung CT scans is challenging and remains computationally demanding. This paper introduces a novel decomposition strategy that synthesizes CT images one HU interval at a time, rather than modelling the entire HU domain at once. The framework trains generative architectures on individual tissue-focused HU windows, then merges their outputs into a full-range scan via a learned reconstruction network that effectively reverses the HU-windowing process. We further propose multi-head and multi-decoder models to better capture textures while preserving anatomical consistency, with a multi-head VQVAE achieving the best performance for the generative task. Quantitative evaluation shows that this approach significantly outperforms conventional 2D full-range baselines, achieving a 6.2% improvement in FID and superior MMD, Precision, and Recall across all HU intervals. The multi-head VQVAE result demonstrates that it is possible to enhance visual fidelity and variability while also reducing model complexity and computational cost. This work establishes a new paradigm for structure-aware medical image synthesis, aligning generative modelling with clinical interpretation.

https://doi.org/10.48550/arXiv.2603.23041
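The decomposition described above splits a full-range HU image into tissue-focused windows and later merges generated windows back into one scan. As a minimal illustration, assuming contiguous, non-overlapping HU intervals with hypothetical edges (not the paper's windows), the windowing step and an analytic inverse can be written as follows. The analytic merge is exact only when the windows come from one consistent scan; independently generated windows need not agree at interval boundaries, which is one reason a learned reconstruction network is useful.

```python
import numpy as np

# Hypothetical contiguous HU interval edges (illustrative, not the paper's windows).
EDGES = np.array([-1024.0, -400.0, 200.0, 3071.0])

def decompose(hu):
    """Split a 2D HU image into one [0, 1]-scaled channel per HU interval."""
    lo = EDGES[:-1][:, None, None]
    hi = EDGES[1:][:, None, None]
    clipped = np.clip(hu[None, :, :], lo, hi)   # window each interval
    return (clipped - lo) / (hi - lo)           # shape (n_intervals, H, W)

def merge(channels):
    """Analytic inverse for contiguous intervals: add back each interval's span."""
    widths = (EDGES[1:] - EDGES[:-1])[:, None, None]
    return EDGES[0] + (channels * widths).sum(axis=0)

rng = np.random.default_rng(0)
hu = rng.uniform(-1024.0, 3071.0, size=(64, 64))   # stand-in HU slice
channels = decompose(hu)
print(channels.shape, np.abs(merge(channels) - hu).max() < 1e-9)   # (3, 64, 64) True
```

Because the intervals tile the HU range, each voxel is unsaturated in at most one channel, so summing the scaled channels recovers its value; values below the lowest edge saturate at -1024 HU.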

Synthetic data generation (SDG) for structured health data is increasingly promoted as a solution to longstanding barriers in health data access, offering the promise of privacy-preserving data reuse for research, innovation, and policy. Despite rapid technical advances, the adoption of synthetic health data in real-world settings remains limited, shaped by challenges around data quality, representativeness, infrastructure readiness, trust, and legal uncertainty. This viewpoint draws on experiences from 7 European research initiatives within the HealthData4EU cluster to reflect on how SDG is being operationalized in practice. It synthesizes cross-project insights to highlight recurring methodological and governance tensions and to examine their implications for trust and responsible use. The analysis argues that trustworthy SDG cannot be achieved through technical optimization alone but requires alignment between evaluation practices, upstream data stewardship, regulatory clarity, and sustained stakeholder engagement. Addressing these conditions is essential for moving synthetic data from experimental pilots toward a credible and sustainable component of European health research ecosystems.

https://doi.org/10.2196/83369

The increasing use of ML models in cancer prediction has heightened the focus on their explainability, particularly in ranking risk factors. Although feature importance techniques like LIME and SHAP are widely used, their consistency in ranking predictive factors remains underexplored. This study examines the consistency of the explanations provided by these techniques in medical applications using UK primary care data CPRD. Their consistency is assessed both within the same ML model and across different ML models for a classification task to identify lung cancer and non-lung cancer cases. The experimental results show that LIME and SHAP produce significantly different rankings even when applied to the same models. However, rankings are more consistent when the same technique is used across different models. These inconsistencies are addressed with feature importance fusion, which integrates rankings from multiple techniques within and across different models. This has been formulated as an optimisation problem and solved using a closed-form solution and a heuristic approach. The fused rankings align more closely with model-generated LIME and SHAP rankings, reducing variability across feature importance techniques and improving the overall consistency of feature importance assessments in medical applications.

https://ssrn.com/abstract=5184191
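The abstract does not state the fusion objective or its closed-form solution. As a hedged illustration only: if fusion were posed as finding the rank vector r that minimises the sum over input rankings r_k of ||r − r_k||², the closed-form minimiser is the element-wise mean rank, which is then re-ranked into a permutation (a Borda-style consensus). The function name, the rankings, and the feature names other than age and smoking status are invented for the example.

```python
import numpy as np

def fuse_rankings(rankings):
    """Consensus ranking from several rank vectors (rank 1 = most important)."""
    R = np.asarray(rankings, dtype=float)        # shape (n_rankings, n_features)
    mean_rank = R.mean(axis=0)                   # minimiser of sum_k ||r - r_k||^2
    order = np.argsort(mean_rank, kind="stable")
    fused = np.empty(len(order), dtype=int)
    fused[order] = np.arange(1, len(order) + 1)  # re-rank the mean into a permutation
    return fused, mean_rank

features = ["age", "smoking_status", "feature_c", "feature_d"]  # last two hypothetical
rankings = [
    [1, 2, 3, 4],   # e.g. LIME on model A (illustrative numbers)
    [2, 1, 4, 3],   # e.g. SHAP on model A
    [1, 2, 4, 3],   # e.g. SHAP on model B
]
fused, _ = fuse_rankings(rankings)
print(dict(zip(features, fused.tolist())))
# {'age': 1, 'smoking_status': 2, 'feature_c': 4, 'feature_d': 3}
```

The mean rank is generally not a valid ranking on its own (here it is fractional), so the final argsort step is needed; ties are broken by feature order.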

This study presents a robust and efficient client selection protocol designed to optimize the Federated Learning (FL) process for the Federated Tumor Segmentation Challenge (FeTS 2024). In the evolving landscape of FL, the judicious selection of collaborators emerges as a critical determinant of the success and efficiency of collective learning, particularly in domains requiring high precision. This work introduces a recommender engine framework based on non-negative matrix factorization (NNMF) and a hybrid aggregation approach that blends content-based and collaborative filtering. This method analyzes historical performance, expertise, and other relevant metrics to identify the most suitable collaborators. This approach not only addresses the cold-start problem, in which new or inactive collaborators are difficult to select because of limited data, but also significantly improves the precision and efficiency of the FL process. Additionally, we propose harmonic similarity weight aggregation (HSimAgg) for adaptive aggregation of model parameters. We used a dataset of 1,251 multi-parametric magnetic resonance imaging (mpMRI) scans from individuals diagnosed with glioblastoma (GBM) for training and an additional 219 mpMRI scans for external evaluation. Our federated tumor segmentation approach achieved Dice scores of 0.7298, 0.7424, and 0.8218 for enhancing tumor (ET), tumor core (TC), and whole tumor (WT) segmentation, respectively, on the external validation set. In conclusion, this research demonstrates that selecting collaborators with expertise aligned to specific tasks, such as brain tumor segmentation, improves the effectiveness of FL networks.

https://doi.org/10.48550/arXiv.2412.20250
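The abstract names harmonic similarity weight aggregation (HSimAgg) without giving its formula. The sketch below is a hypothetical reading, not the authors' definition: each collaborator's flattened parameters are scored by inverse distance to the mean update, that similarity and the collaborator's data share are each normalised, and their harmonic mean sets the aggregation weight, so a collaborator far from the consensus is down-weighted even if it holds many samples. The function name and the toy parameter vectors are assumptions.

```python
import numpy as np

def hsim_aggregate(client_params, n_samples, eps=1e-8):
    """Hypothetical harmonic-similarity weighted average of parameter vectors."""
    W = np.stack(client_params)                    # (n_clients, n_params)
    dist = np.linalg.norm(W - W.mean(axis=0), axis=1)
    sim = 1.0 / (dist + eps)                       # closer to consensus = larger
    sim = sim / sim.sum()
    share = np.asarray(n_samples, dtype=float)
    share = share / share.sum()                    # each client's data share
    weights = 2.0 * sim * share / (sim + share)    # harmonic mean of the two
    weights = weights / weights.sum()
    return weights @ W, weights

clients = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
           np.array([0.9, 1.1]), np.array([10.0, 10.0])]   # last is an outlier
agg, w = hsim_aggregate(clients, n_samples=[100, 100, 100, 100])
print(np.round(w, 3), np.round(agg, 2))
```

Compared with a plain sample-weighted mean, the harmonic mean is dominated by the smaller of its two inputs, which is what pulls the outlier's weight down here.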

Federated learning (FL) enables collaborative model training across decentralized datasets while preserving data privacy. However, optimally selecting participating collaborators in dynamic FL environments remains challenging. We present RL-HSimAgg, a novel reinforcement learning (RL) and similarity-weighted aggregation (simAgg) algorithm that uses the harmonic mean to manage outlier data points. This paper proposes applying multi-armed bandit algorithms to improve collaborator selection and model generalization. By balancing exploration-exploitation trade-offs, these RL methods can promote resource-efficient training with diverse datasets. We demonstrate the effectiveness of the Epsilon-greedy (EG) and upper confidence bound (UCB) algorithms for federated brain lesion segmentation. In simulation experiments on internal and external validation sets, RL-HSimAgg with UCB collaborator selection outperformed the EG method across all metrics, achieving higher Dice scores for Enhancing Tumor (0.7334 vs 0.6797), Tumor Core (0.7432 vs 0.6821), and Whole Tumor (0.8252 vs 0.7931) segmentation. Therefore, for the Federated Tumor Segmentation Challenge (FeTS 2024), we adopt UCB as our primary client selection approach in federated glioblastoma lesion segmentation of multi-modal MRIs. In conclusion, our research demonstrates that RL-based collaborator management, e.g., using UCB, can potentially improve model robustness and flexibility in distributed learning environments, particularly in domains like brain tumor segmentation.

https://doi.org/10.48550/arXiv.2412.20253
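The abstract compares epsilon-greedy and UCB bandit strategies for choosing collaborators. A minimal sketch of both follows, assuming that m collaborators are picked per round and that each selected collaborator yields a scalar reward (for instance a validation Dice gain; the actual reward used in RL-HSimAgg is not stated here). The class and function names, the exploration constant, and the simulated collaborator qualities are hypothetical.

```python
import math
import random

class UCBSelector:
    """UCB1-style choice of m collaborators per round from a scalar reward."""

    def __init__(self, n_clients, c=1.0):
        self.counts = [0] * n_clients
        self.means = [0.0] * n_clients
        self.t = 0
        self.c = c   # exploration strength (hypothetical value)

    def select(self, m):
        self.t += 1
        def score(i):
            if self.counts[i] == 0:
                return math.inf   # try every collaborator at least once
            bonus = self.c * math.sqrt(2.0 * math.log(self.t) / self.counts[i])
            return self.means[i] + bonus
        return sorted(range(len(self.counts)), key=score, reverse=True)[:m]

    def update(self, chosen, rewards):
        for i, r in zip(chosen, rewards):
            self.counts[i] += 1
            self.means[i] += (r - self.means[i]) / self.counts[i]   # running mean

def epsilon_greedy_select(means, m, eps, rng):
    """Explore uniformly with probability eps, else take the m best means."""
    if rng.random() < eps:
        return rng.sample(range(len(means)), m)
    return sorted(range(len(means)), key=lambda i: means[i], reverse=True)[:m]

# Simulated collaborators with hidden quality (illustrative, not FeTS data).
true_quality = [0.3, 0.5, 0.9, 0.2, 0.7]
rng = random.Random(0)
selector = UCBSelector(len(true_quality), c=0.3)
for _ in range(300):
    chosen = selector.select(m=2)
    rewards = [true_quality[i] + rng.gauss(0.0, 0.05) for i in chosen]
    selector.update(chosen, rewards)
print(selector.counts)
```

UCB's bonus shrinks as a collaborator is sampled more often, so exploration tapers off deterministically, whereas epsilon-greedy keeps exploring at a fixed rate regardless of how much evidence has accumulated.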

This work introduces a new latent diffusion model to generate high-quality 3D chest CT scans conditioned on 3D anatomical masks. The method synthesizes volumetric images of size 256 × 256 × 256 at 1 mm isotropic resolution using a single mid-range GPU, significantly lowering the computational cost compared to existing approaches. The conditioning masks delineate lung and nodule regions, enabling precise control over the output anatomical features. Experimental results demonstrate that conditioning solely on nodule masks leads to anatomically incorrect outputs, highlighting the importance of incorporating global lung structure for accurate conditional synthesis. The proposed approach supports the generation of diverse CT volumes with and without lung nodules of varying attributes, providing a valuable tool for training AI models or healthcare professionals.

https://doi.org/10.48550/arXiv.2510.18446