Artificial intelligence (AI) enables data-driven innovation in health care. AI systems, which can process large amounts of data quickly and in detail, show promise as tools both for preventive health care and for clinical decision-making. However, health data is stored in a fragmented way and access to it is restricted. This is a barrier to innovation, because developing trustworthy AI systems requires large datasets for training and validation. Anonymous datasets would also encourage the adoption of AI-powered tools by supporting health technology assessments and education. Secure, privacy-compliant use of data is therefore key to unlocking the full potential of AI and data analytics. In this project, we will advance state-of-the-art data synthesis methods towards a more general approach to synthetic data generation. We will also develop metrics for testing and validating synthetic data. In addition, we will develop protocols based on multi-party computation that enable synthetic data generation without direct access to the underlying real-world data.
We aim to:
- Improve methods and technical pipelines for privacy-preserving data synthesis across different data formats, such as electronic health records (EHRs) and medical images,
- Develop easy-to-use, configurable data services that give AI developers access to larger pools of decentralized, de-identified data through multi-party computation,
- Provide anonymous data on demand or from a (temporary) repository,
- Establish a data market that facilitates data sharing and monetization, including incentive-based provision of data to the services,
- Integrate the data market and the data service ecosystem as a cross-European health data hub within the European Health Data Space, and
- Validate the results in real-world use cases focusing on high-impact diseases, in particular cancer.