Big Data Infrastructure for Massive Data Management - WP3

Work Package 3 (WP3), Big Data Infrastructure for Massive Data Management, is the technological backbone of the PROTECT-CHILD project. It creates the secure and scalable systems needed to handle sensitive health and genomic data for rare pediatric transplant patients. This infrastructure is critical for enabling safe data sharing and collaborative research across Europe.
Our Ambition: A Federated, Secure Infrastructure
WP3 aims to bridge public and private cloud solutions to support the massive data needs of European health data spaces, with a strong focus on genomics. The key ambitions include:
- Providing environments that fully comply with GDPR and EHDS for pediatric transplant data.
- Building highly secure, scalable storage and computing resources designed for health and genomic workloads.
- Enabling fast, federated data sharing and analysis, laying a solid foundation for innovative AI, analytics, and clinical tools downstream.
What’s Happening in WP3 Right Now
Researching and Prototyping Secure Architectures
- Zero Trust Architecture: Evaluating SPIRE (the SPIFFE Runtime Environment) for workload identity and service mesh federation, Open Policy Agent (OPA) for policy enforcement, and WebAssembly (WASM) for efficient microservices.
- Genomic Workflows: Exploring both commercial tools and open-source workflows in line with ELIXIR international standards.
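To make the zero-trust idea concrete, here is a minimal sketch of the default-deny pattern that a policy engine such as OPA is typically used to enforce: every request carries a verified workload identity, and access is granted only when an explicit rule matches. This is an illustration only, not the project's implementation. Real OPA policies are written in Rego rather than Python, and the SPIFFE IDs, trust domains, actions, and resource names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    spiffe_id: str  # caller identity in SPIFFE ID form: spiffe://<trust-domain>/<path>
    action: str     # hypothetical action name, e.g. "read"
    resource: str   # hypothetical dataset label


# Hypothetical allow-list: (identity, action, resource) triples.
# Anything not listed here is denied.
ALLOW_RULES = frozenset({
    ("spiffe://hospital-a.example/annotation-service", "read", "vcf/processed"),
    ("spiffe://hospital-a.example/ingest-service", "write", "fastq/raw"),
})


def is_allowed(request: Request, rules=ALLOW_RULES) -> bool:
    """Default deny: allow only if an explicit rule matches the request exactly."""
    return (request.spiffe_id, request.action, request.resource) in rules


# A known service reading processed variants is allowed;
# an identity from an unknown trust domain is rejected.
known = Request("spiffe://hospital-a.example/annotation-service", "read", "vcf/processed")
unknown = Request("spiffe://unknown.example/some-service", "read", "vcf/processed")
decisions = (is_allowed(known), is_allowed(unknown))
```

The design point is that the decision depends on who the caller is and what it asks for, never on where in the network the call comes from.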
Deploying Core Infrastructure
- Container Orchestration: Implementing Kubernetes and Istio for scalable, high-availability deployments.
- Service Mesh and Policy: Testing SPIRE, OPA, and WASM for dynamic security enforcement.
- Genomic Analysis Workflows: Adapting workflows so that ELIXIR tools, Galaxy, and annotation tools (VEP, ANNOVAR) run securely within the platform’s service mesh, focusing on seamless VCF/FASTQ processing and annotation.
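As a rough illustration of what VCF processing involves at the lowest level, the sketch below reads the eight fixed tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and skips header lines. It is a simplified example under stated assumptions, not the project's pipeline: the class and function names are hypothetical, the sample record is made up, and production workflows would rely on established tools rather than hand-written parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alts: tuple  # one or more alternate alleles
    info: dict   # INFO key/value pairs; flag entries map to ""


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one VCF data line using only the 8 fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_map = {}
    if info != ".":  # "." marks an empty INFO field
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_map[key] = value
    return VcfRecord(chrom, int(pos), ref, tuple(alt.split(",")), info_map)


def read_variants(lines):
    """Yield records, skipping '##' meta lines, the '#CHROM' header, and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Synthetic example input (not real patient data).
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t12345\t.\tA\tG,T\t50\tPASS\tDP=20;DB\n",
]
records = list(read_variants(example_lines))
```

Annotation tools such as VEP or ANNOVAR take records like these and attach predicted consequences to each variant.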
Big Data & Secure High-Performance Computing
- HPC Integration: Assessing and configuring hardware/software for genomic workload optimization, including tasks such as variant calling and annotation.
Over the coming months, the goal is to integrate the core infrastructure, test it with real workflows on both raw and processed genomic data, compare open-source tools with commercial ones, and validate that the system is secure, scalable, and easy for authorized users to access.
The Big Picture
By building this infrastructure, PROTECT-CHILD will allow hospitals, researchers, and policymakers to:
- Collaborate securely across borders on pediatric transplant research without moving sensitive patient data between countries.
- Make discoveries that can lead to better treatments, with full respect for each patient’s privacy and legal rights.
- Provide advanced, user-friendly tools that benefit both experts and non-specialists.
WP3 is developing the technological capacity that enables secure, collaborative, and innovative pediatric transplant research across Europe.
Author: Jessica Urrutia, Inetum