Better Software for better Science

The 13th Iberian Grid Conference will take place in Porto, Portugal, from Monday 28th to Wednesday 30th of October.
The DT-GEO project (2022-2025), funded under the Horizon Europe topic call INFRA-2021-TECH-01-01, is implementing an interdisciplinary digital twin for modelling and simulating geophysical extremes at the service of research infrastructures and related communities. The digital twin consists of interrelated Digital Twin Components (DTCs) dealing with geohazards ranging from earthquakes and volcanoes to tsunamis, and harnessing world-class computational (FENIX, EuroHPC) and data (EPOS) Research Infrastructures, operational monitoring networks, and leading-edge research and academic partnerships in various fields of geophysics. The project is merging and assembling the latest developments from other European projects and EuroHPC Centers of Excellence to deploy 12 DTCs, intended as self-contained containerized entities embedding flagship simulation codes, artificial intelligence layers, large volumes of (real-time) data streams from and into data-lakes, data assimilation methodologies, and overarching workflows for deployment and execution of single or coupled DTCs in centralized HPC and virtual cloud computing Research Infrastructures (RIs). Each DTC addresses specific scientific questions and circumvents technical challenges related to hazard assessment, early warning, forecasts, urgent computing, or geo-resource prospection. This presentation summarizes the results from the first two years of the project, including the digital twin architecture and the (meta)data structures enabling (semi-)automatic discovery, contextualization, and orchestration of software (services) and data assets. This is a preliminary step before verifying the DTCs at 13 Site Demonstrators and marks the start of a long-term community effort towards a twin on Geophysical Extremes integrated in the Destination Earth (DestinE) initiative.
Today's computational capabilities and the availability of large data volumes make it possible to develop Digital Twins that provide unrivaled precision. Geophysics is one field that benefits from the ability to simulate the evolution of multi-physics natural systems across a wide range of spatio-temporal scales. This is also made possible by access to HPC systems and programming frameworks that provide paradigms to combine the massive data streams generated by observational systems with large-scale numerical simulations in HPC or cloud environments.
The DT-GEO initiative is implementing interdisciplinary and interrelated DT Components (DTCs) for geophysical hazards from earthquakes (natural or anthropogenically induced), volcanoes, and tsunamis. These DTCs are designed as self-contained and containerized entities embedding flagship codes, AI layers, and data streams within workflows.
PyCOMPSs is used for the development and execution of the DTCs as parallel workflows on top of the FENIX Infrastructure, which includes supercomputers such as Leonardo at CINECA and MareNostrum at BSC. PyCOMPSs is a task-based programming model that simplifies the development of applications for distributed infrastructures, such as HPC systems, clouds, and other managed clusters. Its integration enables efficient parallelization of tasks within DTC workflows, optimizing computational resources and enhancing overall performance. It also provides a lightweight interface to implement dynamic HPC+AI workflows, which can change their behaviour at execution time due to exceptions or faults. A parallel machine learning library built on top of PyCOMPSs, dislib, is also available to DT-GEO users.
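As an illustration of the task-based model, the sketch below shows how a DTC step might be expressed as PyCOMPSs tasks. It is a minimal, hypothetical example (the function names, file names and task granularity are ours, not DT-GEO code), assuming only the standard pycompss Python API.

```python
# Minimal sketch of a PyCOMPSs task-based workflow (illustrative only; the
# function names and inputs are hypothetical, not actual DT-GEO code).
from pycompss.api.task import task
from pycompss.api.parameter import FILE_IN
from pycompss.api.api import compss_wait_on


@task(input_file=FILE_IN, returns=1)
def run_simulation(input_file, scenario_id):
    # Placeholder for a containerized flagship simulation step.
    return {"scenario": scenario_id, "status": "done"}


# Tasks are spawned asynchronously; the runtime builds the dependency graph
# and schedules them on the available HPC or cloud resources.
results = [run_simulation("scenario.nc", i) for i in range(10)]
results = compss_wait_on(results)  # synchronize only when the values are needed
print(results)
```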
Reproducibility of experiments has become essential to guarantee the validation of results published in research papers. Recording metadata in the form of provenance is one of the most effective ways to achieve reproducibility of workflow experiments.
This contribution provides insights into the programming interfaces used to define the DTC workflows in DT-GEO, explains how the DTCs can be packaged to reduce the effort required to deploy them through the underlying services on the computing infrastructure, and shows how they can be integrated and reused in different workflows using PyCOMPSs as the orchestrator. We will also show how workflow provenance is recorded in PyCOMPSs, following the RO-Crate metadata specification, and how this metadata can be used to achieve FAIR workflows through their publication in WorkflowHub.
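To give a flavour of what such provenance looks like, the snippet below inspects an RO-Crate produced for a workflow run. Only the standard file name ro-crate-metadata.json and the JSON-LD @graph layout come from the RO-Crate specification; the specific properties printed are illustrative assumptions rather than the exact PyCOMPSs output.

```python
# Illustrative sketch: listing the data entities registered in a workflow
# provenance RO-Crate. The crate layout (@context/@graph) follows the
# RO-Crate specification; the fields printed below are examples only.
import json

with open("ro-crate-metadata.json") as fh:
    crate = json.load(fh)

for entity in crate["@graph"]:
    types = entity.get("@type", [])
    types = types if isinstance(types, list) else [types]
    if "File" in types:
        print(entity["@id"], entity.get("name", ""), entity.get("contentSize", ""))
```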
FAIRness is an important quality of all kinds of data and metadata, with each letter of the FAIR acronym standing for a different criterion: Findability, Accessibility, Interoperability and Reusability.
FAIR-EVA is a tool that reports the FAIRness level of digital objects from different repositories or data portals, implemented via different plugins, allowing users to improve the maturity level of their digital objects. To retrieve the metadata, the user only needs to provide the link to where the item is stored and its identifier.
To evaluate a digital object, a set of tests is run for each of the FAIR categories, ranging from checking the persistence of the identifier to searching different controlled vocabularies.
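As an example of the kind of test behind the Findability category, the sketch below checks whether an identifier follows a persistent scheme (DOI or Handle). This is a simplified illustration of the idea, not FAIR-EVA's actual implementation, and the patterns used are our own assumptions.

```python
# Simplified illustration of a "Findable" test: does the identifier look
# like a persistent identifier (DOI or Handle)? Not FAIR-EVA's real code.
import re

PERSISTENT_ID_PATTERNS = [
    r"^10\.\d{4,9}/\S+$",                       # bare DOI
    r"^https?://(dx\.)?doi\.org/10\.\d{4,9}/",  # DOI expressed as a URL
    r"^https?://hdl\.handle\.net/\S+$",         # Handle
]


def identifier_is_persistent(identifier: str) -> bool:
    return any(re.match(p, identifier) for p in PERSISTENT_ID_PATTERNS)


print(identifier_is_persistent("https://doi.org/10.1234/abcd"))  # True
print(identifier_is_persistent("https://example.org/item/42"))   # False
```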
In the last year, we have added a plugin that evaluates digital objects stored on the EPOS platform, where the metadata for the DT-GEO project will be stored. In tandem, a new metadata schema has been created for the EPOS platform. To compare the new schema with the old one, both were evaluated using datasets represented in both of them. The results showed an improvement from the old schema to the new one, and as more metadata is provided the results will improve further.
In this presentation we will explain FAIR data, the FAIR evaluator and its development in the context of the EPOS platform and the DT-GEO project.
DT-GEO aims to provide digital twins of the earth system to mimic different system components and provide analysis, forecasts and what-if scenarios for geophysical extremes, enabling a deeper insight into these events. Addressing the complexity of mimicking the earth system, as well as the multitude of software codes required to realise the DT-GEO vision, demands a modular architecture, where containerization of the workflows provides a path towards easy deployment, maintainability and portability across infrastructures.
The use of containers as a means to deliver and execute applications targeting multiple heterogeneous computing environments is a key aspect of the DT-GEO architecture. In addition, containers also contribute to better reproducibility, facilitate digital preservation, and promote the reuse of the various codes. In this presentation we will discuss how containerization powers the realisation of the DT-GEO modular architecture for the workflows of digital twin components, while enabling the open science principles for the project software assets.
The data management team in the DT-GEO project is following a novel approach for the characterization of the Digital Twins for geophysical extremes being developed in the project. The approach relies on the use of rich metadata to describe the components of Digital Twins (DTCs): (i) the digital assets (DAs) --namely datasets, software-services, workflows and steps within those workflows--, and (ii) the relationships between such assets or entities. This conforms to the structure of CERIF (Common European Research Information Format), a very early example of graph-structured metadata used as the storage form of the European Plate Observing System (EPOS) metadata catalog.
The resulting DT-GEO metadata schema responds to a requirements-gathering process carried out jointly with the communities participating in the project. In the first phase, the schema was standardized according to the EPOS-DCAT-AP data catalog vocabulary for the DAs, which resulted in an extended version of the latter. The metadata collected from the communities was then serialized into RDF (Resource Description Framework) documents and exposed through a prototype metadata catalog, based on the EPOS ICS-C data portal, via REST API and web portal interfaces.
At the time of writing, the second phase is being undertaken and aims at accommodating the metadata for the workflows operated by the Digital Twins, again following the CERIF data model. Having found EPOS-DCAT-AP inappropriate for this task, the data management team decided to opt for the RO-Crate specification, used to package research objects. Thus, the metadata for the workflow descriptions is serialized into RO-Crate documents (in JSON-LD format) to enable their abstract characterisation, including the aforementioned relationships between DTCs, workflows, and steps within these workflows. This makes it possible to populate the registries and catalogs required by the downstream workflow management system, i.e. PyCOMPSs in the DT-GEO case.
The metadata management follows a continuous improvement process where the maintainers of the Digital Twins contribute new changes, which are then reviewed by the data management team in order to validate and harmonize the metadata values for the sake of interoperability. Throughout this process, the FAIR maturity and quality assurance of the digital assets is key, and thus each DA is being evaluated through diverse means. In the particular case of the data, a plugin for the DT-GEO prototype catalog was developed for the FAIR-EVA tool, which enables the early evaluation of the FAIRness of the DT-GEO datasets. In the case of the DT-GEO software (code and services), the SQAaaS platform (Software Quality Assurance as a Service) provides accurate reporting about SQA characteristics. Last but not least, the validation of the workflows is done through a continuous integration setup that triggers the workflow execution in an HPC-like environment in the Cloud, once the workflow code has been checked by the SQAaaS service and the containers that represent the steps in the workflow have been packaged.
Portugal and Spain share a common history of suffering from natural hazards. In 1755, we experienced the largest natural hazard event ever to have occurred in Europe in historical times. This event included an earthquake and the second deadliest tsunami in our records, just behind the Sumatra catastrophe of 2004. The DT-GEO project (A Digital Twin for GEOphysical extremes) aims at analysing and forecasting the impact of tsunamis, earthquakes, volcanoes, and anthropogenic hazards. The EDANYA group of the University of Málaga (UMA) contributes to this project by developing the DTC (Digital Twin Component) related to tsunamis and, therefore, our main contribution is in “WP6. Tsunamis”. The main objective within this work package is to develop and implement a DTC for data-informed Probabilistic Tsunami Forecasting (PTF) (DTC-T1). This DTC will be tested through four demonstrators at four relevant sites: the Mediterranean Sea coast (SD4), Eastern Sicily (SD5), the Chilean coast (SD6), and the Eastern Honshu coast in Japan (SD7). The main aim of these demonstrators is to test the PTF for various earthquake sources across the world (Mediterranean Sea, Chile and Japan), evaluating how new functionalities such as real-time fusion of seismic, GNSS, and tsunami data reduce source uncertainty. In particular, SD5, Eastern Sicily, aims at testing the PTF for both earthquake and earthquake-induced landslide sources.
On the one hand, we contribute to the project with two European flagship codes, Tsunami-HySEA and Landslide-HySEA, for the simulation of tsunamis generated by seismic sources and by landslides, respectively. In addition, continuous improvements to these codes are being made within the framework of DT-GEO, where a new version of Tsunami-HySEA that includes a new initialization mode has been developed. This new mode allows integrating simulated data from external seismic or landslide models (such as BingClaw, SeisSol, etc.) to dynamically generate the initial water surface elevation in the tsunami simulation. On the other hand, to allow the aforementioned coupling, an interface module integrated into the workflow was developed as a joint effort between UHAM, UMA, LMU and IPGP. Current work, within task T6.4 and deliverable D6.4, which we lead, consists of integrating the improved versions of the codes, including dispersion, into the PTF workflow.
The Software Quality Assurance as a Service (SQAaaS) platform allows researchers to evaluate the quality characteristics of their source code and web services and the FAIR maturity level of scientific data. This platform exposes different interfaces, ranging from a graphical web portal to APIs, libraries and integrations with code hosting platforms. In this tutorial we will dive into the current features of the platform and demonstrate its usage through a series of use cases.
Currently the industry leader in its segment, Kubernetes (K8s) is a container orchestration platform that offers organizations enhanced reliability, availability, and resilience for their services compared to monolithic architectures.
This highly practical workshop is designed for new or beginner users. Participants will be introduced to K8s and gain the essential knowledge to deploy an application on a K8s cluster through a hands-on, guided tutorial.
No prior knowledge is required, and there is no need for any technical setup on your laptop. You just need to be able to SSH to a remote host, and you're ready to go!
We will summarize the status of the infrastructure and the research and development activities taking place under the umbrella of IBERGRID.
This presentation will provide a summary and details about the history of IBERGRID in the EGI Federation, showing the complex provider, user and innovator relationships that exist between Spain, Portugal and the rest of EGI. The talk aims to socialise and promote the benefits and achievements of the IBERGRID - EGI partnership.
Spain and Portugal have been significant contributors to the EGI Federation since its beginning. Over the years, IBERGRID has demonstrated a solid commitment to advancing scientific research and technological innovation. With a continuously growing user base (over 3,300 registered users in 2023) and a total of ten HTC and seven cloud sites, these countries have played a pivotal role in advancing science through advanced compute services, providing computational resources and services to the European research community. In 2023 IBERGRID providers delivered nearly 320 million HTC CPU hours and 29 million Cloud CPU hours to users via EGI.
Spanish and Portuguese institutions made significant contributions to the innovation of EGI as evidenced by their extensive involvement in numerous R&D collaborative projects and coordinated initiatives. Through over 43 joint projects (in 2023) and active participation in other international R&D efforts, IBERGRID institutes fostered knowledge exchange and technology development, and played a pivotal role in driving the evolution of EGI, and through EGI also of the European Open Science Cloud. The most significant innovations from Spain and Portugal contributed through EGI to breakthroughs in the fields of physics, medicine and climate research.
Spain and Portugal's collaborative spirit and technological expertise have positioned them as driving forces within the EGI Federation. Their contributions to scientific research and their potential for future breakthroughs solidify their status as key players in shaping the European research landscape.
Kubernetes (K8s) is the industry-leading container orchestration platform, offering organizations enhanced reliability, availability, and resilience for their services compared to monolithic architectures. Also, by adopting a declarative paradigm, K8s simplifies the management of multiple complex environments. When integrated with tools such as ArgoCD and GitLab CI pipelines, K8s also makes it easier for organizations to implement the GitOps methodology. This ensures consistency and traceability across both the cluster and the entire application lifecycle. In this talk, we will discuss our experience with K8s deployment at INCD, describe the architecture, automation, and integrations that we put in place, and reflect on the lessons learned.
The third National Tripartite Event (NTE) organised by Spain took place on 24 September 2024. The event covered three blocks: the EOSC governance, new INFRAEOSC projects with Spanish partners, and updates on the EOSC Federation.
The event covered the latest updates on the EOSC governance, including the new task forces and the consultation processes on the Strategic Research and Innovation Agenda and the Multi-Annual Report. The overall governance also includes the EOSC Steering Board and the European Commission.
Spanish participation in the INFRAEOSC calls has increased significantly, with four projects coordinated by Spanish institutions and an important fraction of the funded projects including Spanish institutions. Spanish participation had an important focus on software.
An important part of the discussion focused on the EOSC Federation and the concept of nodes, recently presented at the EU node launch event.
The presentation will reflect the outlines of these events and the latest news on the EOSC collaboration, with a special focus on Spain and Portugal.
In the context of the CHAIMELEON project (https://chaimeleon.eu/) we have developed a secure processing environment to manage medical imaging data and their associated clinical data enabling researchers to share, publish, process and trace datasets in virtual environments, powered by intensive computing resources.
The environment is built on top of a Kubernetes cluster and leverages native objects such as namespaces, policies, service accounts and role-based access mechanisms to define read-only views of the medical data, mounted on GUI-based virtual research environments in which the data is accessible without the possibility of downloading it outside of the platform borders. Additional functionality is implemented through custom resources and operators. Coarse actions such as dataset creation, dataset access or dataset updates are auditable and registered on a blockchain that the data holders who provide the data can consult.
Processing environments are powered by a set of partitions that act as job queues providing different flavours in terms of GPU, memory and cores. These resources are managed through a special component that facilitates the execution of containerised batch jobs, including the support of uDocker for custom containers.
The environment has been validated in the course of a public Open Challenge in which 10 users competed to train the most accurate AI models for addressing two cancer-oriented use cases.
Over the last two years, the Galician Marine Sciences Program (CCMM) has developed a Data Lake to support the collection and analysis of data related to Galicia’s marine ecosystem. The Data Lake architecture facilitates processing both structured and unstructured data, already integrating diverse datasets such as ocean current velocity maps, species distribution data, upwelling indices, buoy-derived marine conditions, marine carbon-related datasets, SOCAT coastal and North Atlantic data, and atmospheric models.
For the storage layer, the Data Lake utilizes Apache Hadoop’s HDFS distributed filesystem and Apache Parquet for efficient distributed and parallel processing.
For the analysis layer, Apache Spark enables high-performance, scalable data processing, combining multiple datasets to advance marine ecosystem research and support sustainable resource management.
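As a hedged illustration of this analysis layer, the sketch below reads two Parquet datasets from HDFS and joins them with Spark; the paths, column names and aggregation are placeholders rather than the actual CCMM datasets.

```python
# Minimal sketch of the analysis layer: reading Parquet datasets from HDFS
# and combining them with Spark. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("marine-datalake-demo").getOrCreate()

currents = spark.read.parquet("hdfs:///datalake/ocean_currents/")
buoys = spark.read.parquet("hdfs:///datalake/buoy_conditions/")

# Join buoy observations with current velocities and compute a monthly summary.
summary = (
    currents.join(buoys, on=["date", "region"], how="inner")
    .withColumn("month", F.date_trunc("month", F.col("date")))
    .groupBy("month", "region")
    .agg(
        F.avg("current_velocity").alias("avg_current_velocity"),
        F.avg("sea_surface_temp").alias("avg_sst"),
    )
)

summary.write.mode("overwrite").parquet("hdfs:///datalake/derived/monthly_summary/")
```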
Interactive processing is enabled through a web portal that uses JupyterLab notebooks tightly integrated with the Data Lake and customized for marine sciences usage.
The Data Lake not only accelerates data-driven insights but also provides a scalable infrastructure for future research, fostering collaboration and innovation in the sustainable management of Galicia’s marine resources.
OSCAR is an open-source framework built on Kubernetes (K8s) for event-driven data processing of serverless applications packaged as Docker containers. The execution of these applications can be triggered either by detecting events from object-storage systems, such as MinIO or dCache (asynchronous calls), or by directly invoking them (synchronous calls). OSCAR is being used in several research projects; for example, it is the main inference framework used in the AI4EOSC (Artificial Intelligence for the European Open Science Cloud) platform. Related to this project, iMagine (Imaging data and services for aquatic science) is using the AI4EOSC platform to support its eight marine use cases. In the context of the iMagine project, the need to process large amounts of files with AI (Artificial Intelligence) models on top of the project’s serverless platform has arisen.
For this purpose, we have designed OSCAR-Batch [https://github.com/grycap/oscar-batch]. OSCAR-Batch includes a coordinator component developed in Python, where the user provides a MinIO bucket containing files for processing. This component calculates the optimal number of parallel service invocations that can be accommodated within the OSCAR cluster and distributes the image-processing workload accordingly among the services. OSCAR-Batch uses a strategy based on sidecar containers, where the content of the MinIO bucket is mounted as a volume accessible to the service instance (a K8s pod), avoiding the movement of large amounts of files. With this strategy, OSCAR-Batch ensures efficient use of the available CPU and memory resources in the OSCAR cluster and facilitates access to the output results, which are also stored in MinIO.
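The following sketch illustrates the coordinator logic described above, i.e. sizing the number of parallel invocations from the cluster's free resources and splitting the bucket contents accordingly. All figures and names are hypothetical, and the code is an illustration rather than the actual OSCAR-Batch implementation.

```python
# Simplified sketch of a batch coordinator: estimate how many parallel
# invocations fit in the cluster's free CPU/memory and split the bucket's
# files among them. Numbers and file names are illustrative only.
import math


def max_parallel_invocations(free_cpu, free_mem_mb, cpu_per_job, mem_per_job_mb):
    """Number of service invocations that fit in the cluster's free resources."""
    return max(1, min(free_cpu // cpu_per_job, free_mem_mb // mem_per_job_mb))


def split_workload(files, n_slots):
    """Distribute the bucket's files evenly among the available slots."""
    chunk = math.ceil(len(files) / n_slots)
    return [files[i:i + chunk] for i in range(0, len(files), chunk)]


files = [f"obsea/frame_{i:06d}.jpg" for i in range(100_000)]
slots = max_parallel_invocations(free_cpu=64, free_mem_mb=262_144,
                                 cpu_per_job=4, mem_per_job_mb=8_192)
batches = split_workload(files, slots)
print(f"{slots} parallel invocations, ~{len(batches[0])} files each")
```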
This tool is mainly intended to process large numbers of files, such as historical data from an observatory service. For example, within the iMagine project, we support the analysis of historical data (hundreds of thousands of images) coming from the OBSEA underwater observatory by means of an AI-based fish detection and classification algorithm based on YOLOv8. With OSCAR-Batch, use case owners can reanalyze their data each time a new version of the AI model is produced.
This contribution presents OSCAR-Batch and exemplifies its usage with real use case scenarios coming from the iMagine project.
This work is partially funded by the project iMagine "Imaging data and services for aquatic science", which has received funding from the European Union Horizon Europe Programme – Grant Agreement number 101058625. Also, by the project AI4EOSC "Artificial Intelligence for the European Open Science Cloud", that has received funding from the European Union’s HORIZON-INFRA-2021-EOSC-01 Programme under Grant 101058593. Finally, Grant PID2020-113126RB-I00 funded by MICIU/AEI/10.13039/501100011033.
Recently, the computing continuum has emerged as an extension of existing cloud services, which enable users to access almost unlimited resources for the processing, storage, and sharing of data. Nevertheless, the increasing volume of data, as well as its constant production, has motivated the distribution of both data and computation across the available computational nodes. In the computing continuum, organizations arrange similar computational nodes in the form of vertical layers, the most typical layers being the edge, the fog, and the cloud. The goal is to take advantage of the characteristics of the infrastructure in each layer, such as the proximity of sensors to edge devices and the higher computational specifications of fog and cloud machines. Nevertheless, the construction of these systems is a complex task, as it involves the coordination of multiple actors: from sensors and Internet-of-Things (IoT) devices collecting the data to visualization services, passing through artificial intelligence and machine learning applications that process the data to produce information, as well as storage silos to preserve and share both raw data and information. The movement of data through these environments is another issue to consider, as actors are distributed across different infrastructures. This produces delays in the delivery of data, as well as security and reliability concerns, as the data is moved through uncontrolled networks.

Here we introduce an architecture to manage the deployment and execution of computing continuum systems, as well as the movement of data through the different infrastructures considered. Using a high-level self-similar scheme, organizations can create an abstract representation of a system by chaining different applications and choosing the infrastructure where they will be deployed. These applications are managed as serverless containers, which contain all the requirements that enable them to be loosely coupled and deployed on the selected infrastructure. Furthermore, these containers include a REST API to enable their remote invocation. Data channels through the multiple applications and infrastructures in a system are created using abstractions called data containers, which orchestrate the movement of data through different sites, creating a content delivery network. To meet security and reliability requirements when moving data through data containers, we designed a configurable non-functional scheme that allows organizations to couple computing continuum systems with applications to fulfil non-functional requirements.

Moreover, to mitigate bottlenecks in a system, we include an adaptive mechanism comprising a monitoring scheme and an autoscaling mechanism based on parallel patterns. First, the monitoring scheme measures the performance of the applications by collecting metrics such as their service times, the average waiting time of objects in queues, the volume of data processed, and the throughput of the applications. Next, the autoscaling mechanism identifies the bottlenecks in the system and creates a parallelism strategy to mitigate them. The goal is to produce a steady data flow through the system without modifying the code of the applications and without knowing the characteristics of the input workload and the hardware where the applications are deployed.
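As a rough illustration of how such an autoscaler can turn monitoring metrics into a parallelism strategy, the sketch below sizes a stage's replica count from its observed arrival rate and service time. This is our own simplified, hedged example of the general idea, not the authors' implementation; the target utilization value is an assumption.

```python
# Illustrative sketch (not the authors' implementation): derive the replica
# count of a pipeline stage from the monitored arrival rate and service time
# so that the stage keeps up with the incoming data flow.
import math


def required_replicas(arrival_rate, service_time_s, target_utilization=0.7):
    """Replicas needed so that offered load stays below the target utilization."""
    offered_load = arrival_rate * service_time_s  # average number of busy workers
    return max(1, math.ceil(offered_load / target_utilization))


# Example: 12 objects/s arriving at a stage whose tasks take 0.5 s each.
print(required_replicas(arrival_rate=12, service_time_s=0.5))  # -> 9 replicas
```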
The presentation will discuss the point of view and opportunities offered by the EuroHPC JU to support the computing needs of SMEs and large industries. This will be a remote presentation.
HPCNow! provides its customers with solutions and technologies for dealing with the most complex problems in High Performance Computing (HPC). Installing and managing an HPC cluster and deploying user applications involves a wide range of software packages. A cluster manager provides a unified way to install all the nodes, manage them, and synchronize configurations across the entire cluster. A parallel distributed file system acts as a massive cluster-wide hard drive, enhancing usability and enabling scalable I/O. A workload management system enables efficient execution of various workflows by optimizing the available resources. Scientific application compilation and installation systems maximize the performance of both processor architectures and the file system. Finally, user environments that offer simple and intuitive access will encourage end users to make better and greater use of available HPC resources.
This talk will present examples and tools for each of these points that HPCNow utilizes daily.
The amount of data gathered, shared, and processed in frontier research is set to increase steeply in the coming decade, leading to unprecedented data processing, simulation, and analysis needs. In particular, high-energy physics and radio astronomy are gearing up for groundbreaking instruments, necessitating infrastructures many times larger than current capabilities. In this context, the SPECTRUM project (https://spectrumproject.eu/) has brought together leading European science organisations and e-infrastructure providers to formulate a Strategic Research, Innovation, and Deployment Agenda (SRIDA) along with a Technical Blueprint for a European computer and data continuum.
With a consortium composed of leading European science organisations in High Energy Physics and Radio Astronomy, and leading e-Infrastructure providers covering HTC, HPC, Cloud and Quantum technologies, the project has set up a Community of Practice (https://www.spectrumproject.eu/spectrumcop) composed of external experts organised in several working groups. This collaborative effort will facilitate the creation of an exabyte-scale research data federation and compute continuum, fostering data-intensive scientific collaborations across Europe.
This presentation aims to introduce the SPECTRUM project to the IBERGRID community, including the early results already obtained (a survey and scientific use cases), and to encourage the IBERGRID community to engage in the effort that will define the roadmap at the end of the project.
Through the National Recovery and Resilience Program (NRRP), Italy has funded the constitution of an unprecedented national infrastructure targeting digital resources and services for science and industry. Specifically, the National Center on HPC, Big Data and Quantum Computing (“ICSC”) is an initiative funded with €320M to evolve existing public state-of-the-art network, data, and compute services in the country, establish new facilities and solutions, and drive an ambitious program for fundamental as well as industrial research. In this contribution, the current state of work of ICSC will be presented, exploring the instruments and collaborations that ICSC has been enacting to maximize the impact of this initiative at the national and international levels.
With the increase in microbial resistance to therapeutics, there is a higher demand for finding new pathways or molecular targets to treat bacterial infections. The Pseudomonas quinolone system (PQS) is a part of the quorum sensing (QS) communication system of Pseudomonas aeruginosa, which controls the production of biofilms and several other virulence factors. Inhibiting quorum sensing does not kill the bacteria but decreases its virulence and prevents biofilm formation.
Using a combination of in silico methods such as protein-ligand docking, structure-based virtual screening of large virtual databases of chemical entities, extensive molecular dynamics simulations and accurate free energy calculations, new drugs with potential activity toward two specific targets of the PQS system (PqsD and PqsR) were identified. After developing and validating the docking and virtual screening protocol, 221,146 molecules were screened against both targets. Subsequently, the top 25 candidates were selected for MD simulations and free energy calculations to validate the predictions and estimate binding free energies [1], [2].
We demonstrate that a multi-level computational approach can identify strong candidates for QS inhibition and provide solid structural information on the protein-ligand interactions. Moreover, using HPC resources provides the computational power necessary for processing vast amounts of data at unprecedented speeds, significantly reducing simulation times.
[1] T. F. Vieira, R. P. Magalhães, N. M. F. S. A. Cerqueira, M. Simões, and S. F. Sousa, “Targeting Pseudomonas aeruginosa MvfR in the battle against biofilm formation: a multi-level computational approach,” Mol Syst Des Eng, vol. 7, no. 10, pp. 1294–1306, 2022, doi: 10.1039/D2ME00088A.
[2] D. Lapaillerie et al., “In Silico, In Vitro and In Cellulo Models for Monitoring SARS-CoV-2 Spike/Human ACE2 Complex, Viral Entry and Cell Fusion,” Viruses, vol. 13, no. 3, p. 365, Feb. 2021, doi: 10.3390/v13030365.
This work was supported by national funds from Fundação para a Ciência e a Tecnologia (grant numbers: SFRH/BD/137844/2018, UIDP/04378/2020, UIDB/04378/2020 and 2020.01423.CEECIND/CP1596/CT0003). Some of the calculations were produced with the support of the INCD funded by the FCT and FEDER under project 01/SAICT/2016 number 022153 and projects CPCA/A00/7140/2020 and CPCA/A00/7145/2020.
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has led to a global health crisis, triggering an urgent need for effective therapeutic interventions to mitigate its impact. The virus primarily infects human cells by binding the receptor-binding domain of its spike protein (S-RBD) to the ACE2 receptor, making this interaction a key target for drug discovery. In response, this study aimed to identify novel compounds capable of blocking the S-RBD/ACE2 interaction to prevent viral entry into cells. This work utilized high-performance computing (HPC) to study the S/ACE2 complex and discover drugs that may target the S/ACE2 interface.
The interaction between spike and ACE2 was studied by performing molecular dynamics simulations with a length of 400 ns using the AMBER 21 software. The interfacial binding pocket was then defined using FPocket 2.0. Subsequently, a virtual screening protocol was employed. Using Autodock Vina and GOLD molecular docking software, 139,146 compounds were screened. These compounds belonged to different chemical libraries: the Chimiothèque Nationale, MuTaLig Virtual Chemotheca, and the Inhibitors of Protein-Protein Interactions Database.
From the results of the virtual screening experiment, 10 compounds were selected for experimental validation. Experimental validation, including binding assays such as AlphaLISA and Biolayer Interferometry, as well as cellular tests, confirmed the effectiveness of two compounds in human lung cells. RT-qPCR and cytotoxicity assays demonstrated dose-dependent effects. Finally, the compounds also showed activity against SARS-CoV-2 variants.
This work underscores the importance of computational drug discovery, highlighting the critical role of HPC resources in accelerating the development of potential therapeutics for COVID-19.
The AI4EOSC and iMagine projects are closely related initiatives under the European Open Science Cloud (EOSC), both designed to support research communities in leveraging artificial intelligence (AI).
The AI4EOSC project is dedicated to providing researchers with easy access to a comprehensive range of AI development services and tools. It focuses on enabling the development and deployment of AI models, including federated learning, zero-touch deployment, MLOps and composite AI pipelines. The architecture of the platform is designed to be user-friendly and covers different needs by integrating different frameworks and technologies. It also enables seamless sharing and deployment of AI models across distributed resources. By offering real-world examples and active community involvement, including collaboration with the iMagine project, AI4EOSC showcases how researchers can effectively use the platform to advance their AI development, highlighting its comprehensive capabilities and impact on the scientific community.
The iMagine AI platform is specifically developed for researchers in the field of aquatic science and is based on the AI4OS software stack developed in AI4EOSC. It provides AI-powered tools for image analysis that contribute to the understanding and conservation of aquatic ecosystems. Like the AI4EOSC platform, the iMagine platform supports the entire machine learning lifecycle, from model development to deployment, using data from multiple sources. iMagine is driven by ten core use cases for image analysis available to researchers via Virtual Access, as well as additional new use cases developed through recent open calls. The iMagine Competence Center plays a key role in supporting model development and deployment for these applications.
Together, AI4EOSC and iMagine are showing the power of AI within EOSC. They support research in various disciplines by providing robust computational tools and fostering collaboration for scientific progress.
AI models require extensive computing power to perform scalable inference on distributed computing platforms to cope with increased workloads. This contribution summarises the work done in the AI4EOSC and iMagine projects to support AI model inference execution with OSCAR and AI4Compose. AI4EOSC delivers an enhanced set of services to create and manage the lifecycle of AI models (develop, train, share, serve), targeting use cases in automated thermography, agrometeorology and integrated plant protection. In turn, iMagine provides imaging data and services for aquatic science, and leverages the platform created in AI4EOSC.
On the one hand, OSCAR provides the serverless computing platform to run AI model inference on elastic Kubernetes clusters deployed on Cloud infrastructures. Several execution modes are supported to tackle different use cases: synchronous executions, to achieve fast AI model inference with pre-provisioned infrastructure; scalable asynchronous executions, to execute multiple inference jobs on auto-scaled Kubernetes clusters; and exposed services, to leverage pre-loaded in-memory AI models. Two production OSCAR clusters have been deployed in distributed Cloud sites (INCD and Walton) to support the different use cases.
On the other hand, AI4Compose allows users to visually design AI inference pipelines in Node-RED or Elyra. UPV operates a FlowFuse instance for users to deploy their own Node-RED instances, on which they can visually craft the AI inference pipelines. For that, custom nodes have been created to facilitate the execution of the model inference steps in distributed OSCAR clusters for enhanced scalability. The AI models are available in the AI4EOSC Dashboard. These nodes have been contributed to the Node-RED library for enhanced outreach. AI4Compose also supports Elyra to create these AI inference pipelines from a Jupyter Notebook environment. This support has been integrated in EGI Notebooks, a popular managed Jupyter Notebook service, thus facilitating adoption of these techniques beyond the project realm.
Together, they provide the ability to craft custom AI inference pipelines from a visual canvas, using custom nodes created to facilitate the usage of pre-trained models with AI4OS, the distributed AI platform powering both AI4EOSC and iMagine.
Grant PID2020-113126RB-I00 funded by MICIU/AEI/10.13039/501100011033.
This work was supported by the project AI4EOSC ‘‘Artificial Intelligence for the European Open Science Cloud’’ that has received funding from the European Union’s Horizon Europe Research and Innovation Programme under Grant 101058593.
This work was supported by the project iMagine ‘‘AI-based image data analysis tools for aquatic research’’ that has received funding from the European Union’s Horizon Europe Research and Innovation Programme under Grant 101058625.
Machine Learning (ML) is one of the most widely used technologies in the field of Artificial Intelligence (AI). As ML applications become increasingly ubiquitous, concerns about data privacy and security have also grown. The presentation surveys the applied ML landscape and the evolution of ML/DL from various aspects, including data quality, data privacy awareness and federated learning. It also relates these to privacy-enhancing technologies (PETs), where the synergy between ML and PETs responds to the need for data privacy and data protection in AI application development and deployment, moving in the direction of responsible AI.
Dataverse is an open source data repository solution being increasingly adopted by research organizations and user communities for data sharing and preservation. Datasets stored in Dataverse are cataloged, described with metadata, and can be easily shared and downloaded. In the context of the development of a pilot catch-all data repository for the Portuguese research community, we have studied performance, availability and recovery aspects of such an installation.

In this presentation we will focus on the performance measurements we obtained with different kinds of storage systems and on the backup and recovery architecture which we developed. We aim at shedding some light on storage and backup solutions for Dataverse that can also be applied to other systems.
This tutorial provides a comprehensive guide on efficiently and securely managing applications and services in a Federated Cloud environment. Users will learn how to:
- Retrieve and use access tokens without exposing them in plaintext
- Manage computing resources within the Federated Cloud
- Leverage cloud-native tools in a federated ecosystem
- Configure TLS access to services using Dynamic DNS
- Deliver secrets to applications and services securely
This demo will show the Galician Marine Sciences Program Data Lake in action, focusing on how it streamlines the collection, integration and analysis of diverse marine ecosystem datasets.
Participants will see how easy it is to use the Data Lake for various marine science use cases, though this could also be extrapolated to other fields.
The demo will emphasize how the Data Lake enables researchers to process large-scale datasets efficiently—tasks that would be unmanageable on local systems—while showcasing its scalability and real-time processing capabilities through Apache Spark.
In this tutorial, attendees will have the opportunity to learn how to use a real quantum computer—the 32-qubit superconducting quantum computer deployed at CESGA.
Integrating classical and quantum computers is a significant challenge. We will show how we have achieved this integration, enabling seamless hybrid computing. Participants will discover how to easily submit quantum circuits using Python, facilitating efficient interaction between high-performance computing (HPC) and the quantum processing unit (QPU). The tutorial starts with a presentation of the QMIO Quantum Computer.
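To set expectations for the hands-on part, the sketch below builds a simple two-qubit circuit with Qiskit. The circuit construction is standard Qiskit; the submission step is left as a placeholder, since the actual CESGA-provided interface to QMIO is introduced during the tutorial itself.

```python
# Hedged sketch: build a Bell-state circuit locally; submission to the QPU
# is shown only as a placeholder (the real QMIO backend is covered in the
# tutorial and is not reproduced here).
from qiskit import QuantumCircuit

qc = QuantumCircuit(2, 2)
qc.h(0)              # put qubit 0 in superposition
qc.cx(0, 1)          # entangle qubits 0 and 1
qc.measure([0, 1], [0, 1])

# Placeholder submission step (hypothetical helper, not the real interface):
# backend = get_qmio_backend()
# job = backend.run(qc, shots=1000)
# print(job.result().get_counts())
print(qc.draw())
```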
The Horizon Europe interTwin project is developing a highly generic yet powerful Digital Twin Engine (DTE) to support interdisciplinary Digital Twins (DT). Comprising thirty-one high-profile scientific partner institutions, the project brings together infrastructure providers, technology providers, and DT use cases from Climate Research and Environmental Monitoring, High Energy and AstroParticle Physics, and Radio Astronomy. This group of experts enables the co-design of the DTE Blueprint Architecture and the prototype platform, benefiting end users like scientists and policymakers but also DT developers. It achieves this by significantly simplifying the process of creating and managing complex Digital Twin workflows.
As part of our contribution, we will share the latest updates on the project, including the DTE Blueprint Architecture, whose latest version is being finalised in Q4/2024. The contribution will also cover the status of the DT use cases we currently support and describe the software components of the DTE, focusing on the activities of the Spanish and Portuguese partners in the project.
InterTwin co-designs and implements a prototype of an interdisciplinary Digital Twin Engine (DTE) - an open source platform based on open standards that offers the capability to integrate with application-specific Digital Twins (DTs).
While there are many components that are part of the DTE, this contribution focuses on OSCAR and DCNiOS and how they are being used in InterTwin to support the creation of DTs.
First, OSCAR is an open-source serverless framework that provides event-driven computing on scalable Kubernetes clusters. It supports the ability to run data processing containers in response to file uploads to an object store, as is the case of MinIO, a high-performance object storage that is deployed as part of an OSCAR cluster. InterTwin’s data management layer involves the usage of dCache, a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods. To this aim, we integrated the ability to react to file upload events into dCache to trigger data processing inside the OSCAR cluster. For this, we created DCNiOS, a Data Connector between Apache NiFi and OSCAR, which facilitates the integration between the systems. The usage of NiFi allows us to buffer the data-processing requests coming from dCache to cope with the potentially different rates between data producer and data consumer. We integrated DCNiOS support for Apache Kafka as well to support Pub/Sub mechanisms for triggering the data processing.
To take advantage of hardware accelerators available in HPC facilities, we performed the integration between OSCAR and INFN’s interLink development, which provides a gateway to offload the execution of Kubernetes jobs to an HPC cluster. This makes it possible to create an event-driven elastic computing platform that can offload execution to HPC facilities, as was done for the HPC Vega supercomputer at IZUM, Slovenia.
In addition, the integration of itwinai, interTwin’s AI platform for advanced AI/ML workflows in digital twin applications, with OSCAR, paved the way for an integrative approach to support general purpose event-driven computing with automated workload offloading, thus bridging Cloud and HPC facilities.
Finally, the ability to run generic services exposed in OSCAR clusters allows us to run Jupyter Notebooks that expose MinIO’s storage system in the Notebook sandbox, thus facilitating data ingestion, processing, and visualization within the same platform.
This contribution will summarise the experiences in this area and lessons learnt during this process, as well as its applications to existing use cases in InterTwin.
This work was partially supported by the project “An interdisciplinary Digital Twin Engine for science’’ (interTwin) that has received funding from the European Union’s Horizon Europe Programme under Grant 101058386.
The Horizon Europe interTwin project is developing a highly generic Digital Twin Engine (DTE) to support interdisciplinary Digital Twins (DTs). The project brings together infrastructure providers, technology providers and DT use cases from High Energy and AstroParticle Physics, Radio Astronomy, Climate Research and Environmental Monitoring. This group of experts enables the co-design of both the DTE Blueprint Architecture and the prototype platform, benefiting not only end users like scientists and policymakers but also DT developers. It achieves this by significantly simplifying the process of creating and managing complex Digital Twin workflows.
In our presentation, we will focus on the design and implementation of the interTwin DataLake, which is based on the ESCAPE Data Lake concept, and on the extensions and integrations done and ongoing in the project to accommodate, on the one hand, the heterogeneous resource providers, ranging from HTC and HPC to Cloud, and, on the other, the requirements from the user communities.
The DataLake services, including Rucio, FTS and the storage technologies deployed at the sites, have been integrated with EGI Check-in. The sites already part of the testbed are VEGA EuroHPC, EODC, DESY and INFN, with PSNC, Julich and KBFI under integration.
New developments have been performed to ease the integration at sites. One of these is Teapot, an application that provides multi-tenant WebDAV support. Teapot is built on StoRM-WebDAV and includes a manager that accepts requests, authenticates users, maps them to local usernames, and launches a dedicated StoRM-WebDAV server to which the manager then forwards the request.
Finally, the DataLake is also available for exploitation via JupyterHub, thanks to the Rucio JupyterLab plugin developed by CERN in the ESCAPE project and further enhanced in interTwin.
Details of integration by some of the DTs and upper architecture layers will also be presented.
In this presentation we will outline the role of the SQAaaS platform as the architectural building block for quality assurance (QA) within two ongoing EC-funded projects that are prototyping Digital Twins in diverse scientific domains: DT-GEO and interTwin.
The individual requirements of each project have shaped the SQAaaS platform to be a flexible engine that is able to evaluate both the individual digital objects produced and consumed by the Digital Twin, i.e. datasets and software, as well as the workflows that control its operation.
Consequently, the SQAaaS platform is being extended to integrate with an increasing number of platforms and tools used by the use cases in the framework of these projects. These range from code hosting platforms (e.g. GitHub and GitLab) and (meta)data repositories and catalogs (e.g. the EPOS data portal) to tools and standards for workflow management (e.g. CWL, PyCOMPSs, PyOphidia). We will describe the integration efforts made to guarantee that the SQAaaS successfully reacts to varying circumstances, so that the quality checks can be triggered by (i) changes in the workflow code hosted in GitHub, (ii) CRUD operations on (meta)data hosted at (meta)data repositories, and (iii) QA checks embedded in the workflow execution.
The range of QA checks in the SQAaaS platform provides both comprehensive and on-demand assessments. On the one hand, generic QA checks include FAIR evaluation of data, and verification and validation of software QA characteristics for code and services; more are expected in terms of FAIR-related checks for software and data quality evaluation. On the other hand, on-demand checks are those currently done for the workflows, including validation of workflow specification and execution. Here as well, new capabilities are being considered, such as data provenance-related checks.
The existing and upcoming features of the SQAaaS platform, resulting from two years of work within the DT-GEO and interTwin projects, are shaping the platform as a key building block in the engine of interdisciplinary Digital Twins by providing a valuable range of QA-related checks that contribute to their successful operation.
This presentation provides an overview of the architecture and implementation of the new artefact repositories for EGI.
The EGI repositories are developed, maintained and operated by LIP and IFCA/CSIC. The new repositories will host RPMs (for RHEL and compatible distributions), DEBs (for Ubuntu and compatible distributions) and Docker images for container-based services and micro-services.
The presentation will describe the architecture of the new repositories and their components and capabilities.
DESY, one of Europe's leading synchrotron facilities, is active in various scientific fields, including High Energy and Astroparticle Physics, dark matter research, Physics with Photons, and Structural Biology. These fields generate large amounts of data, which are managed according to specific policies that respect restrictions on ownership, licenses, and embargo periods. Currently there is a move to make this data publicly available, as requested by funding agencies and scientific journals. To facilitate this, DESY IT is developing solutions to publish Open Data sets, making them easily discoverable, accessible, and reusable for the wider scientific community, particularly those not supported by large e-infrastructures.
In line with Open and FAIR data principles, DESY's Open Data solution will provide a metadata catalog to enhance discoverability. Access will be granted through federated user accounts via eduGAIN, HelmholtzID, NFDI, and later EOSC-AAI, enabling community members to access data using their institutional accounts. Interoperability will be ensured by using commonly accepted data formats such as HDF5, specifically NeXuS, openPMD and ORSO. Providing technical and scientific metadata will make the Open Data sets reusable for further analysis and research. The blueprint for DESY's Open Data solution will be shared through HIFIS and the wider community upon successful evaluation.
The initial prototype will consist of three interconnected solutions: the metadata catalog SciCat, the storage system dCache, and the VISA (Virtual Infrastructure for Scientific Analysis) portal. Scientific data, along with its metadata, will be stored in a specific directory on dCache and ingested into SciCat, which provides direct access and download options. To ensure harmonization of scientific metadata among similar experiments, experiment-specific metadata schemata will be created for metadata validation before ingestion. A subset of technical and scientific metadata will be integrated into the VISA portal, allowing scientists to access datasets within it. The VISA portal allows users to create computing environments with pre-installed analysis tools and mounted datasets, providing easy access to data and tools.
During the presentation, the system's architecture, its components, and their interactions will be discussed, focusing on the harmonization of metadata schemata and the roadmap for developing tooling and processes for ingestion and validation of metadata.
In the current era of Big Data, data management practices are an increasingly important consideration when doing scientific research. The scientific community's aspiration for FAIR data depends on good data management practices and policies, and interTwin's DataLake has been designed with these goals in mind. I will present the status of an application of the DataLake to the particular field of Lattice QCD simulations within physics. I will discuss how the DataLake is being co-developed with the lattice use case, focusing on how the lattice community's data requirements have influenced its development. I will talk about the integration of the DataLake with the external ILDG metadata and file catalogues. I will end by discussing the potential ways in which the DataLake concept could change and improve how lattice collaborations do their research in future.
Effective water resources management depends on accurate river flow forecasts, which affect hydroelectric power generation, flood control, and agriculture, among other sectors. Achieving consistent projections is challenging, however, because river flow exhibits complex behaviour influenced by factors such as precipitation, reservoir management, and changes in land use. Given the increasing frequency of extreme weather events linked to climate change, our research emphasizes the importance of applying deep learning to river flow forecasting for operational decision-making and public safety.
Deep learning algorithms are particularly well suited to river flow prediction and have attracted increasing attention in time series forecasting. These models excel at identifying complex patterns across large amounts of data. This work evaluates several models: multilayer perceptrons (MLPs), support vector machines (SVMs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), gated recurrent units (GRUs), and hybrid models. We evaluated each model based on its potential to improve river flow forecasting accuracy, and the MLP emerged as one of the best models.
Using data from Portugal's National Water Resources Information System (SNIRH), our analysis currently focuses on the Tejo River watershed from October 1, 1984, until September 26, 2023. The dataset comprises daily data on river discharge, water levels, and precipitation. In terms of model construction, given that we used a supervised learning approach, we prepared separate training, testing, and validation datasets. We followed the typical deep learning model construction steps, which include data preprocessing, training and validation, and application deployment. The data preprocessing step includes feature selection, missing data management, resampling, and conversion of the time series data into a supervised learning form. Additionally, we combine common periods and interpolate missing values, selecting common periods with a maximum of 10 missing values to ensure data consistency. The training phase ensures that the models train on past data, and we apply grid search to refine the hyperparameters. We use the root mean squared error (RMSE) loss function and the Adam optimizer throughout the training process. The deployment or forecasting process consists of generating projections for the next three days and, for the time being, we verify these forecasts against real data once they become available.
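A condensed, hedged sketch of this kind of pipeline is shown below: a daily series is converted into a supervised-learning layout and an MLP is tuned with grid search. The lag count, forecast horizon and hyperparameter grid are illustrative placeholders, not the exact SNIRH configuration.

```python
# Illustrative pipeline: lagged supervised layout + MLP with grid search.
# The synthetic series stands in for the SNIRH discharge data.
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPRegressor


def to_supervised(series: pd.Series, n_lags: int = 7, horizon: int = 3):
    """Build (X, y): the last n_lags days as inputs, the value 'horizon' days ahead as target."""
    df = pd.concat({f"lag_{i}": series.shift(i) for i in range(1, n_lags + 1)}, axis=1)
    df["target"] = series.shift(-horizon)
    df = df.dropna()
    return df.drop(columns="target").values, df["target"].values


# Hypothetical daily discharge series (replace with the real SNIRH data).
rng = np.random.default_rng(0)
discharge = pd.Series(rng.gamma(2.0, 50.0, size=2000))

X, y = to_supervised(discharge)
grid = GridSearchCV(
    MLPRegressor(solver="adam", max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(32,), (64, 32)], "alpha": [1e-4, 1e-3]},
    scoring="neg_root_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=3),
)
grid.fit(X, y)
print("best RMSE:", -grid.best_score_, grid.best_params_)
```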
The models created with the SNIRH dataset demonstrate promising results. Through comparison with similar approaches, we can conclude that the models clearly capture temporal dependencies and generate valid river flow predictions according to the performance metrics. The findings show that deep learning methods provide important new perspectives for water resource management and decision-making since they can increase the accuracy and dependability of river flow estimates under emergency conditions.
Future activities will include assessing the performance of these models in related basins and developing a customized tool that enables hydrologists to build AI-driven models for variable construction within the SNIRH dataset.
Workflow languages have become indispensable for defining reproducible and scalable data analysis pipelines across various scientific domains, including bioinformatics, medical imaging, astronomy, high-energy physics, and machine learning. In recent years, languages such as the Common Workflow Language (CWL), Workflow Description Language (WDL), and Nextflow have gained significant traction alongside established solutions like Snakemake and Galaxy workflows.
Despite these advancements, resource allocation and monitoring in cloud environments remain significant challenges. Scientific tools often utilize assigned resources irregularly, leading to inefficiencies. Each analytical task specifies its required resources—such as CPUs, memory, and disk space—but selecting appropriate values is critical to ensure sufficient resources without over-provisioning.
To address these issues, the Cloud Monitoring Kit (CMK) was designed as a flexible, event-driven architecture that generates uniform, aggregated metrics from containerized workflow tasks originating from different workflow management systems in a cloud environment. CMK offers essential insights through intuitive dashboards that display individual and aggregated metrics relevant to job performance. Developers can leverage CMK to monitor resource consumption and adjust system configurations during development or tool integration, enhancing efficiency and performance. Operations staff benefit from continuous performance monitoring and troubleshooting capabilities, crucial for maintaining system reliability. Scientists gain a robust analytical platform to scrutinize data, facilitating informed decisions regarding system configurations. The adaptability of CMK makes it particularly valuable where precise resource management and systematic optimization are essential.
In this contribution, we discuss our experiences implementing CMK in an industrial AWS cloud environment for processing bioinformatics data. We summarize the lessons learned during this process, highlighting the benefits and limitations of using CMK in a real-world setting. Furthermore, we explore how the data collected by CMK can be utilized to improve and optimize resource usage by informing task resource assignments. By closing the loop between monitoring and resource allocation, it is possible to assign informed values to task resources, reducing inefficiencies caused by over-provisioning or underutilization. The implementation of the CMK architecture for AWS Batch is available at https://github.com/biobam/cmk
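To make the closing of the loop between monitoring and allocation concrete, the short sketch below shows one possible policy: deriving a memory request for a task from the peak usage observed across its past runs. This is a hypothetical illustration, not part of CMK; the sample values and the 95th-percentile-plus-margin rule are assumptions.

```python
# Hypothetical sketch: derive a memory request from observed peak usage.
# The sample values and the percentile-based policy are illustrative assumptions.
from statistics import quantiles

# Peak memory (MiB) observed by the monitoring system for past runs of one task.
observed_peak_mib = [1210, 1340, 1190, 1525, 1280, 1405, 1330, 1615, 1270, 1390]

def recommend_memory(samples, safety_margin=0.2):
    """Return a request based on roughly the 95th percentile plus a safety margin."""
    p95 = quantiles(samples, n=20)[-1]   # last of 19 cut points ~ 95th percentile
    return int(p95 * (1 + safety_margin))

print(f"Recommended memory request: {recommend_memory(observed_peak_mib)} MiB")
```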
GM and RN would like to thank Grant PID2020-113126RB-I00 funded by MICIU/AEI/10.13039/501100011033.
Citizen Science (CS) is evolving rapidly, driven by digital technologies that make data collection and public participation more accessible. However, these technologies also introduce challenges such as complexity and fragmentation. Many projects addressing similar research questions use inconsistent methodologies, making it difficult to compare and integrate results. Fragmentation is worsened by budget constraints, limiting data management to individual projects and reducing the overall effectiveness of CS.
To address these issues, the European Commission has launched RIECS-Concept (Towards a Pan-European Research Infrastructure for Excellent Citizen Science 2025–2027). The initiative aims to establish a unified research infrastructure that can be integrated into the European Strategy Forum on Research Infrastructures (ESFRI). RIECS will provide services and resources to bridge gaps between CS communities, fostering cross-disciplinary collaboration and improving data quality. Co-designed over the next decade, this infrastructure seeks to prevent further fragmentation.
RIECS-Concept is led by Ibercivis and co-coordinated with ECSA, bringing together a consortium of 12 partners, including key organizations such as Citizen Science Global Partnership, IIASA, and CSIC. The project will last three years and aims to build a strong foundation for the future of Citizen Science in Europe.
RIECS-Concept has three key objectives. The first is to assess the feasibility of developing this infrastructure by addressing current challenges and opportunities in CS. This includes creating an open inventory of technological components—services and resources—that will form the foundation of RIECS. The catalogue will then be refined and interconnected to address both technological and scientific challenges, linking with other Research Infrastructures like EOSC. Additionally, a unified model for integrating data and metadata from diverse sources will be developed to ensure consistency and interoperability. The focus will initially be on three domains: environmental observations, health, and climate change.
The second objective is to create a strategic roadmap for the infrastructure's future lifecycle. This roadmap will provide actionable steps for decision-makers, focusing on governance, sustainability, and long-term viability. It will be co-designed with stakeholders, including funding agencies and end-users, to ensure the infrastructure meets their needs.
The third objective is to promote an open, participatory approach to governance. RIECS-Concept will engage diverse stakeholders from science, technology, policy, and society in the co-design and roadmapping processes, ensuring the infrastructure is shaped by those who will use it and serves the broader CS community.
In conclusion, RIECS-Concept represents a critical step towards creating a unified Citizen Science infrastructure in Europe and globally. By addressing fragmentation and ensuring long-term sustainability, the project will foster cross-disciplinary collaboration, improve data quality, and help CS projects achieve meaningful societal and scientific outcomes.
Biomolecular simulations have long been an important part of the drug discovery and development process, with techniques such as docking, virtual screening, molecular dynamics, and quantum mechanics being routinely used to study the interaction of small-molecule drugs with their target proteins or enzymes and to guide their selection.
More recently, the application of these techniques to aptamer selection and aptamer engineering has also become a reality. Such methods can help to understand aptamer-target interactions and to rationally introduce modifications in selected aptamers to modulate their affinity, specificity, or ability to carry other molecules.
Here, we present a computational protocol developed by us for the selection of specific aptamers for protein recognition and for an atomic-level understanding of the target-aptamer interaction. The protocol takes advantage of HPC resources and GPUs and combines protein-DNA/RNA docking, atomistic molecular dynamics simulations, and free energy calculations, taking into account the conformational variability of the protein and aptamer in the selection process.
This is illustrated with the identification and experimental confirmation of a novel aptamer for Cathepsin B, a predictive prostate cancer biomarker [1], and by the atomic-level clarification of the mode of action of an aptamer-RNA conjugate that targets the human transferrin receptor [2].
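A highly simplified orchestration of such a protocol is sketched below. The helper functions are hypothetical placeholders for the actual docking, molecular dynamics, and free-energy tools, and the ranking by estimated binding free energy is shown for illustration only.

```python
# Hypothetical orchestration sketch of the aptamer-selection protocol.
# The helper functions are placeholders for the real docking, MD, and
# free-energy tools; they do not reproduce the actual implementation.
from dataclasses import dataclass

@dataclass
class Candidate:
    sequence: str
    dg_kcal_mol: float = 0.0   # estimated binding free energy

def dock_candidates(protein_pdb, sequences):
    """Placeholder: protein-DNA/RNA docking producing initial complexes."""
    return [Candidate(seq) for seq in sequences]

def run_md(candidate, protein_pdb, ns=100):
    """Placeholder: GPU-accelerated atomistic MD sampling conformational variability."""
    return candidate

def estimate_binding_free_energy(candidate):
    """Placeholder: free-energy calculation over the MD ensemble."""
    candidate.dg_kcal_mol = -8.0 - 0.1 * len(candidate.sequence)  # dummy estimate
    return candidate

def select_aptamers(protein_pdb, library, top_n=5):
    """Dock, refine with MD, score, and return the top-ranked candidates."""
    complexes = dock_candidates(protein_pdb, library)
    scored = [estimate_binding_free_energy(run_md(c, protein_pdb)) for c in complexes]
    return sorted(scored, key=lambda c: c.dg_kcal_mol)[:top_n]
```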
Acknowledgements: This work received financial support from FCT/MCTES through national funds (LA/P/0008/2020 DOI 10.54499/LA/P/0008/2020, UIDP/50006/2020 DOI 10.54499/UIDP/50006/2020, and UIDB/50006/2020 DOI 10.54499/UIDB/50006/2020). The author acknowledges FCT funding through 2020.01423.CEECIND/CP1596/CT0003. Calculations were performed with the support of INCD, funded by FCT project 01/SAICT/2016 number 022153 and projects CPCA/A00/7140/2020 and CPCA/A00/7145/2020.
References:
[1] Pereira, AC et al. - Identification of novel aptamers targeting cathepsin B-overexpressing prostate cancer cells - Molecular Systems Design & Engineering (2022) DOI: 10.1039/D2ME00022A
[2] Vasconcelos et al. - In silico analysis of aptamer-RNA conjugate interactions with human transferrin receptor - Biophysical Chemistry 314 (2024) DOI: 10.1016/j.bpc.2024.107308
The Infrastructure Manager (IM) is an open-source, production-ready (TRL 8) service used for the dynamic deployment of customized virtual infrastructures across multiple Cloud back-ends. It has evolved over the last decade through several European projects to support the needs of multiple scientific communities. It features a CLI, a REST API, and a web-based graphical user interface (GUI), called the IM Dashboard, which provides users with a set of customizable curated templates to deploy popular software (e.g. JupyterHub on top of a Kubernetes cluster). Users can use the IM to facilitate the deployment of these templates, which follow the TOSCA standard, on whichever Cloud they have access to. This allows easier reproducibility of computational environments and rapid deployment on multiple Cloud platforms such as AWS, Azure, Google Cloud Platform, OpenStack, etc.
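As a rough illustration of programmatic use, the sketch below submits a TOSCA template to an IM server through its REST API using the Python requests library. The server URL, credentials, and template file are placeholders, and the exact endpoint and header format should be checked against the IM documentation.

```python
# Hedged sketch: create an infrastructure from a TOSCA template via the IM REST API.
# The server URL, credentials, and template file are placeholders; endpoint and
# header details should be verified against the Infrastructure Manager documentation.
import requests

IM_URL = "https://im.example.org"   # hypothetical IM server

# Authentication lines for the IM itself and for a target Cloud back-end (placeholders).
auth_lines = [
    "id = im; type = InfrastructureManager; username = user; password = pass",
    "id = ost; type = OpenStack; host = https://keystone.example.org:5000; "
    "username = cloud_user; password = cloud_pass; tenant = my_project",
]

with open("jupyterhub_k8s.yaml") as f:   # TOSCA template, e.g. taken from the Dashboard catalogue
    tosca = f.read()

resp = requests.post(
    f"{IM_URL}/infrastructures",
    data=tosca,
    headers={
        "Content-Type": "text/yaml",
        # The IM expects the auth lines in a single header, separated by a literal "\n".
        "Authorization": "\\n".join(auth_lines),
    },
)
resp.raise_for_status()
print("Infrastructure created:", resp.text)  # the response references the new infrastructure
```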
The IM has been used in production in the EGI Federated Cloud, one of the largest distributed Cloud infrastructures in Europe, supporting deployment of popular execution environments for scientific users, ranging from data-processing SLURM-based clusters to big data Hadoop-based clusters and customizable elastic Kubernetes clusters. It supports the deployment of Virtual Machines, containers on Kubernetes clusters, and functions on AWS Lambda and OSCAR, an open-source serverless platform, enabling infrastructures along the computing continuum.
This contribution summarizes how the IM is being adopted by currently active projects to showcase its functionality. For example, in AI4EOSC, it supports the automated deployment of the Nomad clusters used for training and the OSCAR clusters used for inference of pre-trained AI models. In interTwin, a rich set of TOSCA templates has been produced to deploy the software stacks required to support the development of a Digital Twin Engine, including Apache Nifi, KubeFlow, Kafka, AirFlow, MLFlow, Horovod, STAC, etc. In DT-GEO, the IM deploys elastic virtual clusters which mimic the software configuration employed in real HPC (High Performance Computing) clusters, so that users get trained on the virtual cluster instead of wasting precious computation time in the actual HPC facilities. In EOSC-Beyond, the IM is used as the Deployment Service of the Execution Framework, a new EOSC Core service that is technically compatible with the EOSC EU Node.
This work was partially supported by the projects AI4EOSC (Grant 101058593), interTwin (Grant 101058386), DT-GEO (Grant 101058129), and EOSC-Beyond (Grant 101131875), as well as by Grant PID2020-113126RB-I00 funded by MICIU/AEI/10.13039/501100011033.
A new INCD data centre, located at UTAD (Universidade de Trás-os-Montes e Alto Douro, Vila Real), has recently become operational. It offers Cloud and HPC computing as well as data management services and repositories.
In this work we describe the architecture, deployment, and configuration of the cloud IaaS infrastructure based on OpenStack. We also describe the Ceph storage architecture and deployment that provides the underlying block storage for OpenStack as well as object storage. A MinIO object storage solution was also deployed.
The architecture and deployment of UTAD's cloud infrastructure are being replicated for the upgrade of the Lisbon INCD infrastructure.
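As a usage illustration of the object storage described above, a client can interact with MinIO through its S3-compatible API, for example with the official Python SDK. The endpoint, credentials, and bucket name below are placeholders, not INCD's actual configuration.

```python
# Hypothetical example of accessing the MinIO object storage via its S3-compatible API.
# Endpoint, credentials, and bucket name are placeholders.
from minio import Minio

client = Minio(
    "minio.example.incd.pt:9000",   # placeholder endpoint
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    secure=True,
)

bucket = "project-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a local file and list the bucket contents.
client.fput_object(bucket, "results/run-001.h5", "run-001.h5")
for obj in client.list_objects(bucket, prefix="results/", recursive=True):
    print(obj.object_name, obj.size)
```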
The demand for computing resources is growing every day, making the ability to expand capacity to address user needs increasingly important. For research organisations, the Open Clouds for Research Environment (OCRE) provides an opportunity to extend existing computing and platform resources to commercial providers under better conditions. This brings the challenge of going beyond the data centre frontier and integrating additional resources from commercial providers to complement the research infrastructures. In this presentation, we will describe a Kubernetes-based approach to move applications and expand capacity to any provider with almost zero downtime, minimising data storage and transfer costs, and ensuring customer protection. Kubernetes is supported by most cloud providers and has become a popular solution to manage services and applications over existing infrastructures, both public and private. Therefore, Kubernetes can be used as a versatile layer to integrate resources across infrastructures. Ideally, the use of commercial resources should remain transparent to the end-users, as if all these resources and applications were inside the research infrastructure. Our answer to this issue is a gateway that keeps all deployed services under the control of the research infrastructure. This approach combines a full-featured gateway performing reverse proxying for several protocols with a Domain Name System (DNS) chain for managing the required resource records.
The new INFN datacenter.
In this presentation we will discuss our experience of migrating the INCD Helpdesk ticketing system from Request Tracker (RT) to Zammad. We will highlight the most relevant INCD requirements and the Zammad features that led to the choice of this platform. We will also describe the steps taken during the migration process, including data extraction from RT and the import into Zammad using the APIs of both platforms. To migrate the tickets, we took existing open-source code and significantly enhanced it. These developments ensured the accurate migration of ~2500 tickets from different RT queues, including attachments and user information, while preserving the integrity of each ticket's history. We will also show the structure of our current implementation.
INCD (www.incd.pt) provides computing and data services to the Portuguese scientific and academic community for research and innovation in all domains. The infrastructure is oriented towards scientific computing and data-oriented services, supporting researchers and their participation in national and international projects.
INCD operates an integrated infrastructure with services provided from several geographic locations, interconnected by a state-of-the-art data network. The INCD services are integrated in international infrastructures with which it shares computing resources for the benefit of projects of national and international relevance. In this context, INCD participates in the European Grid Infrastructure (EGI), the Iberian computing infrastructure (IBERGRID), the Worldwide LHC Computing Grid (WLCG), the European Open Science Cloud (EOSC), and the Portuguese Advanced Computing Network (RNCA).
This presentation will provide an overview of the current status and evolution of the INCD infrastructure.