IBERGRID 2018: Towards the European Open Science Cloud - EOSC
Dates: 11th - 12th October.
Local host: LIP
Venue: ISCTE - University Institute of Lisbon (ISCTE-IUL)
The 9th Iberian Grid Conference will take place in Lisbon from Thursday 11th October to Friday 12th October. The conference will be held at the University Institute of Lisbon (ISCTE-IUL).
Following the spirit of the IBERGRID series of conferences, the 2018 edition is a timely opportunity for networking and fostering cooperative projects at the Iberian level, with a focus on the upcoming European Open Science Cloud (EOSC): challenges and opportunities for the Iberian region.
IBERGRID 2018 will run partially in parallel with Digital Infrastructures for Research (DI4R), which takes place from 9th to 11th October, is also hosted by LIP, and is jointly organized by EOSC-hub, GEANT and PRACE.
The list of topics of interest is:
Research communities at the Iberian level: examples of collaboration.
ESFRI projects in the Iberian area: LIFEWATCH, EMSO, SKA, ...
Development of innovative software services oriented to the EOSC.
R&D for computing services, networking, and data-driven science at the Iberian level.
The event will consist of plenary, thematically oriented presentations by ESFRI and international research communities active in Iberian cooperation, and by key technology and infrastructure providers in the framework of the EOSC-hub project. The thematic presentations will be followed by a number of “lightning talks” by the participants.
In recent decades, scientific and technical development has grown exponentially in all areas of science, and the conflicts between the scientific advancement of society and the ownership of research results have become increasingly relevant, in particular the brake that certain aspects of the established system place on access to some findings. The ever more demanding scientific community calls for new, up-to-date platforms and services to reduce the inconveniences that have arisen from this same growth. The guidelines provided by Open Science (OS), a concept that has already been discussed for some time, set out a suitable framework for creating new applications and platforms in which research centers could take on the responsibility, exercised so far by publishers, of allocating resources for disclosure and distribution, without replacing them. In particular, Open Access (OA), Open Data (OD) and Open Methodologies (OM) are the guidelines that best fit the tasks currently performed by supercomputing and research centers, where availability, reliability and security are already well implemented and can be powerful assets for stepping into the spotlight of this new OS framework. On the other hand, while universities and publishers can be tempted by recognition and self-promotion when deciding whether a piece of work benefits them, public supercomputing centers, which already provide services for both private and public projects, could be considered more objective if they work in the same direction and as a group.
This paper proposes a path for supercomputing centers to follow, within the framework of the European Open Science Cloud (EOSC), to share resources with the aim of providing an adequate infrastructure for the development of scientific research, adding to their current competences the ability to become neutral ground for scientific disclosure.
Large-scale analysis of medical images using biomarkers requires an infrastructure that commonly exceeds the resources available to research groups. In addition, some biomarkers can benefit from specific hardware accelerators, and medical data analysis may be restricted to certified environments in specific countries due to legal constraints. Cloud platforms enable medical institutions to use, on a pay-per-use basis, services such as powerful machines, specific hardware and guaranteed execution in certified environments. This work describes the architecture designed for large-scale medical image analysis using biomarkers on Cloud platforms. Docker containers give developers a way to encapsulate and deliver their applications and their dependencies for convenient distribution, so the biomarkers are encapsulated in Docker containers. The architecture covers the whole biomarker distribution pipeline: updating the biomarker in the code repository, building the Docker image of the biomarker and executing it on a Cloud infrastructure. This infrastructure provides dynamic horizontal elasticity driven by the job queue. Moreover, the infrastructure uses large-scale distributed storage to access the data to be analysed.
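The queue-driven horizontal elasticity mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `jobs_per_worker` parameter and the worker limits are hypothetical, and a real deployment would obtain the queue length from its batch system and apply the result through the Cloud provider.

```python
import math


def desired_workers(queued_jobs: int, running_jobs: int,
                    jobs_per_worker: int = 1,
                    min_workers: int = 0, max_workers: int = 10) -> int:
    """Return how many worker nodes the cluster should have.

    All parameters are illustrative assumptions, not values from the abstract.
    """
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    total_jobs = queued_jobs + running_jobs
    # Enough workers to host every pending or running job...
    needed = math.ceil(total_jobs / jobs_per_worker)
    # ...clamped to the limits allowed for the deployment.
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers: int, target_workers: int) -> str:
    """Describe the change needed to go from the current to the target size."""
    delta = target_workers - current_workers
    if delta > 0:
        return f"scale out by {delta}"
    if delta < 0:
        return f"scale in by {-delta}"
    return "no change"


if __name__ == "__main__":
    target = desired_workers(queued_jobs=7, running_jobs=1, jobs_per_worker=2)
    print(target, scaling_action(current_workers=2, target_workers=target))
```

The clamp keeps an empty queue from holding idle machines (scale in to `min_workers`) and a burst of jobs from exceeding the budget (`max_workers`).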
We will undertake the implementation of a genome annotation platform that will provide community access to tools and data-flows for marine genome annotation. The platform is designed to address the fragmented research landscape for genome annotation of marine organisms. We propose a portal to marine genomic resources and a community-driven annotation platform for marine eukaryotes, which would provide a focus for post-assembly genomic workflows and data access, and would complement access services such as EMBRIC Configurator, ELIXIR ontologies, and meta-data standards. Together these resources would expose the workflow from genome data collection to publication using open access and FAIR-compliant standards and procedures. Although taxon agnostic, the platform will initially focus on pelagic fishes (the closely related Sardinha and Alosa) and primarily on the use of comparative methods of gene prediction and validation.
Estuaries and coastal zones are among the most productive ecosystems on Earth, supporting many human activities and providing multiple ecosystem services. The ability to simulate and forecast the dynamics of estuarine and coastal zones is thus essential to support the sustainable management of these regions, both for daily activities and for long-term strategies associated with climate change.
Computational forecast systems are an important asset to address these concerns by providing predictions of relevant variables, through the integration of numerical models and field data. The reliability of the forecast predictions depends, however, on the accuracy of the models behind them. Unstructured grid numerical models have been used for several decades to simulate coastal zones at LNEC to address the need for adequate spatial and temporal discretizations. For the past decade these models have been integrated in LNEC’s forecast platform WIFF (Water Information Forecast Framework) to predict water circulation and water quality in coastal zones, taking advantage of the resources of the Portuguese National Computational Infrastructure (INCD).
This communication summarizes and evaluates our experience of running forecast systems in the INCD. The applications range from the inundation of estuarine margins to oil spill and water contamination predictions. End-users include the Civil Protection agency, port authorities and wastewater utilities. The major focus will be performance issues (comparing grid and cloud resources), service level performance and user experience.
Scipion is an image processing framework used to obtain 3D maps of macromolecular complexes in cryo-electron microscopy (cryo-EM). It has emerged as the solution offered by the Instruct Image Processing Center (I2PC), hosted by CNB-CSIC, to European scientists accessing the European Research Infrastructure for Structural Biology (Instruct).
Cryo-EM processing is very demanding in terms of computing resources, requiring powerful servers and, more recently, GPUs. Common desktop machines are clearly insufficient in computing capability and storage, which could be a problem for the many scientists who might not have access to big servers or GPUs.
Cloud IaaS (Infrastructure as a Service) is a new way of accessing computing and storage resources on demand. To use cloud infrastructures effectively, ScipionCloud was developed, resulting in a full installation of Scipion for both public and private clouds, accessible as public “images” that include all the necessary cryo-EM software and require only a web browser to work as if on a local desktop. These images are available in the EGI Applications Database and in the AWS public AMI catalogue.
We present here a new service for Instruct users that will allow them to process the data acquired at any of the high-end Instruct facilities (this initial work focuses on our own cryo-EM facility at the I2PC) on a virtual machine at one of the IberGRID sites. In this first scenario, the machine itself is set up by I2PC staff, but as development advances we envision opening a web portal through which I2PC users can do this themselves.
SKA is an international project, designated an ESFRI Landmark Project, to build the largest and most sensitive radio telescope ever conceived, with the potential to achieve fundamental advances in Astrophysics, Physics and Astrobiology. Since 2011 the IAA-CSIC has coordinated the Spanish participation in the SKA, collaborating closely with Portugal in SKA-related activities during this time. Spain has recently become the eleventh Member of the SKA Organisation, thus ensuring the participation of Spanish groups in the scientific exploitation of SKA data and in the construction of the telescope.
Once complete, the SKA will also be the largest public research data project. It will be composed of thousands of antennas distributed over distances of up to 3000 km across both Africa and Australia, and it will generate a copious data flux (around 1 TB/s) that will turn the task of extracting scientifically relevant information into a scientific and technological Big Data challenge. The SKA Science Data Processor (SDP) will transform this flux of raw data into calibrated data products that will be delivered, at an average rate of about 150 PB/year, to data centres distributed worldwide, called SKA Regional Centres (SRCs), which will provide access not only to the SKA data but also to the analysis tools and processing power. The SRCs will hence have a key role in the exploitation of SKA data and the achievement of the SKA scientific goals.
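To put the two figures quoted above side by side, the following back-of-the-envelope calculation converts them to common units. It assumes decimal prefixes (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a 365-day year and, as a simplification not stated in the abstract, that the raw flux is sustained continuously.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
TB = 10**12                         # assumption: decimal prefixes
PB = 10**15

raw_rate_bytes_per_s = 1 * TB        # "around 1 TB/s" of raw data
delivered_bytes_per_year = 150 * PB  # "about 150 PB/year" sent to the SRCs

# Raw volume per year if the flux were sustained without interruption.
raw_bytes_per_year = raw_rate_bytes_per_s * SECONDS_PER_YEAR

# Average rate at which calibrated products leave the SDP.
delivered_rate_bytes_per_s = delivered_bytes_per_year / SECONDS_PER_YEAR

# How much smaller the delivered products are than the raw flux.
reduction_factor = raw_bytes_per_year / delivered_bytes_per_year

print(f"raw volume:     {raw_bytes_per_year / PB:,.0f} PB/year")
print(f"delivered rate: {delivered_rate_bytes_per_s / 1e9:.2f} GB/s")
print(f"reduction:      {reduction_factor:.0f}x")
```

Under these assumptions the raw flux amounts to roughly 31,500 PB per year, so the SDP reduces the data volume by a factor of about 210, and the SRCs together receive a little under 5 GB/s on average.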
An Alliance of SKA Regional Centres (SRCs) is being designed to address the challenge of scientifically exploiting the SKA data deluge. IAA-CSIC participates in several initiatives addressing this task, notably AENEAS, an ongoing H2020 project. Its objective is to design a distributed and federated European SRC, taking into account the existing services offered by European e-Infrastructures. The AENEAS consortium includes a Portuguese partner, the Instituto de Telecomunicações, and thus provides a framework to promote Iberian collaboration in the context of the SRCs.
In addition, IAA-CSIC coordinates SKA-Link, a project that complements AENEAS efforts by studying how SRCs will face the challenge of supporting Open and Reproducible Science.
In this talk, we will present the contribution of IAA-CSIC to the AENEAS project and other activities related to the SRCs, including SKA-Link, as well as our work studying how Distributed Computing Infrastructures (Ibercloud, EGI Federated Cloud and Amazon Web Services, among others) fulfil the requirements of a pipeline for calibrating data from LOFAR, one of the SKA pathfinders.
Complex simulations that require large amounts of computational resources have typically run on dedicated supercomputers. However, some parts of these simulations do not perform well on these machines, or do not need such costly resources, and can be executed on cheaper hardware. Moving those parts out of the supercomputers and running them on smaller clusters or cloud resources can improve the time to results and reduce the cost of the simulations, while also providing greater flexibility and ease of use. Users who need to perform complex simulations normally take advantage of the capabilities of only one of these infrastructures, although they could benefit from both HPC and Cloud resources. We propose the combined usage of these platforms, using an orchestrator to coordinate the exploitation of these systems and container technology to enable interoperability between them. Such a solution provides simulations as a service in a way that is transparent to end-users and software developers, and improves the efficiency of HPC resource usage. It has been proven to work with different HPC and Cloud providers, including EOSC-hub.
This presentation will outline the services currently in production, and those under development, oriented to support the access of researchers in Spain to the Digital Infrastructures of the EOSC era. The purpose of the presentation is also to collect feedback from the participants.
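As an illustration of the idea of splitting a simulation between HPC and Cloud resources, the sketch below assigns each workflow step to a target. It is purely hypothetical and is not the orchestrator described in the abstract: the `Step` fields, the `cloud_max_cores` threshold, the example workflow and the image names are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One stage of a simulation workflow (hypothetical model)."""
    name: str
    container_image: str           # the same image is meant to run on either target
    cores: int
    needs_fast_interconnect: bool  # e.g. tightly coupled parallel solvers


def choose_target(step: Step, cloud_max_cores: int = 16) -> str:
    """Send tightly coupled or very large steps to HPC, the rest to the cloud."""
    if step.needs_fast_interconnect or step.cores > cloud_max_cores:
        return "hpc"
    return "cloud"


def plan(steps: List[Step]) -> Dict[str, str]:
    """Map every step name to the infrastructure chosen for it."""
    return {step.name: choose_target(step) for step in steps}


if __name__ == "__main__":
    workflow = [
        Step("pre-processing", "example/preprocess:1.0", cores=4,
             needs_fast_interconnect=False),
        Step("solver", "example/solver:1.0", cores=512,
             needs_fast_interconnect=True),
        Step("post-processing", "example/postprocess:1.0", cores=2,
             needs_fast_interconnect=False),
    ]
    for name, target in plan(workflow).items():
        print(f"{name}: {target}")
```

Packaging every step as a container image is what would let a rule like this move a step between targets without changing the step itself.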
Deep Learning is a powerful tool for science, industry and other sectors that benefits from large datasets and computing capacity during the design and training phases of models. TensorFlow (TF), Google’s machine learning API, is one of the most widely used tools for developing and training such deep learning models. There is a wide range of ways to configure a deep learning model; however, finding the optimal model architecture can be a highly demanding computing task. Moreover, when the datasets involved are very large, the computing requirements increase, and training can take a long time and hinder the design cycle. One of the most powerful features of TF is its support for distributed computing, which allows portions of the automatically generated graph to be computed on different computing nodes, speeding up the training process. Deploying distributed TF is not straightforward and presents several issues, mainly related to its use under the control of local resource management systems and to the selection of the right resources. To allow CESGA users to adapt their own TF codes to take advantage of the distributed computing capabilities of TF and Finis Terrae II, a complete Python Toolkit has been developed. This Toolkit deals with several tasks that are not relevant to model design but are necessary for exploiting the distributed capabilities, hiding the underlying complexity from end users. Additionally, an example of a successful industrial case that uses this Toolkit is presented, based on the Fortissimo 2 project experiment “Cyber-Physical Laser Metal Deposition (CyPLAM)”. Thanks to the TF distributed capabilities, the computing capability of Finis Terrae II and the use of the developed Toolkit, the time needed to train the largest model of this industrial case has been reduced from 8 hours (non-distributed TF) to less than 20 minutes.
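One of the chores such a toolkit has to hide is telling every TF process which nodes take part in the job. The sketch below is not the CESGA Toolkit; it only shows, under stated assumptions, how the `TF_CONFIG` environment variable read by TensorFlow's multi-worker training could be built from the hosts granted by a resource manager. The helper name `build_tf_config`, the host names and the port are hypothetical.

```python
import json
from typing import List


def build_tf_config(hosts: List[str], task_index: int, port: int = 2222) -> str:
    """Return a TF_CONFIG JSON string for one worker of a multi-worker job.

    `hosts` is assumed to be the list of node names allocated to the job,
    in the same order on every node; `port` is an arbitrary choice.
    """
    if not 0 <= task_index < len(hosts):
        raise ValueError("task_index must identify one of the hosts")
    cluster = {"worker": [f"{host}:{port}" for host in hosts]}
    task = {"type": "worker", "index": task_index}
    return json.dumps({"cluster": cluster, "task": task})


if __name__ == "__main__":
    # Hypothetical two-node allocation; this process is the second worker.
    print(build_tf_config(["node1", "node2"], task_index=1))
```

Each process would export the returned string as `TF_CONFIG` (for example with `os.environ["TF_CONFIG"] = ...`) before building its model under a multi-worker distribution strategy; only the task index differs between nodes.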
Trust is a central issue confronting men and women in contemporary society. In fact, the most difficult thing to achieve in this world is trust: it can take years to win and only seconds to lose. This also applies in a computing environment, where users need to trust computing services to process and manage their data. This implies a broad spectrum of properties to be fulfilled, such as Security, Privacy, Coherence, Isolation, Stability, Fairness, Transparency and Dependability.
Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing [1] (hereinafter “ATMOSPHERE”) is a European-Brazilian collaboration project that aims to measure and improve the different trustworthiness dimensions of data analytics applications running on the cloud. To achieve trustworthy cloud computing services in a federated environment, ATMOSPHERE focuses on providing four components: i) a dynamically reconfigurable hybrid federated VM and container platform, to provide isolation, high availability, Quality of Service (QoS) and flexibility; ii) Trustworthy Distributed Data Management services that maximise privacy when accessing and processing sensitive data; iii) Trustworthy Distributed Data Processing services to build and deploy adaptive applications for Data Analytics, providing high-level trustworthiness metrics for computing fairness and explainability properties; and iv) a Trustworthy monitoring and assessment platform, to compute trustworthiness measures from the metrics provided by the different layers.
In this lightning session, we will focus our discussion on the integration of the federated cloud platform with the Trustworthy monitoring and assessment platform, in order to provide isolation, stability and Quality of Service performance guarantees. The cloud platform will enable the dynamic reconfiguration of resource allocation to applications running on federated networks over an intercontinental shared pool, while the trustworthiness monitoring and assessment platform will provide quantitative scores for the trustworthiness of an application running on the ATMOSPHERE ecosystem.
1 - ATMOSPHERE official website: www.atmosphere-eubrazil.eu