28–30 Oct 2024
Porto
Europe/Lisbon timezone

Open Data for DESY, HIFIS, NFDI and EOSC

29 Oct 2024, 16:10
20 min
Auditório (Centro de Investigação Médica (CIM-FMUP))

Presentation (15' + 5' for questions)
Enabling and fostering Open Science adoption
IBERGRID

Speaker

Tim Wetzel (DESY IT (Research and Innovation in Scientific Computing))

Description

DESY, one of Europe's leading synchrotron facilities, is active in a range of scientific fields, including High-Energy and Astroparticle Physics, Dark Matter research, Physics with Photons, and Structural Biology. These fields generate large amounts of data, which are managed under specific policies that respect restrictions on ownership, licensing, and embargo periods. There is currently a push, driven by funding agencies and scientific journals, to make this data publicly available. To support this, DESY IT is developing solutions for publishing Open Data sets so that they are easy to discover, access, and reuse by the wider scientific community, particularly researchers who are not supported by large e-infrastructures.

In line with the Open and FAIR data principles, DESY's Open Data solution will provide a metadata catalog to improve discoverability. Access will be granted through federated user accounts via eduGAIN, HelmholtzID, NFDI, and later EOSC-AAI, so that community members can access data with their institutional accounts. Interoperability will be ensured by using commonly accepted data formats such as HDF5, specifically NeXus, openPMD, and ORSO. Providing both technical and scientific metadata will make the Open Data sets reusable for further analysis and research. Upon successful evaluation, the blueprint for DESY's Open Data solution will be shared through HIFIS and with the wider community.
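
As an illustration of how a pre-ingestion step might confirm that a file really uses the HDF5 container, the sketch below looks for the 8-byte HDF5 format signature, which the HDF5 file format specification places at offset 0 or, when a user block precedes it, at 512, 1024, 2048 bytes and so on. The helper `looks_like_hdf5` is hypothetical and not part of DESY's tooling; it checks only the container and does not verify NeXus, openPMD, or ORSO conventions.

```python
import os
import tempfile

# 8-byte signature that marks the start of an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file carries an HDF5 superblock signature.

    The superblock may start at offset 0 or, if a user block is present,
    at 512, 1024, 2048, ... bytes (successive powers of two).
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate offset: 0 -> 512 -> 1024 -> 2048 -> ...
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "scan.h5")
        with open(good, "wb") as fh:
            fh.write(HDF5_SIGNATURE + b"\x00" * 64)
        bad = os.path.join(tmp, "notes.txt")
        with open(bad, "w") as fh:
            fh.write("not a data file")
        print(looks_like_hdf5(good), looks_like_hdf5(bad))  # prints: True False
```

A check like this is cheap because it reads at most a few 8-byte windows; validating the NeXus or openPMD structure inside the file would require an HDF5 library and the respective format definitions.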

The initial prototype will consist of three interconnected components: the metadata catalog SciCat, the storage system dCache, and the VISA (Virtual Infrastructure for Scientific Analysis) portal. Scientific data and its metadata will be stored in a dedicated directory on dCache and ingested into SciCat, which provides direct access and download options. To harmonize scientific metadata across similar experiments, experiment-specific metadata schemata will be created and used to validate metadata before ingestion. A subset of the technical and scientific metadata will be integrated into the VISA portal, allowing scientists to access datasets from within it. The VISA portal lets users create computing environments with pre-installed analysis tools and mounted datasets, providing easy access to both data and tools.
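
The validate-then-ingest step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DESY implementation: the schema (`EXAMPLE_SCHEMA`, `FieldSpec`), the `{"value": ..., "unit": ...}` convention for scientific metadata, the dCache path, and the record field names (loosely modelled on SciCat's dataset model) are all hypothetical, and the actual submission to SciCat is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One field of a hypothetical experiment-specific metadata schema."""
    expected_type: type
    unit: Optional[str] = None  # required unit for dimensional quantities


# Hypothetical schema shared by one family of similar experiments.
EXAMPLE_SCHEMA: Dict[str, FieldSpec] = {
    "sample_name": FieldSpec(str),
    "photon_energy": FieldSpec(float, unit="keV"),
    "exposure_time": FieldSpec(float, unit="s"),
}


def _type_ok(value: Any, expected: type) -> bool:
    # Accept ints where floats are expected, but never treat booleans as numbers.
    if isinstance(value, bool):
        return expected is bool
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate(metadata: Dict[str, Any], schema: Dict[str, FieldSpec]) -> List[str]:
    """Return a list of problems; an empty list means the metadata passed."""
    errors: List[str] = []
    for name, spec in schema.items():
        entry = metadata.get(name)
        if not isinstance(entry, dict) or "value" not in entry:
            errors.append(f"{name}: missing or not a {{'value': ...}} entry")
            continue
        if not _type_ok(entry["value"], spec.expected_type):
            errors.append(f"{name}: expected {spec.expected_type.__name__}")
        if spec.unit is not None and entry.get("unit") != spec.unit:
            errors.append(f"{name}: expected unit {spec.unit!r}, got {entry.get('unit')!r}")
    # Reject keys the schema does not define, so similar experiments stay harmonized.
    for name in metadata.keys() - schema.keys():
        errors.append(f"{name}: not defined in schema")
    return errors


def build_record(source_folder: str, owner_group: str,
                 metadata: Dict[str, Any], schema: Dict[str, FieldSpec]) -> Dict[str, Any]:
    """Validate, then assemble a SciCat-style dataset record (field names assumed)."""
    errors = validate(metadata, schema)
    if errors:
        raise ValueError("; ".join(sorted(errors)))
    return {
        "type": "raw",
        "sourceFolder": source_folder,  # hypothetical dCache directory holding the files
        "ownerGroup": owner_group,
        "creationTime": datetime.now(timezone.utc).isoformat(),
        "scientificMetadata": metadata,
    }


if __name__ == "__main__":
    scan = {
        "sample_name": {"value": "sample-A"},
        "photon_energy": {"value": 12.4, "unit": "keV"},
        "exposure_time": {"value": 0.5, "unit": "s"},
    }
    record = build_record("/pnfs/example/open-data/run-0042", "example-group",
                          scan, EXAMPLE_SCHEMA)
    print(record["sourceFolder"])  # prints: /pnfs/example/open-data/run-0042
```

Rejecting undeclared keys is a deliberately strict choice that pushes similar experiments onto one shared vocabulary; a real pipeline might instead only warn, or express the schemata in a standard schema language.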

The presentation will discuss the system's architecture, its components, and their interactions, focusing on the harmonization of metadata schemata and on the roadmap for developing tools and processes to ingest and validate metadata.

Primary authors

Paul Millar (DESY), Tim Wetzel (DESY IT (Research and Innovation in Scientific Computing)), Dr Uwe Jandt (DESY)

Co-authors

Dr Johannes Reppin (DESY), Dr Julia Kobus (CAU Kiel), Dr Linus Pithan (DESY), Patrick Fuhrmann (DESY)
