Developing parsers for FAIR computational data storage of spectroscopy simulations using the NOMAD-Simulations package

This documentation page provides the foundational knowledge for creating parser plugins in NOMAD to manage computational data from spectroscopy simulations. We will use the nomad-simulations package to populate our data schema and extend it when needed. The steps are general and can be used to create any parser plugin, but here we use examples specifically related to theoretical spectroscopy simulations.

Hackathon information

The structure of this documentation page is as follows:

  1. Understanding the NOMAD-Simulations schema, its strengths and weaknesses, and how to extend it.
  2. Creating a parser plugin, importing the data schema from the NOMAD-Simulations package, and matching the files to be parsed.

You can find Assignments throughout these documentation pages which will help you understand the main concepts.

To help facilitate discussions and provide prolonged assistance beyond the tutorial, we have created a #hackathon-fairspectra-sep2024 event channel in the NOMAD Discord server.

General background

NOMAD is an open-source, community-driven data infrastructure, focusing on materials science data. Originally built as a repository for data from DFT calculations, the NOMAD software can automatically extract data from the output of a large variety of simulation codes.

The key advantages of the NOMAD-Simulations schema are summed up in FAIRmat's core values:

  • Findable: a wide selection of the extracted data is indexed in a database, powering the search with highly customizable queries and modular search parameters.
  • Accessible: the same database specifies clear API and GUI protocols on how to retrieve the full extracted data.
  • Interoperable: we have a diverse team of experts who interface with various materials science communities, looking into harmonizing data representations and insights among them. Following the NOMAD standard also opens up the (meta)data to the "NOMAD apps" ecosystem.
  • Reproducible: data is not standalone, but has a history, a vision, and a workflow behind it. Our schema aims to capture, via metadata, the full context necessary for understanding and even regenerating the data.

We have prepared several resources that you can consult after this Hackathon. You can find more information in the Domain-specific NOMAD documentation page. If you want to know more, we also recommend our latest Tutorial 14, the YouTube playlist as well as its documentation page.