
How to write a parser

NOMAD uses parsers to convert raw data (for example, output from computational software, instruments, or electronic lab notebooks) into NOMAD's common Archive format. The following pages describe how to develop such a parser and integrate it within the NOMAD software. The goal is to equip users with the knowledge required to contribute to and extend NOMAD.

Getting started

In principle, it is possible to develop a "local parser" that uses the nomad-lab package to parse raw data, without changing the NOMAD software itself. This is a quick way to get started and to focus on the parsing of the data itself, but it does not cover the full integration of a new parser into NOMAD. Here we focus on developing parsers that will be integrated into the NOMAD software. For this, you will have to install a development version of NOMAD.

Parser organization

The NOMAD parsers can be found within your local NOMAD git repo under dependencies/parsers/. The parsers are organized into the following individual projects (dependencies/parsers/<parserproject>) with their own corresponding repositories:

  • atomistic - Parsers for output from classical molecular simulations, e.g., from GROMACS, LAMMPS, etc.
  • database - Parsers for various databases, e.g., OpenKIM.
  • eelsdb - Parser for the EELS database (https://eelsdb.eu/; to be integrated in the database project).
  • electronic - Parsers for output from electronic structure calculations, e.g., from VASP, FHI-aims, etc.
  • nexus - Parsers for combining various instrument output formats and electronic lab notebooks.
  • workflow - Parsers for output from task managers and workflow schedulers.

Within each project folder you will find a test/ directory containing the parser tests, and a directory containing the parsers' source code, named <parserproject>parser or <parserproject>parsers depending on whether the project contains one parser or several. In the case of multiple parsers, the files for each parser are contained within a corresponding subdirectory: <parserproject>parsers/<parsername>. For example, the Quantum Espresso parser files are found in dependencies/parsers/electronic/electronicparsers/quantumespresso/.
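The naming convention described above can be summarized in a small shell sketch. The function name parser_src_dir is hypothetical and not part of NOMAD; it only assembles the expected path from a project name and an optional parser name, and does not check that the directory exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of NOMAD): print the expected source directory
# of a parser, following the layout described above.
# Usage: parser_src_dir <parserproject> [<parsername>]
parser_src_dir() {
    project="$1"
    parser="${2:-}"
    base="dependencies/parsers/${project}"
    if [ -n "$parser" ]; then
        # Project with several parsers: one subdirectory per parser.
        printf '%s/%sparsers/%s\n' "$base" "$project" "$parser"
    else
        # Project with a single parser: sources live in <parserproject>parser.
        printf '%s/%sparser\n' "$base" "$project"
    fi
}

parser_src_dir electronic quantumespresso
# prints dependencies/parsers/electronic/electronicparsers/quantumespresso
```

Run from the root of the local NOMAD repository, the printed path can be passed directly to ls or cd to inspect an existing parser.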

Setting up your development branches

We will first focus on the case of adding a new parser to an existing parser project. Creating a new parser project will require a few extra steps.

The existing parser projects are stored within their own git repositories and then linked to the NOMAD software. All current parser projects are available at nomad-coe (see also individual links above).

You will first need to create a new branch in the NOMAD project and another in the corresponding parser project. Ideally, this should be done following the best practices for NOMAD development. Here, we briefly outline the procedure:

Create a new issue within the NOMAD project at NOMAD gitlab. On the page of the new issue, in the top right, click the arrow next to the Create merge request button and select Create branch. The branch name should be automatically generated with the corresponding issue number and the title of the issue (copy the branch name to the clipboard for use below), and the default source branch should be develop. Click the Create branch button.

Now, run the following commands in your local NOMAD directory:

git fetch --all      # sync with the remote

git checkout origin/<new_branch_name> -b <new_branch_name>      # check out the new branch and create a local copy of it

Unless you just installed the NOMAD development version, you should rerun ./scripts/setup_dev_env.sh within the NOMAD directory to reinstall with the newest development branch.
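The fetch and checkout steps for the NOMAD repository can be wrapped in a small function, shown here as a minimal sketch. The function name checkout_issue_branch is hypothetical, and the sketch assumes the remote is named origin and that the branch has already been created from the issue page.

```shell
#!/bin/sh
# Hypothetical helper (not part of NOMAD): fetch all remotes, then create a
# local branch that tracks the branch created on the issue page.
# Assumes the remote is named "origin".
# Usage: checkout_issue_branch <new_branch_name>
checkout_issue_branch() {
    branch="$1"
    # Sync with the remote so that origin/<branch> is known locally.
    git fetch --all || return 1
    # Create the local branch from the remote branch and switch to it.
    git checkout -b "$branch" "origin/$branch"
}

# Run from the root of the local NOMAD repository, for example:
#   checkout_issue_branch <new_branch_name>
#   ./scripts/setup_dev_env.sh   # skip if NOMAD was only just installed
```

Stopping when the fetch fails avoids creating a branch from a stale or missing remote reference.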

Now we need to repeat this process for the parser project that we plan to extend. As above, create a new issue at the relevant parser project GitHub page. For clarity, it is best to use the same issue title as for the NOMAD project above. On the page of the new issue, in the right sidebar under the subsection Development, click Create a branch. As above, the branch name should be automatically generated from the issue number and the title of the issue, and the default source branch should be develop (which can be seen by clicking change source branch). Under What's next, the default option should be Checkout locally, which is what we want in this case. Click Create branch, copy the provided commands to the clipboard, and run them within the parser project folder in your local NOMAD repo, i.e., dependencies/parsers/<parserproject>.
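The commands provided by GitHub typically amount to a fetch followed by a checkout of the new branch. Under that assumption, the step can be sketched as below; if the commands shown on the issue page differ, use those instead. The function name checkout_parser_branch is hypothetical, and the sketch assumes the remote is named origin.

```shell
#!/bin/sh
# Hypothetical helper (not part of NOMAD): check out an issue branch inside a
# parser project folder. Assumes the remote is named "origin".
# Usage (from the NOMAD root): checkout_parser_branch <parserproject> <new_branch_name>
checkout_parser_branch() {
    project="$1"
    branch="$2"
    (
        # Subshell, so the caller's working directory is left unchanged.
        cd "dependencies/parsers/${project}" || exit 1
        git fetch origin || exit 1
        # When exactly one remote has a branch of this name, git creates a
        # local branch that tracks it.
        git checkout "$branch"
    )
}
```

For example, checkout_parser_branch electronic <new_branch_name> would switch the electronic parser project to the new branch while leaving the branch of the NOMAD repository itself untouched.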