Matching and creating a static entry

In this tutorial, we will build a parser that reads a raw instrument data file and populates a NOMAD archive entry with a custom schema. The result is a static entry in the GUI: the parser reads and structures the data at upload time, and the user cannot edit it afterwards.

The parser tutorial code is located in the src / nomad_plugin_tutorials / parsers / tutorial_1 directory.

Define the schema for the parser entry

Before writing the parser, we need a schema that describes the structure of the data we want to store. The schema is defined the same way as in the schema package tutorial, using ArchiveSection, Quantity, and SubSection. The root section of the entry inherits from both EntryData and the Measurement base section. The measurement settings and results are attached as sub-sections, which keeps the data cleanly nested.

Schema package to define the structure of parsed entries
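from nomad.datamodel.data import ArchiveSection, EntryData
from nomad.datamodel.metainfo.annotations import (
    BrowserAdaptors,
    BrowserAnnotation,
    ELNAnnotation,
    ELNComponentEnum,
)
from nomad.datamodel.metainfo.basesections import Measurement, MeasurementResult
from nomad.metainfo import Quantity, SchemaPackage, SubSection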

m_package = SchemaPackage()


class OpticalMicroscopySettings(ArchiveSection):
    resolution = Quantity(
        type=int,
        shape=[2],
        description='Resolution of the resulting image in terms of pixels.',
    )
    magnification = Quantity(
        type=float,
        description='Magnification factor used for the resulting image.',
    )


class OpticalMicroscopyResults(MeasurementResult):
    image = Quantity(
        type=str,
        description='Image file generated by the microscope.',
        a_eln=ELNAnnotation(component=ELNComponentEnum.FileEditQuantity),
        a_browser=BrowserAnnotation(adaptor=BrowserAdaptors.RawFileAdaptor),
    )


class OpticalMicroscopy(Measurement, EntryData):
    """
    An example schema for optical microscopy measurements.
    """
    data_file = Quantity(
        type=str,
        description='Data file coming from the microscope.',
    )
    settings = SubSection(section_def=OpticalMicroscopySettings)
    results = SubSection(section_def=OpticalMicroscopyResults, repeats=True)


m_package.__init_metainfo__()

Implement the parser with MatchingParser

MatchingParser is a commonly used base class for NOMAD parsers. It matches raw files to the correct parser (by file name pattern, MIME type, or file content) and calls the parse method when a match is found.
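To make the matching idea concrete, here is a minimal standalone sketch of matching a file by name and by content with regular expressions. It does not use NOMAD, and NOMAD's own matching logic is more involved. The function matches_parser, both patterns, and the sample file names are hypothetical and chosen only for illustration.

```python
import re
import tempfile
from pathlib import Path


def matches_parser(
    path: Path, name_re: str, contents_re: str, probe_bytes: int = 1024
) -> bool:
    """Return True if the file name and the start of the file both match."""
    # The whole file name must match the name pattern.
    if re.fullmatch(name_re, path.name) is None:
        return False
    # Read only the first `probe_bytes` bytes, so large files are not loaded fully.
    with path.open('rb') as fh:
        head = fh.read(probe_bytes).decode('utf-8', errors='ignore')
    return re.search(contents_re, head) is not None


# Hypothetical patterns for an XML file exported by a microscope.
NAME_RE = r'.*\.xml'
CONTENTS_RE = r'<opticalMicroscopy\b'

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / 'scan_001.xml'
    good.write_text('<?xml version="1.0"?>\n<opticalMicroscopy></opticalMicroscopy>\n')
    other = Path(tmp) / 'notes.xml'
    other.write_text('<notes>unrelated</notes>\n')

    good_matches = matches_parser(good, NAME_RE, CONTENTS_RE)
    other_matches = matches_parser(other, NAME_RE, CONTENTS_RE)
```

Combining a name pattern with a content pattern keeps unrelated XML files in the same upload from being claimed by the parser.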

To build our parser, we subclass MatchingParser and override its parse method:

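from typing import TYPE_CHECKING

from nomad.parsing.parser import MatchingParser

if TYPE_CHECKING:
    # Imported only for the type annotation below
    from nomad.datamodel import EntryArchive


class OpticalMicroscopyParser(MatchingParser):
    def parse(
        self, mainfile: str, archive: 'EntryArchive', logger=None, child_archives=None
    ) -> None:
        ...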

The parse method receives three key arguments (child_archives is not used in this tutorial):

  • mainfile: The absolute path to the matched raw file on disk.
  • archive: The EntryArchive object for the entry being created. This is where we write our structured data.
  • logger: A bound logger for reporting errors or warnings during parsing.

Read the raw file

The mainfile argument holds the absolute path to the file. To access the file through NOMAD's upload context (which correctly resolves paths within the upload), we extract the path relative to the upload's raw folder and pass it to the helper function read_data_file (from nomad_plugin_tutorials.parsers.reader). The reader parses the XML file and returns its contents as a dictionary, which we store in data_dict.

data_file_path = mainfile.rsplit('/raw/', maxsplit=1)[-1]
data_dict = read_data_file(data_file_path, archive, logger)
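The path extraction can be tried in isolation. The sketch below wraps the same rsplit call in a hypothetical helper, relative_raw_path, and applies it to made-up paths; the directory layout shown is an assumption for illustration, not NOMAD's documented storage layout.

```python
def relative_raw_path(mainfile: str) -> str:
    """Return the part of `mainfile` after the last '/raw/' segment."""
    # maxsplit=1 with rsplit splits on the LAST occurrence only; [-1] takes the tail.
    return mainfile.rsplit('/raw/', maxsplit=1)[-1]


# Hypothetical absolute path of a matched file inside an upload.
relative = relative_raw_path('/data/staging/upload_abc/raw/microscope/scan_001.xml')

# If '/raw/' does not occur at all, the input comes back unchanged.
unchanged = relative_raw_path('scan_001.xml')

# Caveat: a folder that is itself called 'raw' inside the upload shortens the result.
nested = relative_raw_path('/data/staging/upload_abc/raw/session/raw/scan_001.xml')
```

Because the split happens on the last '/raw/' segment, this approach works best when no folder inside the upload is named raw.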

Populate archive.data with the custom schema

The central concept of a parser entry is writing to archive.data. This is the data section of the entry and must be set to an instance of a class that inherits from EntryData.

We instantiate our OpticalMicroscopy schema class and assign it to archive.data at the end of the parse method:

archive.data = measurement

In between, we populate the fields of the schema object by reading values from data_dict. NOMAD provides the m_setdefault helper to safely initialize nested sub-sections before assigning values to them:

measurement = OpticalMicroscopy(data_file=data_file_path)

# Populate top-level fields
if datetime := data_dict.get('datetime'):
    measurement.datetime = datetime

# Populate a nested sub-section using m_setdefault
measurement.m_setdefault('settings')
if resolution := data_dict.get('resolution'):
    # The schema declares resolution as int, so convert each part with int()
    measurement.settings.resolution = [int(x) for x in resolution.split('x')]

m_setdefault('settings') initializes measurement.settings with an empty OpticalMicroscopySettings instance if it does not already exist, preventing AttributeError on nested assignment.

For the repeatable results sub-section, use an index path:

measurement.m_setdefault('results/0')
measurement.results[0].image = 'path/to/image.jpeg'

Tutorial 1.1

Complete the parse method of OpticalMicroscopyParser to populate all fields of the OpticalMicroscopy schema from the parsed data dictionary and assign it to archive.data.

You can find this class in the tutorial-mode branch under src / nomad_plugin_tutorials / parsers / tutorial_1 / parsers / parser.py.

Tutorial 1.1: Solution
def parse(
    self, mainfile: str, archive: 'EntryArchive', logger=None, child_archives=None
) -> None:
    data_file_path = mainfile.rsplit('/raw/', maxsplit=1)[-1]
    data_dict = read_data_file(data_file_path, archive, logger)

    measurement = OpticalMicroscopy(data_file=data_file_path)
    if datetime := data_dict.get('datetime'):
        measurement.datetime = datetime
    if (
        'sample' in data_dict
        and isinstance(data_dict['sample'], dict)
        and 'sample_ID' in data_dict['sample']
    ):
        measurement.m_setdefault('samples/0')
        measurement.samples[0].lab_id = data_dict['sample']['sample_ID']
        if 'description' in data_dict['sample']:
            measurement.description = data_dict['sample']['description']

    measurement.m_setdefault('settings')
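    if resolution := data_dict.get('resolution'):
        # The schema declares resolution as int, so convert each part with int()
        measurement.settings.resolution = [int(x) for x in resolution.split('x')]
    if magnification := data_dict.get('magnification'):
        # Drop the trailing unit character before converting to float
        measurement.settings.magnification = float(magnification[:-1])

    measurement.m_setdefault('results/0')
    if image_file_name := data_dict.get('imageFileName'):
        # Requires `import os` at module level; the image is resolved
        # relative to the folder that contains the data file
        measurement.results[0].image = os.path.join(
            os.path.dirname(data_file_path), image_file_name
        )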

    archive.data = measurement
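The string conversions in the solution rely on the text format of the raw values. As a rough sketch, the standalone helpers below show the same conversions outside NOMAD. The helper names and the sample values ('1920x1080', '50x', the file names) are hypothetical, and the resolution is converted with int to match the type=int declared in the schema.

```python
import os


def parse_resolution(text: str) -> list[int]:
    """Convert a string such as '1920x1080' into [1920, 1080]."""
    return [int(part) for part in text.split('x')]


def parse_magnification(text: str) -> float:
    """Convert a string such as '50x' into 50.0 by dropping the last character."""
    return float(text[:-1])


# Hypothetical values as they might appear in data_dict.
resolution = parse_resolution('1920x1080')
magnification = parse_magnification('50x')

# The image path is built relative to the folder of the data file.
data_file_path = os.path.join('microscope', 'scan_001.xml')
image_path = os.path.join(os.path.dirname(data_file_path), 'scan_001.jpeg')
```

Both conversions raise ValueError on unexpected input (for example a magnification without a trailing unit character), so a production parser may want to catch that and report it through the logger.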

Static entries: the outcome

Once the parser runs successfully, NOMAD creates a static entry from the raw file. This entry is read-only from the user's perspective: there are no ELN fields for direct editing.

Static entries are ideal for instrument or simulation output files where the structured data should be derived entirely from the raw file without any manual input.

Testing the parser

If you use the nomad-distro-dev development environment, all functionality of the plugin can be tested within the GUI by restarting the appworker and/or the GUI. For details, please see the README.md file of the nomad-distro-dev repository.

For a stand-alone installation of the plugin, please use the provided tutorial.ipynb Jupyter notebook (you can find it under src / nomad_plugin_tutorials / parsers / tutorial_1 / tutorial.ipynb).

Before running the notebook, ensure that the plugin and all dependencies are installed by running:

uv sync --extra dev

or, if you use pip:

pip install -e '.[dev]'

In step 1, you will use the parse() function from nomad.client to imitate uploading a file in the GUI.

In step 2, you can inspect the parsing results.