Matching and creating a static entry
In this tutorial, we will build a parser that reads a raw instrument data file and populates a NOMAD archive entry with a custom schema. The result is a static entry in the GUI: the parser reads and structures the data at upload time, and the user cannot edit it afterwards.
The parser tutorial code is located in the src / nomad_plugin_tutorials / parsers / tutorial_1 directory.
Define the schema for the parser entry
Before writing the parser, we need a schema that describes the structure of the
data we want to store. The schema is defined the same way as in the schema
package tutorial, using ArchiveSection,
Quantity, and SubSection. The root section of the entry inherits from
EntryData and the Measurement base section. The measurement settings
and results are composed as sub-sections, which keeps the data cleanly nested.
Schema package to define the structure of parsed entries
from nomad.datamodel.data import ArchiveSection, EntryData
from nomad.datamodel.metainfo.annotations import (
    BrowserAdaptors,
    BrowserAnnotation,
    ELNAnnotation,
    ELNComponentEnum,
)
from nomad.datamodel.metainfo.basesections import Measurement, MeasurementResult
from nomad.metainfo import Quantity, SchemaPackage, SubSection

m_package = SchemaPackage()

class OpticalMicroscopySettings(ArchiveSection):
    resolution = Quantity(
        type=int,
        shape=[2],
        description='Resolution of the resulting image in terms of pixels.',
    )
    magnification = Quantity(
        type=float,
        description='Magnification factor used for the resulting image.',
    )

class OpticalMicroscopyResults(MeasurementResult):
    image = Quantity(
        type=str,
        description='Image file generated by the microscope.',
        a_eln=ELNAnnotation(component=ELNComponentEnum.FileEditQuantity),
        a_browser=BrowserAnnotation(adaptor=BrowserAdaptors.RawFileAdaptor),
    )

class OpticalMicroscopy(Measurement, EntryData):
    """
    An example schema for optical microscopy measurements.
    """

    data_file = Quantity(
        type=str,
        description='Data file coming from the microscope.',
    )
    settings = SubSection(section_def=OpticalMicroscopySettings)
    results = SubSection(section_def=OpticalMicroscopyResults, repeats=True)

m_package.__init_metainfo__()
Implement the parser with MatchingParser
MatchingParser is a commonly-used base class for NOMAD parsers. It handles matching raw files to the correct parser (by file name pattern, MIME type, or content) and passes control to the parse method when a match is found.
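As a rough illustration of what matching by file name and content means, the following stand-alone sketch decides whether a file would be handed to a parser. It does not use NOMAD's code: the function matches, the two patterns, and the sample files are assumptions made up for this example.

```python
import re

# Hypothetical patterns: NOT taken from the tutorial plugin.
NAME_PATTERN = re.compile(r'.*\.xml$')
CONTENT_PATTERN = re.compile(r'<optical_microscopy\b')


def matches(file_name: str, head: str) -> bool:
    """Return True if both the file name and the start of the content match."""
    name_ok = NAME_PATTERN.match(file_name) is not None
    content_ok = CONTENT_PATTERN.search(head) is not None
    return name_ok and content_ok


# Hypothetical upload content: file name -> first characters of the file.
candidates = {
    'scan_001.xml': '<optical_microscopy><datetime>...</datetime>',
    'notes.xml': '<notes>unrelated</notes>',
    'scan_001.png': '',
}

# Only the first file satisfies both conditions.
matched = [name for name, head in candidates.items() if matches(name, head)]
```

Combining a name check with a content check keeps unrelated XML files in the same upload from being picked up by the wrong parser.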
To build our parser, we subclass MatchingParser and override its parse method:
from nomad.parsing.parser import MatchingParser

class OpticalMicroscopyParser(MatchingParser):
    def parse(
        self, mainfile: str, archive: 'EntryArchive', logger=None, child_archives=None
    ) -> None:
        ...
The parse method receives three key arguments:
mainfile: The absolute path to the matched raw file on disk.
archive: The EntryArchive object for the entry being created. This is where we write our structured data.
logger: A bound logger for reporting errors or warnings during parsing.
Read the raw file
The mainfile argument provides the full absolute path to the file. To access
the file through NOMAD's upload context (which correctly handles paths in the
upload), we extract the path relative to the upload's raw folder and pass it
to the helper function read_data_file (from
nomad_plugin_tutorials.parsers.reader). The reader returns the data read from
the XML file as a dictionary, which we store in data_dict.
data_file_path = mainfile.rsplit('/raw/', maxsplit=1)[-1]
data_dict = read_data_file(data_file_path, archive, logger)
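To make the two lines above more concrete, here is a self-contained sketch. The first part shows what the rsplit call does to a path; the second shows one way an XML file could be turned into a dictionary with the standard library. The example path, the XML layout, and the function xml_to_dict are assumptions for illustration; the tutorial's actual read_data_file helper may work differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical absolute path of a matched file inside an upload.
mainfile = '/data/uploads/abc123/raw/microscope/scan_001.xml'
# Keep only what follows the last '/raw/' segment.
data_file_path = mainfile.rsplit('/raw/', maxsplit=1)[-1]

# Hypothetical file content; the real instrument format may differ.
XML_TEXT = """
<measurement>
  <datetime>2024-05-01T10:30:00</datetime>
  <resolution>1920x1080</resolution>
  <magnification>50x</magnification>
  <sample>
    <sample_ID>S-001</sample_ID>
  </sample>
</measurement>
"""


def xml_to_dict(element: ET.Element):
    """Convert an element to nested dicts; leaf elements become stripped text."""
    children = list(element)
    if not children:
        return (element.text or '').strip()
    return {child.tag: xml_to_dict(child) for child in children}


data_dict = xml_to_dict(ET.fromstring(XML_TEXT.strip()))
```

Note that every leaf value is still a string at this point, which is why the parser converts values such as the resolution before assigning them to the schema.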
Populate archive.data with the custom schema
The central concept of a parser entry is writing to archive.data. This is the data section of the entry and must be set to an instance of a class that inherits from EntryData.
We instantiate our OpticalMicroscopy schema class at the start of the parse method and assign it to archive.data at the end (archive.data = measurement).
In between, we populate the fields of the schema object by reading values from data_dict. NOMAD provides the m_setdefault helper to safely initialize nested sub-sections before assigning values to them:
measurement = OpticalMicroscopy(data_file=data_file_path)

# Populate top-level fields
if datetime := data_dict.get('datetime'):
    measurement.datetime = datetime

# Populate a nested sub-section using m_setdefault
measurement.m_setdefault('settings')
if resolution := data_dict.get('resolution'):
    measurement.settings.resolution = [float(x) for x in resolution.split('x')]
m_setdefault('settings') initializes measurement.settings with an empty OpticalMicroscopySettings instance if it does not already exist, preventing AttributeError on nested assignment.
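For the repeatable results sub-section, use an index path such as measurement.m_setdefault('results/0'), which initializes the first element of the list if it does not already exist.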
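The get-or-create behaviour described above can be mimicked in plain Python. The sketch below is a toy analogue written for this explanation, not NOMAD's implementation: the classes Settings, Result, and Measurement and the method setdefault are assumptions, and only the two path forms used in this tutorial ('name' and 'name/index') are handled.

```python
class Settings:
    resolution = None


class Result:
    image = None


class Measurement:
    """Toy stand-in for a section with one single and one repeating sub-section."""

    # Hypothetical registry: sub-section name -> (class, repeats)
    sub_sections = {'settings': (Settings, False), 'results': (Result, True)}

    def __init__(self):
        self.settings = None
        self.results = []

    def setdefault(self, path: str):
        """Return the sub-section at `path`, creating it first if it is missing."""
        name, _, index = path.partition('/')
        section_cls, repeats = self.sub_sections[name]
        if not repeats:
            if getattr(self, name) is None:
                setattr(self, name, section_cls())
            return getattr(self, name)
        items = getattr(self, name)
        position = int(index)
        # Append empty sections until the requested index exists.
        while len(items) <= position:
            items.append(section_cls())
        return items[position]


measurement = Measurement()
first = measurement.setdefault('settings')
second = measurement.setdefault('settings')  # already exists, so it is reused
measurement.setdefault('results/0').image = 'scan_001.png'
```

Because the call is idempotent, it can be placed before any nested assignment without checking whether the sub-section was already created.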
Tutorial 1.1
Complete the parse method of OpticalMicroscopyParser to populate all fields of the OpticalMicroscopy schema from the parsed data dictionary and assign it to archive.data.
You can find this class in the tutorial-mode branch under src / nomad_plugin_tutorials / parsers / tutorial_1 / parsers / parser.py.
Tutorial 1.1: Solution
# Requires `import os` at the top of the module.
def parse(
    self, mainfile: str, archive: 'EntryArchive', logger=None, child_archives=None
) -> None:
    # Path of the matched file relative to the upload's raw folder
    data_file_path = mainfile.rsplit('/raw/', maxsplit=1)[-1]
    data_dict = read_data_file(data_file_path, archive, logger)

    measurement = OpticalMicroscopy(data_file=data_file_path)

    if datetime := data_dict.get('datetime'):
        measurement.datetime = datetime

    if (
        'sample' in data_dict
        and isinstance(data_dict['sample'], dict)
        and 'sample_ID' in data_dict['sample']
    ):
        measurement.m_setdefault('samples/0')
        measurement.samples[0].lab_id = data_dict['sample']['sample_ID']
        if 'description' in data_dict['sample']:
            measurement.description = data_dict['sample']['description']

    measurement.m_setdefault('settings')
    if resolution := data_dict.get('resolution'):
        measurement.settings.resolution = [float(x) for x in resolution.split('x')]
    if magnification := data_dict.get('magnification'):
        # Drop the last character of the string before converting
        measurement.settings.magnification = float(magnification[:-1])

    measurement.m_setdefault('results/0')
    if image_file_name := data_dict.get('imageFileName'):
        # The image path is built relative to the folder of the data file
        measurement.results[0].image = os.path.join(
            os.path.dirname(data_file_path), image_file_name
        )

    archive.data = measurement
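The string conversions used in the solution can be checked in isolation. The values below are hypothetical examples of what the data dictionary might contain; in particular, the trailing 'x' on the magnification is an assumption suggested by the [:-1] slice.

```python
import os

# Hypothetical values as they might appear in data_dict.
data_dict = {
    'resolution': '1920x1080',
    'magnification': '50x',
    'imageFileName': 'scan_001.png',
}
data_file_path = 'microscope/scan_001.xml'

# '1920x1080' -> [1920.0, 1080.0]
resolution = [float(x) for x in data_dict['resolution'].split('x')]

# '50x' -> 50.0 (the last character is dropped before conversion)
magnification = float(data_dict['magnification'][:-1])

# The image is expected next to the data file, so reuse its folder.
image_path = os.path.join(
    os.path.dirname(data_file_path), data_dict['imageFileName']
)
```

Keeping the image path relative to the data file means the reference stays valid wherever the upload's raw folder is located.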
Static entries: the outcome
Once the parser runs successfully, NOMAD creates a static entry from the raw file. This entry is read-only from the user's perspective: there are no ELN fields for direct editing.
Static entries are ideal for instrument or simulation output files where the structured data should be derived entirely from the raw file without any manual input.
Testing the parser
If you use the nomad-distro-dev development environment, all functionality of the plugin can be tested in the GUI after restarting the appworker and/or the GUI. For details, see the README.md file of the nomad-distro-dev repository.
For a stand-alone installation of the plugin, use the provided tutorial.ipynb Jupyter notebook (you can find it under src / nomad_plugin_tutorials / parsers / tutorial_1 / tutorial.ipynb).
Before running the notebook, ensure that the plugin and all dependencies are installed by running
or, if you use pip:
In step 1, you will use the parse() function from nomad.client to imitate uploading a file in the GUI.
In step 2, you can inspect the parsing results.