Build your own pynxtools plugin

The pynxtools dataconverter converts experimental data to NeXus/HDF5 files based on any provided NXDL schema. Support for additional data formats is added through extensions called readers. There exists a set of built-in pynxtools readers, as well as pynxtools plugins, that convert supported data files for some experimental techniques into compliant NeXus files.

Is your data not yet supported by the built-in pynxtools readers or the officially supported pynxtools plugins?

Don't worry, the following how-to will guide you through the steps of writing a reader for your own data.

Getting started

You should start by creating a clean repository that implements the following structure (for a plugin called pynxtools-plugin):

pynxtools-plugin
├── .github/workflows
├── docs
│   ├── explanation
│   ├── how-tos
│   ├── reference
│   └── tutorial
├── src
│   └── pynxtools_plugin
│       └── reader.py
├── tests
│   └── data
├── LICENSE
├── mkdocs.yaml
├── dev-requirements.txt
└── pyproject.toml

To identify pynxtools-plugin as a plugin for pynxtools, an entry point must be established (in the pyproject.toml file):

[project.entry-points."pynxtools.reader"]
mydatareader = "pynxtools_plugin.reader:MyDataReader"

Note that your plugin may also contain multiple readers. In that case, each reader must have its own unique entry point.

Here, we will focus mostly on the reader.py file and how to build a reader. For guidelines on how to build the other parts of your plugin, you can have a look here:

Writing a Reader

After you have established the main structure, you can start writing your reader. Place the new reader in reader.py.

Then implement the reader function:

reader.py
"""MyDataReader implementation for the DataConverter to convert mydata to NeXus."""
from typing import Tuple, Any

from pynxtools.dataconverter.readers.base.reader import BaseReader

class MyDataReader(BaseReader):
    """MyDataReader implementation for the DataConverter to convert mydata to NeXus."""

    supported_nxdls = [
        "NXmynxdl" # this needs to be changed during implementation.
    ]

    def read(
        self,
        template: dict = None,
        file_paths: Tuple[str] = None,
        objects: Tuple[Any] = None
    ) -> dict:
        """Reads data from given file and returns a filled template dictionary"""
        # Here, you must provide functionality to fill the template, see below.
        # Example:
        # template["/entry/instrument/name"] = "my_instrument"

        return template


# This has to be set so that the convert script can use this reader. Set it to your reader class, here MyDataReader.
READER = MyDataReader
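
To illustrate what the body of read might do, here is a minimal sketch that copies one value from a JSON input file into the template. It is not the pynxtools implementation: the helper fill_template_from_json, the JSON key instrument_name, and the use of a plain dict as a stand-in for the template are all assumptions made for this example.

```python
import json
from typing import Tuple


def fill_template_from_json(template: dict, file_paths: Tuple[str, ...]) -> dict:
    """Copy selected metadata from JSON input files into the template."""
    for file_path in file_paths:
        if not file_path.endswith(".json"):
            continue  # this sketch only understands JSON files
        with open(file_path, encoding="utf-8") as infile:
            metadata = json.load(infile)
        # "instrument_name" is a hypothetical key of a hypothetical input file.
        if "instrument_name" in metadata:
            template["/entry/instrument/name"] = metadata["instrument_name"]
    return template
```

Inside MyDataReader.read, such a helper could be called as `return fill_template_from_json(template, file_paths)`.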

The reader template dictionary

The read function takes a Template dictionary, which is used to map from the measurement (meta)data to the concepts defined in the NeXus application definition. The template contains keys that match the concepts in the provided NXDL file.

The returned template dictionary should contain keys that exist in the template as defined below. The values of these keys have to be data objects to populate the output NeXus file. They can be lists, numpy arrays, numpy bytes, numpy floats, numpy ints, and so on. In practice, you can pass any value that the h5py package can handle.

Example for a template entry:

{
  "/entry/instrument/source/type": "None"
}

For a given NXDL schema, you can generate an empty template with the following command:

user@box:~$ dataconverter generate-template --nxdl NXmynxdl

Naming of groups

If the NXDL does not define a name for the group that the requested data belongs to, the template dictionary lists it as /NAME_IN_NXDL[name_in_output_nexus]. You can choose any name you prefer in place of the suggested name_in_output_nexus (see here for the naming conventions). This allows the reader to write a group defined in the NXDL to the NeXus file multiple times under different names.

{
  "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/type": "None"
}
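
As a minimal sketch of how a reader could repeat a group, the snippet below builds keys in the NAME_IN_NXDL[name_in_output_nexus] form and writes the same SOURCE group twice under different names. The helper build_template_key and the name my_second_source are hypothetical, and a plain dict stands in for the template.

```python
from typing import Sequence, Tuple


def build_template_key(groups: Sequence[Tuple[str, str]], field: str) -> str:
    """Join (NAME_IN_NXDL, name_in_output_nexus) pairs and a field into a key."""
    parts = [f"{nxdl_name}[{output_name}]" for nxdl_name, output_name in groups]
    return "/" + "/".join(parts + [field])


template = {}
# Write the SOURCE group twice, each time with a different output name.
for source_name in ("my_source", "my_second_source"):
    key = build_template_key(
        [("ENTRY", "my_entry"), ("INSTRUMENT", "my_instrument"), ("SOURCE", source_name)],
        "type",
    )
    template[key] = "None"
```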

Attributes

For attributes defined in the NXDL, the reader template dictionary contains the associated key with an "@" prefix on the attribute's name at the end of the path:

{
  "/entry/instrument/source/@attribute": "None"
}

Units

For a field defined in the NXDL, the converter expects a filled-in /data/@units entry in the template dictionary for the corresponding /data field, unless the field is specified as NX_UNITLESS in the NXDL. If the units entry is missing, a warning is shown.

{
  "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/data": "None",
  "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/data/@units": "Should be set to a string value"
}
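
One way to avoid forgetting the units entry is to always write a field and its @units together. The following is a minimal sketch of that idea; the helper set_field_with_units is hypothetical, the values and the unit "eV" are made up for illustration, and a plain dict stands in for the template.

```python
from typing import Any


def set_field_with_units(template: dict, path: str, value: Any, units: str) -> None:
    """Store a field value and its "@units" attribute side by side."""
    template[path] = value
    template[f"{path}/@units"] = units


template = {}
set_field_with_units(
    template,
    "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/data",
    [1.0, 2.0, 3.0],  # illustrative values only
    "eV",  # illustrative unit only
)
```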

You can also define links by setting the value to a sub-dictionary with the key link:

template["/entry/instrument/source"] = {"link": "/path/to/source/data"}

Building off of the BaseReader

When building off the BaseReader, the developer has the most flexibility. Any new reader must implement the read function, which must return a filled template object.

Building off of the MultiFormatReader

While building on the BaseReader allows for the most flexibility, in most cases it is desirable to implement a reader that can read in multiple file formats and then populate the template based on the read data. For this purpose, pynxtools has the MultiFormatReader, which can be readily extended for your own data.

You can find an extensive how-to guide to build off the MultiFormatReader here.

Calling the reader from the command line

The dataconverter can be executed using:

user@box:~$ dataconverter --reader mydatareader --nxdl NXmynxdl --output path_to_output.nxs

Here, the --reader flag must match the reader name defined in [project.entry-points."pynxtools.reader"] in the pyproject.toml file. The NXDL name passed to --nxdl must be a valid NeXus NXDL/XML file in pynxtools.definitions.
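
If you want to drive the conversion from a script, for example in a test or a batch job, the same call can be assembled in Python. This is a minimal sketch that only builds and prints the command; it uses the --reader, --nxdl, and --output flags and the positional file arguments described on this page. The helper build_dataconverter_command and the file name my_data_file.dat are hypothetical.

```python
import shlex
from typing import List, Sequence


def build_dataconverter_command(
    reader: str, nxdl: str, output: str, input_files: Sequence[str]
) -> List[str]:
    """Assemble the argument list for a dataconverter call."""
    return [
        "dataconverter",
        *input_files,  # input files are passed as positional arguments
        "--reader", reader,
        "--nxdl", nxdl,
        "--output", output,
    ]


command = build_dataconverter_command(
    "mydatareader", "NXmynxdl", "path_to_output.nxs", ["my_data_file.dat"]
)
print(shlex.join(command))
# To actually run the conversion, pass the list to subprocess.run(command, check=True).
```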

Aside from this default structure, there are many more flags that can be passed to the dataconverter call. Here is its API:

dataconverter

This command allows you to use the converter functionality of the dataconverter.

Usage:

dataconverter [OPTIONS] [FILES]...

Options:

--input-file (type: text; default: [])
    Deprecated: Please use the positional file arguments instead. The path to the input data file to read. Repeat for more than one file. This option is required if no '--params-file' is supplied.

--reader (type: choice (example | json_map | json_yml | multi); default: json_map)
    The reader to use. Examples are json_map or readers from a pynxtools plugin. This option is required if no '--params-file' is supplied.

--nxdl (type: text; default: None)
    The name of the NeXus application definition file to use without the extension nxdl.xml. This option is required if no '--params-file' is supplied.

--output (type: text; default: output.nxs)
    The path to the output NeXus file to be generated.

--params-file (type: filename; default: None)
    Allows to pass a .yaml file with all the parameters the converter supports.

--ignore-undocumented (type: boolean; default: False)
    Ignore all undocumented fields during validation.

--fail (type: boolean; default: False)
    Fail conversion and don't create an output file if the validation fails.

--skip-verify (type: boolean; default: False)
    Skips the verification routine during conversion.

--mapping (type: text; default: None)
    Takes a .mapping.json file and converts data from given input files.

-c, --config (type: file; default: None)
    A json config file for the reader.

--help (type: boolean; default: False)
    Show this message and exit.