Build your own pynxtools plugin¶
The pynxtools dataconverter converts experimental data to NeXus/HDF5 files based on any provided NXDL schema. Support for additional data formats can be added through extensions called readers. A set of built-in pynxtools readers, as well as pynxtools plugins, already exists to convert supported data files for some experimental techniques into compliant NeXus files.
Is your data not yet supported by the built-in pynxtools readers or the officially supported pynxtools plugins?
Don't worry, the following how-to will guide you through the steps of writing a reader for your own data.
Getting started¶
You should start by creating a clean repository that implements the following structure (for a plugin called pynxtools-plugin):
pynxtools-plugin
├── .github/workflows
├── docs
│   ├── explanation
│   ├── how-tos
│   ├── reference
│   └── tutorial
├── src
│   └── pynxtools_plugin
│       └── reader.py
├── tests
│   └── data
├── LICENSE
├── mkdocs.yaml
├── dev-requirements.txt
└── pyproject.toml
To identify pynxtools-plugin as a plugin for pynxtools, an entry point must be established in the pyproject.toml file, under the [project.entry-points."pynxtools.reader"] table.
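As a rough sketch, such an entry point might look like the TOML fragment quoted in the comment below. The group name is the one this guide refers to; the reader name `mydatareader` and the target `pynxtools_plugin.reader:MyDataReader` are assumptions derived from the repository layout above. The helper `list_reader_entry_points` is hypothetical (not part of pynxtools) and only uses the standard library to show what is registered under that group once a plugin is installed.

```python
# Assumed pyproject.toml fragment (reader name and target are illustrative):
#
#   [project.entry-points."pynxtools.reader"]
#   mydatareader = "pynxtools_plugin.reader:MyDataReader"

from importlib.metadata import entry_points


def list_reader_entry_points(group: str = "pynxtools.reader") -> dict:
    """Return {entry point name: "module:object"} for the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # selection API available from Python 3.10
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    # Prints nothing if no plugin is installed in the current environment.
    for name, target in list_reader_entry_points().items():
        print(f"{name} -> {target}")
```

If the plugin is installed (for example in editable mode), its reader name should appear in this listing.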
Note that a plugin may also contain multiple readers. In that case, each reader must have its own unique entry point.
Here, we will focus mostly on the reader.py file and how to build a reader. For guidelines on how to build the other parts of your plugin, you can have a look here:
Writing a Reader¶
After you have established the main structure, you can start writing your reader. The new reader should be placed in reader.py.
Then implement the reader function:
"""MyDataReader implementation for the DataConverter to convert mydata to NeXus."""
from typing import Any, Tuple

from pynxtools.dataconverter.readers.base.reader import BaseReader


class MyDataReader(BaseReader):
    """MyDataReader implementation for the DataConverter to convert mydata to NeXus."""

    supported_nxdls = [
        "NXmynxdl"  # this needs to be changed during implementation.
    ]

    def read(
        self,
        template: dict = None,
        file_paths: Tuple[str] = None,
        objects: Tuple[Any] = None,
    ) -> dict:
        """Reads data from given file and returns a filled template dictionary"""
        # Here, you must provide functionality to fill the template, see below.
        # Example:
        # template["/entry/instrument/name"] = "my_instrument"
        return template


# This has to be set to allow the convert script to use this reader. Set it to "MyDataReader".
READER = MyDataReader
The reader template dictionary¶
The read function takes a Template dictionary, which is used to map from the measurement (meta)data to the concepts defined in the NeXus application definition. The template contains keys that match the concepts in the provided NXDL file.
The returned template dictionary should contain keys that exist in the template as defined below. The values of these keys have to be data objects to populate the output NeXus file.
They can be lists, numpy arrays, numpy bytes, numpy floats, numpy ints, and so on. In practice, you can pass any value that can be handled by the h5py package.
Example for a template entry:
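A minimal sketch of how a reader might fill such entries, assuming a plain dict as a stand-in for the Template object. The function `fill_template`, the metadata keys, and the template paths are all hypothetical and only illustrate the mapping from (meta)data to template keys.

```python
def fill_template(template: dict, metadata: dict, mapping: dict) -> dict:
    """Copy values from parsed metadata into the template.

    mapping has the form {template key: metadata key}; keys that are
    absent from the metadata are skipped.
    """
    for template_key, metadata_key in mapping.items():
        if metadata_key in metadata:
            template[template_key] = metadata[metadata_key]
    return template


# Hypothetical metadata, as it might come out of a parsed data file.
metadata = {"instrument_name": "my_instrument", "counts": [1, 2, 3]}

# Hypothetical mapping from template keys to metadata keys.
mapping = {
    "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/name": "instrument_name",
    "/ENTRY[my_entry]/DATA[my_data]/data": "counts",
    "/ENTRY[my_entry]/title": "title",  # not in the metadata, so it is skipped
}

template = fill_template({}, metadata, mapping)
```

Keeping the mapping separate from the parsing code makes it easier to see at a glance which NXDL concepts a reader actually fills.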
For a given NXDL schema, you can generate an empty template with the command
Naming of groups¶
In case the NXDL does not define a name for the group the requested data belongs to, the template dictionary will list it as /NAME_IN_NXDL[name_in_output_nexus]. You can choose any name you prefer instead of the suggested name_in_output_nexus (see here for the naming conventions). This allows the reader function to repeat groups defined in the NXDL and write each of them to the NeXus file.
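The following sketch shows one way a reader could repeat a group under different names, again using a plain dict in place of the template. The helper `add_sources`, the group names, and the `type` field are hypothetical; which groups and fields are actually allowed depends on your NXDL.

```python
def add_sources(template: dict, sources: dict) -> dict:
    """Write one SOURCE group per entry in `sources` ({group name: type})."""
    for name, source_type in sources.items():
        # The part in square brackets is the name used in the output file.
        key = f"/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[{name}]/type"
        template[key] = source_type
    return template


template = add_sources({}, {"source_pump": "Laser", "source_probe": "X-ray"})

for key in template:
    print(key)
```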
Attributes¶
For attributes defined in the NXDL, the reader template dictionary contains the associated key with an "@" prefix on the attribute's name at the end of the path (for example, a path ending in /data/@units).
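As an illustration, the key for an attribute can be built from the path of its parent field or group. The helper `attribute_key` and the attribute names used here are hypothetical, and a plain dict stands in for the template.

```python
def attribute_key(path: str, attribute: str) -> str:
    """Return the template key for `attribute` attached to `path`."""
    return f"{path.rstrip('/')}/@{attribute}"


template = {}
data_path = "/ENTRY[my_entry]/DATA[my_data]/data"

template[data_path] = [0.1, 0.2, 0.3]
# Attribute on a field:
template[attribute_key(data_path, "long_name")] = "intensity"
# Attribute on a group:
template[attribute_key("/ENTRY[my_entry]/DATA[my_data]", "signal")] = "data"
```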
Units¶
If a field is defined in the NXDL, the converter expects a filled-in /data/@units entry in the template dictionary for the corresponding /data field, unless the field is specified as NX_UNITLESS in the NXDL. If the units entry is missing, a warning will be shown.
{
  "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/data": "None",
  "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/data/@units": "Should be set to a string value"
}
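One way to avoid forgetting the units entry is to always write a value together with its units. The two helpers below are hypothetical conveniences, not part of pynxtools, and the check is not the converter's own validation; it only looks for the companion @units key among the fields you pass in.

```python
def set_quantity(template: dict, path: str, value, units: str) -> None:
    """Write a field and its companion @units entry in one step."""
    template[path] = value
    template[f"{path}/@units"] = units


def fields_missing_units(template: dict, fields: list) -> list:
    """Return those of `fields` that have no @units entry in the template."""
    return [field for field in fields if f"{field}/@units" not in template]


template = {}
source = "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]"

set_quantity(template, f"{source}/data", [1.0, 2.0, 3.0], "eV")
template[f"{source}/energy"] = 5.0  # units forgotten on purpose

print(fields_missing_units(template, [f"{source}/data", f"{source}/energy"]))
```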
Links¶
You can also define links by setting the value to a sub-dictionary with the key link.
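A minimal sketch of such a link entry, with a plain dict standing in for the template. The helper `link_to` is hypothetical, and the form of the target (here, a path inside the output file) is an assumption that you should check against the pynxtools documentation.

```python
def link_to(target: str) -> dict:
    """Return the sub-dictionary that marks a template value as a link."""
    return {"link": target}


template = {}

# The actual data is written once ...
template["/ENTRY[my_entry]/INSTRUMENT[my_instrument]/SOURCE[my_source]/data"] = [1.0, 2.0]

# ... and referenced from another location instead of being duplicated.
# The target path format is an assumption.
template["/ENTRY[my_entry]/DATA[my_data]/data"] = link_to(
    "/my_entry/my_instrument/my_source/data"
)
```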
Building off of the BaseReader¶
When building off the BaseReader, the developer has the most flexibility. Any new reader must implement the read function, which must return a filled template object.
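To make this more concrete, here is a sketch of a reader whose read function parses JSON input files and copies one value into the template. The class name, the JSON key `instrument_name`, and the template path are hypothetical. The fallback BaseReader class is only a stand-in so that the sketch can be run without pynxtools installed.

```python
import json
from typing import Any, Tuple

try:
    from pynxtools.dataconverter.readers.base.reader import BaseReader
except ImportError:
    # Stand-in so the sketch runs without pynxtools; not the real base class.
    class BaseReader:
        supported_nxdls: list = []


class JsonMetadataReader(BaseReader):
    """Hypothetical reader that copies JSON metadata into the template."""

    supported_nxdls = ["NXmynxdl"]  # placeholder, as in the skeleton above

    def read(
        self,
        template: dict = None,
        file_paths: Tuple[str] = None,
        objects: Tuple[Any] = None,
    ) -> dict:
        template = {} if template is None else template
        for file_path in file_paths or ():
            if not file_path.endswith(".json"):
                continue  # ignore files this reader does not understand
            with open(file_path, encoding="utf-8") as handle:
                metadata = json.load(handle)
            if "instrument_name" in metadata:
                key = "/ENTRY[my_entry]/INSTRUMENT[my_instrument]/name"
                template[key] = metadata["instrument_name"]
        return template


READER = JsonMetadataReader
```

Skipping unknown file types, rather than raising, lets the same call accept a mix of data and auxiliary files; whether that is appropriate depends on your data.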
Building off of the MultiFormatReader¶
While building on the BaseReader allows for the most flexibility, in most cases it is desirable to implement a reader that can read in multiple file formats and then populate the template based on the read data. For this purpose, pynxtools has the MultiFormatReader, which can be readily extended for your own data.
You can find an extensive how-to guide to build off the MultiFormatReader here.
Calling the reader from the command line¶
The dataconverter can be executed using:
Here, the --reader flag must match the reader name defined in [project.entry-points."pynxtools.reader"] in the pyproject.toml file. The NXDL name passed to --nxdl must be a valid NeXus NXDL/XML file in pynxtools.definitions.
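As a sketch, a typical call can be assembled from the options listed in the table below, with the input files given as positional arguments. The helper `build_dataconverter_command` is hypothetical, and the file name `my_data.json`, the reader name `mydatareader`, and the NXDL name `NXmynxdl` are placeholders.

```python
import shlex
from typing import List, Sequence


def build_dataconverter_command(
    files: Sequence[str],
    reader: str,
    nxdl: str,
    output: str = "output.nxs",
) -> List[str]:
    """Assemble the argument list for a dataconverter call."""
    return [
        "dataconverter",
        *files,  # positional input files
        "--reader", reader,
        "--nxdl", nxdl,
        "--output", output,
    ]


command = build_dataconverter_command(
    ["my_data.json"], reader="mydatareader", nxdl="NXmynxdl"
)
print(shlex.join(command))

# To actually run it (requires pynxtools and the plugin to be installed):
# import subprocess
# subprocess.run(command, check=True)
```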
Aside from this default structure, there are many more flags that can be passed to the dataconverter call. Here is its API:
dataconverter¶
This command allows you to use the converter functionality of the dataconverter.
Usage:
Options:
Name | Type | Description | Default |
---|---|---|---|
--input-file | text | Deprecated: Please use the positional file arguments instead. The path to the input data file to read. Repeat for more than one file. default=[] This option is required if no '--params-file' is supplied. | [] |
--reader | choice (example, json_map, json_yml, multi) | The reader to use. Examples are json_map or readers from a pynxtools plugin. default='json_map' This option is required if no '--params-file' is supplied. | json_map |
--nxdl | text | The name of the NeXus application definition file to use without the extension nxdl.xml. This option is required if no '--params-file' is supplied. | None |
--output | text | The path to the output NeXus file to be generated. default='output.nxs' | output.nxs |
--params-file | filename | Allows to pass a .yaml file with all the parameters the converter supports. | None |
--ignore-undocumented | boolean | Ignore all undocumented fields during validation. | False |
--fail | boolean | Fail conversion and don't create an output file if the validation fails. | False |
--skip-verify | boolean | Skips the verification routine during conversion. | False |
--mapping | text | Takes a | None |
-c, --config | file | A json config file for the reader | None |
--help | boolean | Show this message and exit. | False |