# Build a parser
Your current data format is not supported yet? Don't worry: the following how-to will guide you through writing a reader for your own data.
This guide covers two scenarios:
- Your format is already partially supported — you only need to adjust how fields are mapped to NeXus.
- Your format is not supported at all — you need to write a new parser from scratch.
## Your format is already supported, but some fields differ
Good! The basic functionality to read your data is already in place.
The simplest fix is to supply a custom ELN YAML file with the missing or corrected values. For structural mismatches, consider one of the following before writing a new parser:
- Modify a config file. Each vendor has a JSON config in `src/pynxtools_xps/config/` that maps flat dict keys to NeXus paths. Adjusting the mapping there is often enough.
- Open a pull request. If your change benefits all users of that format, propose it on the GitHub repository.
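As an illustration, a mapping tweak of this kind can be sketched as follows; the key names and NeXus paths here are invented for the example and are not taken from any shipped config file:

```python
import json

# Hypothetical mapping of flat dict keys to NeXus paths; these entries are
# examples of the shape only, not the contents of a real config file.
mapping = {
    "work_function": "/ENTRY/INSTRUMENT[instrument]/work_function",
    "dwell_time": "/ENTRY/data/dwell_time",
}

# Adjusting where a field lands is just editing its JSON entry:
mapping["dwell_time"] = "/ENTRY/INSTRUMENT[instrument]/ANALYSER[analyser]/dwell_time"

print(json.dumps(mapping, indent=2))
```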
## Adding a completely new format
Adding a new format requires creating a vendor subpackage, registering the parser, and providing test data.
### 1. Set up a development environment
Follow the instructions in the development guide.
### 2. Create the vendor subpackage

Add a new directory under `src/pynxtools_xps/parsers/` with the following layout:
```
parsers/<vendor>/
├── __init__.py
├── data_model.py   # typed intermediate representation
├── metadata.py     # _MetadataContext instance
└── parser.py       # _XPSParser subclass
```
Use an existing parser (for example, `parsers/vms/` or `parsers/phi/`) as a reference for the expected structure.
### 3. Define typed data models (data_model.py)

Create dataclasses for each logical record in your format (header, spectrum, region, …) by subclassing `_XPSDataclass`. `_XPSDataclass` enforces type annotations at assignment time and coerces compatible types automatically. See `phi/data_model.py` for a worked example.
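The shape of such a record can be sketched with a plain dataclass; in the package you would subclass `_XPSDataclass` instead, and the field names below are illustrative, not any vendor's actual fields:

```python
from dataclasses import dataclass, field

# Minimal sketch of a typed record for one spectrum region. In the real
# package you would subclass _XPSDataclass rather than using @dataclass
# directly; the fields below are invented for this illustration.
@dataclass
class XpsRegion:
    region_name: str = ""
    excitation_energy: float = 0.0   # in eV
    dwell_time: float = 0.0          # in s
    energies: list = field(default_factory=list)
    intensities: list = field(default_factory=list)

region = XpsRegion(region_name="C 1s", excitation_energy=1486.6)
```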
> **Note**
> Defining a data model is recommended but optional. It only applies when the data model for a vendor format is always the same and can therefore be written down explicitly.
### 4. Define the normalization context (metadata.py)

Create a module-level `_context` instance of `_MetadataContext` with four mappings:
- `key_map`: vendor-specific key names → canonical names
- `value_map`: canonical key → converter function (use the shared converters from `mapping.py` where possible: `_convert_measurement_method`, `_convert_energy_scan_mode`, etc.)
- `unit_map`: vendor unit strings → standard units (`None` = dimensionless/drop)
- `default_units`: canonical key → unit when the value carries none

See `parsers/vms/metadata.py` for a typical example.
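How the four mappings interact can be sketched with plain dicts; every key, converter, and unit below is invented for this illustration and is not the package's actual `_MetadataContext` API:

```python
# Illustrative stand-ins for the four mappings described above.
key_map = {"ExcEnergy": "excitation_energy"}          # vendor key -> canonical key
value_map = {"excitation_energy": float}              # canonical key -> converter
unit_map = {"eV": "eV", "a.u.": None}                 # vendor unit -> standard unit
default_units = {"excitation_energy": "eV"}           # unit if none was given

def normalize(vendor_key, raw_value, vendor_unit=None):
    """Map one raw key/value/unit triple to canonical form."""
    key = key_map.get(vendor_key, vendor_key)
    value = value_map.get(key, lambda v: v)(raw_value)
    unit = unit_map.get(vendor_unit) if vendor_unit else default_units.get(key)
    return key, value, unit

print(normalize("ExcEnergy", "1486.6"))  # -> ('excitation_energy', 1486.6, 'eV')
```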
### 5. Implement the parser (parser.py)

Start by subclassing `_XPSParser`. Implement the following required class variables and methods:

- `supported_file_extensions`: a tuple of file extensions that your parser supports
- `matches_file(file_path)`: return `True` only if the file unambiguously conforms to your format (check headers, magic bytes, or required keywords)
- `_parse(file_path)`: extract all spectra and populate `self._data` as a list of dictionaries, where each dict holds the raw key-value pairs for one spectrum
If your format embeds a version string, override `detect_version(file_path)` to return a `VersionTuple` (produced by `normalize_version`), and declare `supported_versions` and `requires_version` as class variables.

Set `config_file` to the name of the JSON config you will create in step 7.
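A stand-alone sketch of this interface might look as follows; the base class is omitted here, and the magic header and line format are invented for the example:

```python
from pathlib import Path

# Stand-alone sketch: in the package this class would subclass _XPSParser.
# The magic header "MYXPS" and the "key = value" line format are invented.
class MyVendorParser:
    supported_file_extensions = (".myxps",)
    config_file = "config_myvendor.json"  # created in step 7

    def matches_file(self, file_path):
        """Accept the file only if it starts with our (invented) magic bytes."""
        with open(file_path, "rb") as fh:
            return fh.read(5) == b"MYXPS"

    def _parse(self, file_path):
        """Collect raw key-value pairs, one dict per spectrum."""
        spectrum = {}
        for line in Path(file_path).read_text().splitlines()[1:]:
            if "=" in line:
                key, _, value = line.partition("=")
                spectrum[key.strip()] = value.strip()
        self._data = [spectrum]
```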
### 6. Register the parser

- Import and re-export the new mapper class from `parsers/__init__.py`.
- Add the parser and mapper to the lists in `reader.py`.
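The registration pattern can be sketched as follows; the registry list and lookup function are illustrative stand-ins, not the package's actual `reader.py` code:

```python
# In parsers/__init__.py you would import and re-export the new classes;
# in reader.py you would append them to the lists the reader iterates over.
# The registry and lookup below are invented stand-ins for that mechanism.
_parsers = []

class MyVendorParser:  # placeholder for the class written in step 5
    supported_file_extensions = (".myxps",)

_parsers.append(MyVendorParser)

def find_parser(file_path):
    """Pick the first registered parser whose extension matches."""
    for cls in _parsers:
        if any(file_path.endswith(ext) for ext in cls.supported_file_extensions):
            return cls()
    raise ValueError(f"No parser for {file_path}")
```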
### 7. Add a config file

Create a `config/config_<vendor>.json` file by starting from `config/template.json`. Each entry maps a canonical dict key to the NeXus path in NXxps or NXmpes.
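The shape of such a config file can be sketched in Python; the canonical keys and NeXus paths below are invented examples, not entries from `template.json`:

```python
import json

# Hypothetical entries showing the mapping shape described above
# (canonical dict key -> NeXus path); not taken from template.json.
config = {
    "excitation_energy": "/ENTRY[entry]/INSTRUMENT[instrument]/beam/incident_energy",
    "work_function": "/ENTRY[entry]/INSTRUMENT[instrument]/work_function",
}

with open("config_myvendor.json", "w") as fh:
    json.dump(config, fh, indent=2)
```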
### 8. Add tests

- Place at least one representative example file in `tests/data/<vendor>/`.
- Add your test case to the `test_cases` list in `tests/test_reader.py`.
- Generate the reference `.nxs` file by running the conversion once and inspecting the output. You may also modify `scripts/generate_reference_files.sh` to automate this.
- Run the full test suite to confirm nothing is broken.
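The test pattern can be sketched as follows; the tuple layout of `test_cases` and the `convert()` helper are invented stand-ins for the real entry points:

```python
# Sketch of the test pattern described above; the shape of test_cases and
# the convert() helper are invented for this example.
test_cases = [
    # (input file, expected reference output)
    ("tests/data/myvendor/example.myxps", "tests/data/myvendor/example.nxs"),
]

def convert(input_file):
    """Placeholder standing in for the real conversion entry point."""
    return input_file.replace(".myxps", ".nxs")

def test_conversion_against_reference():
    for input_file, reference in test_cases:
        assert convert(input_file) == reference

test_conversion_against_reference()
```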
### 9. Add documentation
Add a page to the References section of the documentation describing what your parser does, which files and versions are supported, and how it integrates into the parser landscape.
Note that there is a special mkdocs macro (in `src/pynxtools_xps/mkdocs.py`) that automatically generates the supported file formats and versions of a parser. It can be called within your reference markdown file; for an example invocation, see the reference page for the `SpecsSLEParser`, located in `src/pynxtools_xps/parsers/specs/sle/parser.py`.
## Further reading

- Explanation > Parser architecture: the three-layer pipeline
- `NXxps` application definition
- `NXmpes` application definition