Table of Contents

  1. Introduction
  2. nyaml Workflow
  3. How to Use nyaml Tool
  4. Conversion from YAML to XML
  5. Design of NeXus Ontology and Terms in YAML
  6. Root section for base classes and application definitions
  7. NeXus Group
  8. NeXus Field and NeXus Attribute
  9. NeXus Link
  10. NeXus Choice
  11. Special Keywords in YAML
  12. Keyword exists
  13. Keyword unit
  14. Keyword dimensions
  15. Keyword enumeration
  16. Keyword xref
  17. How to Install nyaml
  18. Conclusion
  19. References

Introduction

The NeXus data format, described by the NeXus Definition Language (NXDL), is a concerted effort to facilitate data exchange within scientific communities, particularly those engaged in neutron, X-ray, and muon research (J. Appl. Cryst. (2015). 48, 301-305). It serves as a standardized framework for both data exchange and storage. NXDL is the means by which scientists define the nomenclature and organizational structure of the information in NeXus data files, tailored to specific scientific techniques.

NXDL is used to define general data storage objects (base classes) and to use them as the building blocks for measurement-specific or even instrument-specific data storage objects (application definitions). In this process, members and definitions of individual base classes can be used as is or customized. Developing a schema, whether a base class or an application definition, means writing an NXDL schema definition file with the extension 'nxdl.xml' in the Extensible Markup Language (XML).

To expedite schema development, we have introduced a YAML-based format that provides a syntax specifically tailored for defining scientific domain-driven schemas with NXDL. One significant advantage of YAML over XML is its indentation-driven approach, which eliminates the need for starting and ending tags for each entity within the schema. The YAML format reduces the repetition of NXDL keywords and uses a notation reminiscent of Python syntax, for example for class inheritance. These benefits are attained without compromising the integrity of the original NeXus schema, which is traditionally expressed in XML format.

The YAML format is not an official version of NeXus application definitions or base classes, so it must be transcoded into XML. The nyaml Python package is a converter tool designed specifically for this purpose. It converts NXDL from YAML format to XML, making it easier for NeXus schema developers to incorporate domain-specific scientific knowledge into a schema. The tool also converts in the opposite direction, so existing NeXus schemas in XML can be converted to YAML, extended there, and converted back. Note that this paper does not introduce NeXus objects, terms, or types, which are fundamental for writing base class or application definition schemas. Readers new to NeXus should refer to the official NeXus site.

nyaml Workflow

Like any scientific software, the nyaml tool follows a specific workflow.

graph TD;
  subgraph Start
    id1["Input File (YAML or XML)"]
  end
  subgraph Correct File Converter
    id2["YAML Converter"]
    id3["XML Converter"]
  end
  subgraph YAML converter
    id4["Comment Collector"]
    id5["Python Dictionary Object"]
  end
  subgraph XML converter
    id6["XML Object"]
  end
  subgraph Final Product
    id7["Write XML File"]
    id8["Write YAML File"]
  end
  id1--> |YAML File|id2
  id1--> |XML File|id3
  id2-->id4
  id4-->id5
  id3-->id6
  id5-->id7
  id6-->id8

Given an input file, the nyaml converter checks the file type and calls the appropriate converter. For an XML file, the XML converter parses the file with the lxml Python library into an XML tree object. Following the NXDL rules, the converter then writes the application definition or base class into a YAML file that follows the nyaml syntax. For a YAML file, the YAML converter collects the comments in a Comments object and parses the file into a Python dictionary object. The application definition or base class is then written into an XML file from the Comments object and the Python dictionary.
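The dispatch step of this workflow can be sketched in a few lines of Python. This is a minimal illustration and not the nyaml implementation: the function name pick_converter and the exact suffix checks are assumptions made for the example.

```python
from pathlib import Path


def pick_converter(input_file: str) -> str:
    """Return which converter an input file would be routed to.

    Hypothetical helper: the name and the suffix checks are assumptions,
    not part of the nyaml API.
    """
    name = Path(input_file).name.lower()
    if name.endswith(".nxdl.xml"):
        # XML branch: parse into an XML tree, then write a YAML file.
        return "xml"
    if name.endswith((".yaml", ".yml")):
        # YAML branch: collect comments, parse into a dict, then write XML.
        return "yaml"
    raise ValueError(f"Unsupported input file type: {input_file}")


print(pick_converter("NXmpes.nxdl.xml"))  # xml
print(pick_converter("NXmpes.yaml"))      # yaml
```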

How to Install nyaml

The tool is published on PyPI and can be installed with pip:

$ pip install nyaml

To contribute to the tool or to install it in development mode:

$ git clone https://github.com/FAIRmat-NFDI/nyaml.git
$ cd nyaml
$ pip install -e ".[dev]"

To install the pre-commit hook for code formatting and linting:

$ pre-commit install

How to Use nyaml Tool

nyaml is a command line tool that converts a NeXus application definition or base class from the YAML file format into the nxdl.xml file format and vice versa. The converter is called with the command

$ nyaml2nxdl [OPTIONS] [INPUT_FILE]

with the available options:

  --output-file TEXT   Specify the output file path for the converted file.
  --check-consistency  Check whether YAML and NXDL can be recursively
                       converted, ensuring version consistency.
  --do-not-store-nxdl  Prevent the input NXDL file from being stored as a
                       comment at the end of the output YAML file.
  --verbose            Display keywords and value types in standard output to
                       assist in identifying issues in YAML files.
  --help               Show this message and exit.

Use the --output-file option to define the output file name (including the extension). Otherwise the converter derives the name from the input file, e.g. NXapplication.nxdl.xml is converted to NXapplication_parser.yaml and NXapplication.yaml is converted to NXapplication.nxdl.xml. With the --check-consistency option the converter produces a file of the same type as the input, e.g. for the input NXapplication.nxdl.xml the output file is NXapplication_consistency.nxdl.xml. This option is intended to verify that the file and its version survive a round-trip conversion. When converting an nxdl.xml file into YAML, the converter also stores the nxdl.xml text, together with a hash, at the end of the YAML file. The --do-not-store-nxdl option prevents this. The --verbose option helps to identify the cause of an unexpected conversion result.
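The naming rules for the default output file can be restated as a small Python function. This is only a sketch of the rules described in the text: default_output_name is a hypothetical helper, and the name produced for a YAML input combined with --check-consistency is an assumption, since the text only gives the XML case.

```python
def default_output_name(input_file: str, check_consistency: bool = False) -> str:
    """Derive the output file name when --output-file is not given.

    Hypothetical helper that mirrors the naming rules in the text.
    """
    # (input extension, suffix of the converted file)
    rules = ((".nxdl.xml", "_parser.yaml"), (".yaml", ".nxdl.xml"))
    for extension, converted_suffix in rules:
        if input_file.endswith(extension):
            stem = input_file[: -len(extension)]
            if check_consistency:
                # Same file type as the input (assumed for YAML input as well).
                return f"{stem}_consistency{extension}"
            return f"{stem}{converted_suffix}"
    raise ValueError(f"Unsupported input file type: {input_file}")


print(default_output_name("NXapplication.nxdl.xml"))  # NXapplication_parser.yaml
print(default_output_name("NXapplication.yaml"))      # NXapplication.nxdl.xml
```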

Conversion from YAML to XML

Below is a concise, trimmed example of the NXmpes application definition in YAML format, followed by its translation into XML format. The fundamental rules governing this conversion are explained afterwards. For the basic structure of NXDL, readers are encouraged to consult the NeXus Manual. In the following sections, various components of the NXmpes application definition are discussed in the light of the nyaml converter.

NXmpes application definition in YAML format

category: application
type: group
doc: |
  This is the most general application definition for multidimensional photoelectron spectroscopy.

  .. _ISO 18115-1:2023: https://www.iso.org/standard/74811.html
  .. _IUPAC Recommendations 2020: https://doi.org/10.1515/pac-2019-0404
symbols:
  doc: |
    The symbols used in the schema to specify e.g. dimensions of arrays
  n_transmission_function: |
    Number of data points in the transmission function.
NXmpes(NXobject):
  (NXentry):
    exists: required
    definition:
      \@version:
      enumeration: [NXmpes]
    title:
    start_time(NX_DATE_TIME):
      doc: |
        Datetime of the start of the measurement.
    end_time(NX_DATE_TIME):
      exists: recommended
      doc: |
        Datetime of the end of the measurement.
    (NXinstrument):
      doc:
      - |
        Description of the MPES spectrometer and its individual parts.
      - |
        xref:
          spec: ISO 18115-1:2023
          term: 12.58
          url: https://www.iso.org/obp/ui/en/#iso:std:iso:18115:-1:ed-3:v1:en:term:12.58
      source_TYPE(NXsource):
        exists: recommended
        doc: |
          A source used to generate a beam.
      (NXmanipulator):
        exists: optional
        doc: |
          Manipulator for positioning of the sample.
        value_log(NXlog):
          exists: optional
          value(NX_NUMBER):
            unit: NX_PRESSURE
            doc: |
              In the case of an experiment in which the gas pressure changes and is recorded,
              this is an array of length m of gas pressures.
    (NXprocess):
      exists: recommended
      doc: |
        Document an event of data processing, reconstruction, or analysis for this data.
      transmission_correction(NXcalibration):
        exists: optional
        doc: |
          This calibration procedure is used to account for the different transmission efficiencies.
        transmission_function(NXdata):
          exists: recommended
          doc: |
            Transmission function of the electron analyser.
          \@axes:
            enumeration: [kinetic_energy]
          kinetic_energy(NX_FLOAT):
            unit: NX_ENERGY
            doc: |
              Kinetic energy values
            dimensions:
              rank: 1
              dim: [[1, n_transmission_function]]

NXmpes application definition in nxdl.xml format

  <?xml version='1.0' encoding='UTF-8'?>
  <?xml-stylesheet type="text/xsl" href="nxdlformat.xsl"?>
  <definition xmlns="http://definition.nexusformat.org/nxdl/3.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" category="application" type="group" name="NXmpes" extends="NXobject" xsi:schemaLocation="http://definition.nexusformat.org/nxdl/3.1 ../nxdl.xsd">
      <symbols>
          <doc>
              The symbols used in the schema to specify e.g. dimensions of arrays
          </doc>
          <symbol name="n_transmission_function">
              <doc>
                  Number of data points in the transmission function.
              </doc>
          </symbol>
      </symbols>
      <doc>
          This is the most general application definition for multidimensional
          photoelectron spectroscopy.

          .. _ISO 18115-1:2023: https://www.iso.org/standard/74811.html
          .. _IUPAC Recommendations 2020: https://doi.org/10.1515/pac-2019-0404
      </doc>
      <group type="NXentry">
          <field name="definition">
              <attribute name="version"/>
              <enumeration>
                  <item value="NXmpes"/>
              </enumeration>
          </field>
          <field name="title"/>
          <field name="start_time" type="NX_DATE_TIME">
              <doc>
                  Datetime of the start of the measurement.
              </doc>
          </field>
          <field name="end_time" type="NX_DATE_TIME" recommended="true">
              <doc>
                  Datetime of the end of the measurement.
              </doc>
          </field>
          <group type="NXinstrument">
              <doc>
                  Description of the MPES spectrometer and its individual parts.

                  This concept is related to term `12.58`_ of the ISO 18115-1:2023 standard.

                  .. _12.58: https://www.iso.org/obp/ui/en/#iso:std:iso:18115:-1:ed-3:v1:en:term:12.58
              </doc>
              <group name="source_TYPE" type="NXsource" recommended="true">
                  <doc>
                      A source used to generate a beam.
                  </doc>
              </group>
              <group type="NXmanipulator" optional="true">
                  <doc>
                      Manipulator for positioning of the sample.
                  </doc>
                  <group name="value_log" type="NXlog" optional="true">
                      <field name="value" type="NX_NUMBER" units="NX_PRESSURE">
                          <doc>
                              In the case of an experiment in which the gas pressure changes and is recorded,
                              this is an array of length m of gas pressures.
                          </doc>
                      </field>
                  </group>
              </group>
          </group>
          <group type="NXprocess" recommended="true">
              <doc>
                  Document an event of data processing, reconstruction, or analysis for this data.
              </doc>
              <group name="transmission_correction" type="NXcalibration" optional="true">
                  <doc>
                      This calibration procedure is used to account for the different transmission
                      efficiencies.
                  </doc>
                  <group name="transmission_function" type="NXdata" recommended="true">
                      <doc>
                          Transmission function of the electron analyser.
                      </doc>
                      <attribute name="axes">
                          <enumeration>
                              <item value="kinetic_energy"/>
                          </enumeration>
                      </attribute>
                      <field name="kinetic_energy" type="NX_FLOAT" units="NX_ENERGY">
                          <doc>
                              Kinetic energy values
                          </doc>
                          <dimensions rank="1">
                              <dim index="1" value="n_transmission_function"/>
                          </dimensions>
                      </field>
                  </group>
              </group>
          </group>
      </group>
  </definition>

Design of NeXus Ontology and Terms in YAML

Root section for base classes and application definitions:

Within the YAML format, the root section is the top-level description of the application definition or base class schema. It comprises the category, type, doc, and symbols blocks and the name of the schema (e.g. NXmpes(NXobject)). In XML, the root section corresponds to the definition element together with its first doc child and its symbols child. The definition element carries the schema's name, the object it extends, and the schema type as XML attributes; additional XML attributes (e.g. xmlns:xsi) are handled by the nyaml converter. The value of category, either base or application, distinguishes a base class from an application definition. The parentheses in the schema name (e.g. NXmpes(NXobject)) name the schema that the current definition extends: base classes must extend NXobject, whereas application definitions may extend either NXobject or another application definition (but no other base class). Schemas may define one or more symbols, each carrying a specific physical meaning beyond its literal name, which are used throughout the application definition.

A typical root section, here for the application definition NXmpes, is outlined below.

category: application
type: group
doc: |
  This is the most general application definition for multidimensional photoelectron spectroscopy.

  .. _ISO 18115-1:2023: https://www.iso.org/standard/74811.html
  .. _IUPAC Recommendations 2020: https://doi.org/10.1515/pac-2019-0404
symbols:
  doc: |
    The symbols used in the schema to specify e.g. dimensions of arrays
  n_transmission_function: |
    Number of data points in the transmission function.
NXmpes(NXobject):
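How the root section maps onto the attributes of the definition element can be illustrated with the Python standard library. The sketch below is not the converter's code: definition_element and ROOT_KEY are hypothetical names, the input is an already parsed dictionary, and the namespace attributes that nyaml adds are left out.

```python
import re
import xml.etree.ElementTree as ET

# Schema key such as "NXmpes(NXobject)": schema name, then the extended schema.
ROOT_KEY = re.compile(r"^(?P<name>NX\w+)\((?P<extends>NX\w+)\)$")


def definition_element(root_section: dict) -> ET.Element:
    """Build a bare <definition> element from a parsed root section."""
    attributes = {
        "category": root_section["category"],
        "type": root_section["type"],
    }
    for key in root_section:
        match = ROOT_KEY.match(key)
        if match:
            attributes["name"] = match["name"]
            attributes["extends"] = match["extends"]
    return ET.Element("definition", attributes)


root = {"category": "application", "type": "group", "NXmpes(NXobject)": None}
element = definition_element(root)
print(element.get("name"), element.get("extends"))  # NXmpes NXobject
```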

NeXus Group

NeXus groups, as instances of NeXus base classes, embody the compositional structure of application definitions. These groups can be initialized dynamically or statically, each approach offering distinct advantages.

Dynamic initialization means that groups are named only when the NeXus definition is used to store data (in an HDF5 file, called a NeXus file). This allows multiple instances at the same level of the NeXus file. For instance, the group (NXmanipulator) can give rise to several groups, such as manipulator1 and manipulator2, of the base class NXmanipulator when the data is written.

In contrast, static initialization, with syntax like value_log(NXlog), refers to a single group value_log of the base class NXlog and disallows the construction of other groups from NXlog at the same level. Such a group appears only once within the same level of the current application definition.

Descriptive information about a NeXus group is given in the doc child of that group. The notation source_TYPE(NXsource), or equivalently (NXsource)source_TYPE, declares a group with the name source_TYPE and the type NXsource (a base class). The two notations differ only in the order of name and type.

Furthermore, the uppercase part of a group's name can be replaced when the data is written, which allows multiple instances. For example, source_electric and source_magnetic can coexist as instances of source_TYPE(NXsource). The same uppercase rule applies to NeXus groups, fields, and attributes.

NeXus Groups in YAML format

# NeXus groups in YAML format
source_TYPE(NXsource):
  exists: recommended
  doc: |
    A source used to generate a beam.
(NXmanipulator):
  exists: optional
  doc: |
    Manipulator for positioning of the sample.
  value_log(NXlog):
    exists: optional
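The group notation and the uppercase naming rule can be made concrete with two small Python helpers. Both are hypothetical and simplified: parse_group_key accepts the two notations described above, and name_matches treats every run of uppercase letters as a wildcard, which is an assumption that ignores the finer points of the NeXus naming rules.

```python
import re

NAME_FIRST = re.compile(r"^(?P<name>\w*)\((?P<type>NX\w+)\)$")  # source_TYPE(NXsource)
TYPE_FIRST = re.compile(r"^\((?P<type>NX\w+)\)(?P<name>\w*)$")  # (NXsource)source_TYPE


def parse_group_key(key: str) -> tuple:
    """Split a YAML group key into (name, type); the name may be empty."""
    match = NAME_FIRST.match(key) or TYPE_FIRST.match(key)
    if match is None:
        raise ValueError(f"Not a group key: {key}")
    return match["name"], match["type"]


def name_matches(template: str, concrete: str) -> bool:
    """Check whether a concrete name fits a template such as source_TYPE."""
    # Each uppercase run in the template may be replaced by any word characters.
    pattern = re.sub(r"[A-Z]+", lambda _: r"\w+", re.escape(template))
    return re.fullmatch(pattern, concrete) is not None


print(parse_group_key("source_TYPE(NXsource)"))         # ('source_TYPE', 'NXsource')
print(parse_group_key("(NXmanipulator)"))               # ('', 'NXmanipulator')
print(name_matches("source_TYPE", "source_electric"))   # True
```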

NeXus Field and NeXus Attribute

A NeXus group may contain NeXus fields, NeXus attributes, and further NeXus groups. A field, written without the prefix NX, and an attribute, written with the prefix \@, must each have a NeXus type (e.g. NX_FLOAT, NX_CHAR). In YAML format, a field or attribute implicitly has the type NX_CHAR unless another type is given in parentheses (e.g. end_time(NX_DATE_TIME)).

Other XML attributes of a NeXus field or attribute are written as children of that field or attribute (the special keywords are discussed in a later section). Descriptive text for a field or attribute goes under its doc child.

NeXus field and attribute in YAML format

(NXentry):
  exists: required
  definition:  # Field type: NX_CHAR
    \@version:  # Attribute type: NX_CHAR
    enumeration: [NXmpes]
  title:
  start_time(NX_DATE_TIME):  # Field type: NX_DATE_TIME
    doc: Datetime of the start of the measurement.
  end_time(NX_DATE_TIME):  # Field type: NX_DATE_TIME
    exists: recommended
    doc: Datetime of the end of the measurement.
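The typing rules for keys can be summarised as a classifier. The following Python sketch is an illustration under stated assumptions, not the converter's logic: classify is a hypothetical helper, it assumes that group types start with NX without an underscore while field and attribute types start with NX_, and it expects special keywords such as doc or exists to have been filtered out beforehand.

```python
import re

KEY = re.compile(r"(?P<name>(?:\\@)?\w*)(?:\((?P<type>\w+)\))?")


def classify(key: str) -> tuple:
    """Return (kind, name, nexus_type) for a YAML key."""
    match = KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"Unrecognised key: {key}")
    name, nx_type = match["name"], match["type"]
    if name.startswith("\\@"):
        # Attribute: strip the leading \@ and default to NX_CHAR.
        return "attribute", name[2:], nx_type or "NX_CHAR"
    if nx_type in ("link", "choice"):
        return nx_type, name, ""
    if nx_type and nx_type.startswith("NX") and not nx_type.startswith("NX_"):
        return "group", name, nx_type
    # Field: NX_CHAR unless a type is given in parentheses.
    return "field", name, nx_type or "NX_CHAR"


print(classify("definition"))                # ('field', 'definition', 'NX_CHAR')
print(classify("\\@version"))                # ('attribute', 'version', 'NX_CHAR')
print(classify("start_time(NX_DATE_TIME)"))  # ('field', 'start_time', 'NX_DATE_TIME')
```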

NeXus Link

The NeXus link concept reduces data duplication: several concepts of the same kind (e.g. NeXus fields or NeXus attributes) can refer to a single copy of the data. In YAML format, a NeXus link is defined by writing link inside the parentheses. The concept that holds the data must be given under the target child.

NeXus link in YAML format

reference_measurement(link):
  target: /NXentry
  doc: A link to a full data collection.

In the YAML example above, reference_measurement is defined as a link that refers to the NXentry group through its target /NXentry. The concept that references the data is thus tied to the designated target, which reduces redundancy and maintains data integrity within the NeXus framework.
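For orientation, the link example can be turned into an NXDL-style link element with the Python standard library. This is a sketch only: link_element is a hypothetical helper, and the assumption is that the YAML key and its target child map directly onto the name and target attributes of the link element.

```python
import xml.etree.ElementTree as ET

LINK_SUFFIX = "(link)"


def link_element(key: str, body: dict) -> ET.Element:
    """Build a <link> element from a YAML key and its children."""
    if not key.endswith(LINK_SUFFIX):
        raise ValueError(f"Not a link key: {key}")
    name = key[: -len(LINK_SUFFIX)]
    element = ET.Element("link", {"name": name, "target": body["target"]})
    if "doc" in body:
        ET.SubElement(element, "doc").text = body["doc"]
    return element


link = link_element(
    "reference_measurement(link)",
    {"target": "/NXentry", "doc": "A link to a full data collection."},
)
print(link.get("name"), link.get("target"))  # reference_measurement /NXentry
```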

NeXus Choice

The NeXus choice concept is designed to choose one concept from a set of concepts of the same kind (e.g. NeXus groups). It allows a scientific concept to be defined in several ways to suit different situations.

NeXus choice in YAML format

pixel_shape(choice):
  (NXoff_geometry):
    doc: Shape description of each pixel. Use only if all pixels in the detector
      are of uniform shape.
  (NXcylindrical_geometry):
    doc: Shape description of each pixel. Use only if all pixels in the detector
      are of uniform shape and require being described by cylinders.

In this example, pixel_shape can be either of the groups (NXoff_geometry) and (NXcylindrical_geometry), depending on the geometry of the pixel.

Special Keywords in YAML

In the context of NeXus, certain keywords carry a meaning beyond their literal interpretation. These special keywords are used to describe and qualify NeXus terms, e.g. attributes, fields, links, and groups, making the data representation clearer and more specific.

Keyword exists

The exists keyword defines the optionality of the NeXus concepts attribute, field, choice, link, and group when a NeXus definition is applied to a NeXus file. It states whether a concept is expected to be present in the NeXus data structure. By default, all concepts are optional in a base class and required in an application definition.

Presently, the accepted values for the exists keyword are:

optional: The NeXus concept is not mandatory and may be absent.
recommended: The NeXus concept is advisable but not mandatory.
required: The NeXus concept must be present within the structure.
[min, <number>, max, <number> or infty]: An array value that gives the multiplicity of the NeXus concept. For instance, exists: [min, 3, max, infty] means that the concept must occur at least three times and may occur any number of times beyond that.

exists in YAML

transmission_correction(NXcalibration):
  exists: optional
  doc: |
    This calibration procedure is used to account for the different transmission efficiencies.

In the above example, the group transmission_correction is optional.
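The accepted exists values can be related to XML attributes as in the following Python sketch. The optional and recommended cases follow the NXmpes example shown earlier. The handling of required (written here as optional="false") and of the array form (written here as minOccurs and maxOccurs, with infty becoming unbounded) is an assumption about the converter's output, and exists_to_attributes is a hypothetical helper.

```python
def exists_to_attributes(exists) -> dict:
    """Translate a YAML `exists` value into XML attributes (sketch)."""
    if isinstance(exists, list):
        # Array form: [min, <number>, max, <number> or infty]
        if len(exists) != 4 or exists[0] != "min" or exists[2] != "max":
            raise ValueError(f"Malformed exists array: {exists}")
        upper = "unbounded" if str(exists[3]) == "infty" else str(exists[3])
        return {"minOccurs": str(exists[1]), "maxOccurs": upper}
    if exists in ("optional", "recommended"):
        return {exists: "true"}
    if exists == "required":
        # Assumption: expressed as the negation of `optional`.
        return {"optional": "false"}
    raise ValueError(f"Unknown exists value: {exists}")


print(exists_to_attributes("recommended"))                # {'recommended': 'true'}
print(exists_to_attributes(["min", 3, "max", "infty"]))   # {'minOccurs': '3', 'maxOccurs': 'unbounded'}
```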

Keyword unit

The unit keyword assigns a NeXus-compliant unit category, e.g. NX_VOLTAGE, to a field. It corresponds to the NXDL attribute units and fixes the kind of physical unit that the field must use.

unit in YAML

detector_voltage(NX_FLOAT):
  unit: NX_VOLTAGE
  doc: |
    Voltage applied to detector.

Keyword dimensions

The dimensions keyword describes the multidimensional nature of the data: its rank, the index of each dimension, and the length of each dimension. The rank keyword gives the number of dimensions of the data set. Each dimension is described with two further keywords, dim and dim_parameters. The dim keyword holds an array of arrays, where each nested array is a pair of index and value (both NeXus keywords) that describes one dimension of the data. For example, for 2D particle motion, dim may be written as [[0, nx], [1, ny]], giving the index of each axis and its length. The dim_parameters keyword holds further information on each dimension, such as doc, ref, etc. Note that each entry within dim_parameters must be a list with as many items as the value of rank.

dimensions in YAML

# 2D particle motion
dimensions:
   rank: 2
   dim: [[0, nx], [1, ny]]
   dim_parameters:
      doc: ["Position of particle on x-axis.","Position of particle on y-axis."]

The dimensions can also be written in a shorter form.

Dimensions in YAML (shorter form)

# 2D particle motion
dimensions:
   rank: 2
   dim: (nx, ny)
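The long form of dimensions maps onto XML in a regular way, which the following Python sketch illustrates for an already parsed dictionary. It is not the converter's code: dimensions_element is a hypothetical helper, the short form is not handled, and placing doc as a child of dim while other dim_parameters become attributes is an assumption.

```python
import xml.etree.ElementTree as ET


def dimensions_element(dimensions: dict) -> ET.Element:
    """Build a <dimensions> element from the long YAML form."""
    element = ET.Element("dimensions", {"rank": str(dimensions["rank"])})
    parameters = dimensions.get("dim_parameters", {})
    for position, (index, value) in enumerate(dimensions["dim"]):
        dim = ET.SubElement(
            element, "dim", {"index": str(index), "value": str(value)}
        )
        # Each dim_parameters entry is a list with one item per dimension.
        for key, values in parameters.items():
            if key == "doc":
                ET.SubElement(dim, "doc").text = values[position]
            else:
                dim.set(key, str(values[position]))
    return element


kinetic_energy = dimensions_element(
    {"rank": 1, "dim": [[1, "n_transmission_function"]]}
)
first = kinetic_energy.find("dim")
print(kinetic_energy.get("rank"), first.get("index"), first.get("value"))
# 1 1 n_transmission_function
```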

Keyword enumeration

A list of strings, written in Python-like bracket syntax, that gives the permitted values of a field or attribute.

Enumeration in YAML

definition:
  \@version:
  enumeration: [NXmpes]

In this example, the only valid value of the NeXus field definition is NXmpes.

Keyword xref

The xref keyword is used inside doc to refer to another ontology or standard, such as an ISO standard. The information given in the xref of the example below is carried over into the XML doc.

xref in YAML

(NXinstrument):
  doc:
  - |
    Description of the MPES spectrometer and its individual parts.
  - |
    xref:
      spec: ISO 18115-1:2023
      term: 12.58
      url: https://www.iso.org/obp/ui/en/#iso:std:iso:18115:-1:ed-3:v1:en:term:12.58
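Comparing this YAML with the NXinstrument doc in the nxdl.xml example shows the text that an xref produces. The Python sketch below reproduces that text from the three xref entries. render_xref is a hypothetical helper, and the sentence template is taken from the single example in this document, so other cases may be rendered differently by the converter.

```python
def render_xref(xref: dict) -> str:
    """Render an xref block as the text that appears in the XML doc."""
    term, spec, url = xref["term"], xref["spec"], xref["url"]
    return (
        f"This concept is related to term `{term}`_ of the {spec} standard.\n"
        "\n"
        f".. _{term}: {url}"
    )


print(
    render_xref(
        {
            "spec": "ISO 18115-1:2023",
            "term": "12.58",
            "url": "https://www.iso.org/obp/ui/en/#iso:std:iso:18115:-1:ed-3:v1:en:term:12.58",
        }
    )
)
```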

Conclusion

Defining a NeXus application definition or base class in YAML format is not an official NeXus structure, but the format reduces the effort needed to construct an application definition or base class. nyaml is the tool that converts application definitions or base classes from YAML format to nxdl.xml (XML) format, so that developers need no knowledge of XML style or syntax. It is open-source software funded by NFDI under the FAIRmat project and hosted on GitHub, so anyone can open an issue to report a bug or suggest an improvement, and contributions are welcome. nyaml is also published on PyPI and can be installed with the pip package manager.

References

[@Könnecke]: J. Appl. Cryst. (2015). 48, 301-305 (https://doi.org/10.1107/S1600576714027575)