# Explanation: how does the processing work?
In the previous section, How-to upload data, we showed that uploading data on the Uploads page triggers processing of the raw data.
Once the data is added to the upload, NOMAD interprets the files and determines which of them are mainfiles. All other files in the upload are considered auxiliary files. The same upload may contain multiple mainfiles and auxiliary files, organized in a folder tree structure.
A mainfile is the main output file of a calculation. The presence of a mainfile in the upload is key for NOMAD to recognize a calculation. NOMAD supports several computational codes for first-principles calculations, molecular dynamics simulations, and lattice modeling, as well as workflow and database managers. For each code, NOMAD recognizes a single file as the mainfile. For example, the VASP mainfile is by default `vasprun.xml`, although if `vasprun.xml` is not present in the upload, NOMAD searches for the `OUTCAR` file and assigns it as the mainfile[^1].
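The matching idea can be sketched as a list of filename patterns tried in order of preference. This is a simplified, hypothetical illustration; the real matching rules live in each parser's configuration and can also inspect file contents, not just names.

```python
import re
from pathlib import Path

# Hypothetical matching rules, loosely modeled on the VASP example above:
# prefer vasprun.xml, fall back to OUTCAR. The real NOMAD rules are defined
# per parser and may also match on file contents.
MAINFILE_PATTERNS = [
    ("VASP", re.compile(r"vasprun\.xml$")),
    ("VASP", re.compile(r"OUTCAR$")),  # fallback when vasprun.xml is absent
]

def find_mainfiles(upload_dir: str) -> dict:
    """Return {directory: (mainfile, code)}, preferring earlier patterns."""
    best = {}  # directory -> (pattern priority, filename, code)
    for path in Path(upload_dir).rglob("*"):
        if not path.is_file():
            continue
        for prio, (code, pattern) in enumerate(MAINFILE_PATTERNS):
            if pattern.search(path.name):
                key = str(path.parent)
                # keep the highest-priority (lowest index) match per directory
                if key not in best or prio < best[key][0]:
                    best[key] = (prio, path.name, code)
                break
    return {d: (name, code) for d, (prio, name, code) in best.items()}
```

With both `vasprun.xml` and `OUTCAR` in the same folder, this sketch picks `vasprun.xml`, mirroring the fallback behavior described above.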
The remaining files, which are not mainfiles, are auxiliary files. These can serve several purposes and may be supported and recognized by NOMAD in the parser. For example, the `band*.out` or `GW_band*` files in FHI-aims are auxiliary files that allow the NOMAD FHI-aims parser to recognize band structures in DFT and GW, respectively.
You can see the full list of supported codes, mainfiles, and auxiliary files in the general NOMAD documentation under Supported parsers.
We recommend that users keep the folder structure and files generated by the simulation code, without exceeding the upload limits. Please also check our recommendations in Best Practices: preparing the data and folder structure.
## Structured data with the NOMAD metainfo
Once a mainfile has been recognized, a new entry is created in NOMAD and a specific parser is called. The auxiliary files are searched for and accessed within the parser. See Writing a parser plugin for more details on how to add new parsers so that NOMAD supports new codes.
For this new entry, NOMAD generates a NOMAD archive. It contains all the (meta)information extracted from the unstructured raw data files, but in a structured, well-defined, and machine-readable format. This metadata provides context to the raw data, i.e., what the input methodological parameters were, on which material the calculation was performed, etc. We define the NOMAD Metainfo as the set of sections, sub-sections, and quantities used to organize the raw data into a structured schema. Further information about the NOMAD Metainfo is available in the general NOMAD documentation under Learn > Structured data.
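The idea of sections, sub-sections, and quantities can be illustrated with a deliberately simplified sketch. The real metainfo is built from NOMAD's own section classes, not plain dataclasses, and the section and quantity names below are illustrative assumptions, not the actual schema.

```python
from dataclasses import dataclass, field

# Simplified sketch of the Metainfo idea: sections are typed containers,
# quantities are their typed fields, and sections nest into sub-sections.
@dataclass
class Method:                       # a "section"
    basis_set: str = ""             # a "quantity"
    xc_functional: str = ""

@dataclass
class System:
    chemical_formula: str = ""
    n_atoms: int = 0

@dataclass
class Run:                          # sub-sections nest inside a parent section
    program_name: str = ""
    method: Method = field(default_factory=Method)
    system: System = field(default_factory=System)

run = Run(
    program_name="VASP",
    method=Method(basis_set="plane waves", xc_functional="GGA_X_PBE"),
    system=System(chemical_formula="Si2", n_atoms=2),
)
```

A parser fills such a structure from the raw files, so that every value has a defined name, type, and place in the schema.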
## NOMAD sections for computational data
Under the `Entry` / `archive` section, several sections and quantities are populated by the parsers. For computational data, only the following sections are populated:
- `metadata`: contains general, code-independent metadata. This is mainly information about the authors, the entry creation time, identifiers (IDs), etc.
- `run`: contains the raw data parsed and normalized into the structured NOMAD schema. This is all the raw data that can be translated into a structured form.
- `workflow2`: contains metadata about the specific workflow performed within the entry. This is mainly a set of well-defined workflows, e.g., `GeometryOptimization`, and their parameters.
- `results`: contains the normalized and search-indexed metadata. This is mainly relevant for searching, filtering, and visualizing data in NOMAD.
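The four sections above can be pictured as the top-level layout of an archive. The sketch below is heavily trimmed and the nested keys are illustrative assumptions; real archives contain many more sub-sections and quantities.

```python
# Hypothetical, heavily trimmed layout of an archive for a computational
# entry; only the four top-level sections match the actual structure.
archive = {
    "metadata": {"entry_id": "<id>", "authors": ["<name>"]},
    "run": [{"program": {"name": "VASP"}, "system": [], "calculation": []}],
    "workflow2": {"name": "GeometryOptimization"},
    "results": {"material": {"chemical_formula_reduced": "Si"}},
}
```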
!!! note "`workflow` and `workflow2` sections: development and refactoring"
    You have probably noticed the name `workflow2`, but also the existence of a section called `workflow` under `archive`. This is because `workflow` is an older version of the workflow section, while `workflow2` is the new one. Certain sections occasionally undergo a rebranding or refactoring, in most cases to add new features or to polish them after years of user feedback. In this case, the `workflow` section will remain until all older entries containing it have been reprocessed to transfer their information into `workflow2`.
## Parsing
A parser is a Python module that reads the code-specific mainfile and auxiliary files and populates the `run` and `workflow2` sections of the `archive`, along with all relevant sub-sections and quantities. We explain parsers in more detail in Writing a parser plugin.
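In essence, a parser maps patterns in a text file onto archive sections. The sketch below uses a made-up mainfile format and a toy `Archive` class standing in for the real metainfo objects; actual parsers subclass NOMAD's parser interfaces instead.

```python
import re

class Archive:
    """Toy stand-in for the real archive object a parser receives."""
    def __init__(self):
        self.run = {}
        self.workflow2 = {}

def parse(mainfile_path: str, archive: Archive) -> None:
    """Read a (made-up) mainfile format and fill the archive sections."""
    with open(mainfile_path) as f:
        text = f.read()
    # extract a single quantity from the raw text (hypothetical format)
    energy = re.search(r"total energy\s*=\s*(-?\d+\.\d+)", text)
    archive.run = {
        "program": {"name": "my_code"},  # assumption: toy code name
        "calculation": (
            [{"energy_total": float(energy.group(1))}] if energy else []
        ),
    }
    archive.workflow2 = {"name": "SinglePoint"}
```

Real parsers do the same at scale: many regexes or structured readers, feeding many sections and quantities.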
Parsers are added to NOMAD as plugins and are divided into a set of GitHub sub-projects under the main NOMAD repository. You can find a detailed list of projects in Writing a parser plugin - Parser organization.
!!! info "External contributions"
    We always welcome external contributions of new codes and parsers to our repositories. Furthermore, we are always happy to hear feedback and implement new features in our parsers. Please check our Contact information to get in touch with us so we can promptly help you!
## Normalizing
After parsing populates the `run` and `workflow2` sections, an extra layer of Python modules is executed on top of the processed NOMAD metadata. This has two main purposes: 1. normalize or homogenize certain metadata parsed from different codes, and 2. populate the `results` section. An example is the normalization of the density of states (DOS) to its size-intensive value, independently of the code used to calculate the DOS. The set of normalizers relevant for computational data is listed in `/nomad/config/models.py` and executed in the specific order defined there. Their roles are explained in more detail in Normalizing metadata.
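The DOS example can be made concrete with a short sketch: dividing by the number of atoms in the simulation cell yields a size-intensive value, so spectra from different codes and cell sizes become comparable. This mirrors the concept only, not NOMAD's actual normalizer implementation.

```python
def normalize_dos(dos_values, n_atoms):
    """Return the DOS per atom (size-intensive), given raw DOS values."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return [v / n_atoms for v in dos_values]

# A 2-atom cell and an 8-atom supercell of the same material give the same
# normalized DOS, even though the raw values differ by the cell size:
dos_small = normalize_dos([0.4, 1.2, 2.0], n_atoms=2)
dos_super = normalize_dos([1.6, 4.8, 8.0], n_atoms=8)
```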
## Search indexing (and storing)
The last step is to store the structured metadata and pass some of it to the search index. The metadata passed to the search index is defined in the `results` section. This metadata can then be searched by filtering on the Entries page of NOMAD, or with a Python script that queries the NOMAD API; see Filtering and Querying.
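A query against the search index can be sketched with only the standard library. The endpoint URL follows the public v1 API, but treat the specific filter key used below as an assumption to verify against the API documentation.

```python
import json
from urllib import request

# Public NOMAD v1 API endpoint for querying entries
API_URL = "https://nomad-lab.eu/prod/v1/api/v1/entries/query"

def build_query(program_name: str, max_entries: int = 10) -> dict:
    """Build a query payload filtering on results-section metadata."""
    return {
        # assumed search quantity; filters on the indexed results section
        "query": {"results.method.simulation.program_name": program_name},
        "pagination": {"page_size": max_entries},
        "required": {"include": ["entry_id", "results.material.*"]},
    }

def run_query(payload: dict) -> dict:
    """POST the payload to the API (requires network access)."""
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_query("VASP")
```

Because only `results` is indexed, every key used in `query` must refer to metadata that the normalizers placed in that section.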
## Entries OVERVIEW page
Once the parsers and normalizers finish, the Uploads page will show whether the processing of the entry was a `SUCCESS` or a `FAILURE`. The entry information can be browsed by clicking on the icon.
You will land on the `OVERVIEW` page of the entry. From the top menu you can further select the `FILES` page, the `DATA` page, and the `LOGS` page.
The overview page contains a summary of the parsed metadata, e.g., tabular information about the material and methodology of the calculation (in the example, a G0W0 calculation performed with the code exciting for bulk Si2), as well as visualizations of the system and some relevant properties. Note that all this metadata is read directly from `results`.
## LOGS page
On the `LOGS` page, you can find information about the processing. You can read error, warning, and critical messages, which can provide insight if the processing of an entry was a `FAILURE`.
[^1]: Please check our note References > VASP POTCAR stripping.