Workflows¶
The built-in abstract workflow schema¶
Workflows are an important aspect of data because they explain how the data came to be. Here, workflow refers to a workflow that has already happened and has produced input and output data linked through the tasks that were performed. This is often also referred to as data provenance or a provenance graph.
The following shows the overall abstract schema for workflows that can be found
in nomad.datamodel.metainfo.workflow (blue):
In this UML diagram, filled diamonds denote a “contains” relationship, with the diamond on the side of the containing section. Open arrows denote inheritance and point to the parent section. Filled arrows denote sub-sections, named by the arrow label and defined by the section the arrow points to.
The idea is that workflows are stored in a top-level archive section alongside other sections that contain the inputs and outputs. This way the workflow or provenance graph is just an additional piece of the archive that describes how the data in this (or other) archives is connected.
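To make this layout concrete, here is a minimal sketch using plain Python dictionaries. It does not use the NOMAD API: the key names (workflow, inputs, outputs, name, section), the path syntax, and the resolve helper are assumptions made for illustration only. The point is that the workflow section holds no data of its own; it only points at data stored in sibling sections.

```python
# Hypothetical, simplified archive: plain dicts and lists, not NOMAD classes.
archive = {
    # Data section holding the actual inputs and outputs.
    "run": [
        {
            "system": [
                {"label": "initial structure"},
                {"label": "relaxed structure"},
            ]
        }
    ],
    # Workflow section stored alongside the data; it only references it.
    "workflow": {
        "inputs": [{"name": "input system", "section": "/run/0/system/0"}],
        "outputs": [{"name": "relaxed system", "section": "/run/0/system/1"}],
    },
}


def resolve(archive, path):
    """Follow a slash-separated path such as '/run/0/system/1' through the archive."""
    node = archive
    for part in path.strip("/").split("/"):
        # List levels are addressed by integer index, dict levels by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


# Look up the data that the workflow output points to.
output_ref = archive["workflow"]["outputs"][0]["section"]
relaxed = resolve(archive, output_ref)
```

Because the workflow stores references rather than copies, the same mechanism could in principle point into other archives as well, which this sketch does not attempt.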
Example workflow¶
Consider an example workflow consisting of a geometry optimization and a ground state
calculation performed by two individual DFT code runs. The code runs are stored in the
NOMAD entries geom_opt.archive.yaml and ground_state.archive.yaml using the run
top-level section.
Here is a logical depiction of the workflow and all its tasks, inputs, and outputs.
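The same structure can be sketched in a few lines of Python. This is an illustrative model, not the NOMAD schema classes: Link, Task, Workflow, and upstream_tasks are hypothetical names, and the reference strings after the # are assumed paths into the two entries named above. The sketch shows how tasks become a provenance graph simply by sharing references: the output of the geometry optimization is the input of the ground state calculation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """A named reference to a piece of data (hypothetical model)."""
    name: str
    section: str


@dataclass
class Task:
    """Something that was performed, with links to its inputs and outputs."""
    name: str
    inputs: List[Link] = field(default_factory=list)
    outputs: List[Link] = field(default_factory=list)


@dataclass
class Workflow(Task):
    """A task that is itself composed of tasks."""
    tasks: List[Task] = field(default_factory=list)


# Assumed reference paths; only the file names come from the example.
initial = "geom_opt.archive.yaml#/run/0/system/0"
relaxed = "geom_opt.archive.yaml#/run/0/system/1"
ground = "ground_state.archive.yaml#/run/0/calculation/0"

geom_opt = Task(
    name="geometry optimization",
    inputs=[Link("input system", initial)],
    outputs=[Link("relaxed system", relaxed)],
)
ground_state = Task(
    name="ground state calculation",
    inputs=[Link("input system", relaxed)],
    outputs=[Link("ground state", ground)],
)
workflow = Workflow(
    name="example workflow",
    inputs=[Link("input system", initial)],
    outputs=[Link("ground state", ground)],
    tasks=[geom_opt, ground_state],
)


def upstream_tasks(workflow, section):
    """Return the names of all tasks that contributed to `section`, nearest first."""
    found, seen, frontier = [], set(), [section]
    while frontier:
        target = frontier.pop()
        for task in workflow.tasks:
            if task.name in seen:
                continue
            if any(link.section == target for link in task.outputs):
                seen.add(task.name)
                found.append(task.name)
                # Continue the walk from everything this task consumed.
                frontier.extend(link.section for link in task.inputs)
    return found


provenance = upstream_tasks(workflow, ground)
```

Walking backwards from the final output visits the ground state calculation first and the geometry optimization second, which is the "how the data came to be" reading of the graph.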
Standardized versus custom workflows¶
Warning
Coming soon...