Part 4: Creating Custom Workflows to Link Multiple Uploads
🎯 What You Will Learn
- How to define and create a custom workflow entry in NOMAD
- How to use the `nomad-utility-workflows` module to generate a workflow YAML
- How to connect individual uploads as linked tasks in a workflow graph
- How to add custom plots to your project overview
- How to obtain a DOI for your complete workflow
You’ll finalize the example project by creating a structured, interactive representation of the entire research process—from input data through all processing steps to final analysis.
🧩 Custom Plots and Hierarchical Workflows
Workflows are an important aspect of data, as they explain how the data was generated and are essential for reproducibility. In this sense, a workflow has already happened: it has produced input and output data that are linked through the tasks that were performed. This is often referred to as data provenance, or a provenance graph.
Challenge:
- You now need to create an overarching workflow that documents the steps executed in your project.
- You also want to showcase some of the analysis plots that you created from the data generated by the workflow.
Your Solution:
- Use NOMAD's plotly functionalities to create entries with custom plots on the overview page.
- Use `nomad-utility-workflows` to generate a workflow YAML for the overarching workflow.
Plotting Entry
You have performed a vibrational analysis of the DFT configurations obtained from your simulations. The results are shown below. Create a file `result-vibrational-analysis-DFT.csv` to store the results:
electron_density,oh_stretch_frequency
e/A^3,1/cm
0.0102,3601.
0.0125,3554.
0.0140,3507.
0.0158,3473.
0.0169,3448.
0.0184,3392.
0.0195,3355.
0.0208,3321.
0.0223,3289.
0.0236,3267.
0.0248,3230.
0.0261,3194.
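If you prefer to script this step, the file can be written with a few lines of Python. This is a sketch using only the standard library; the first row holds the column names and the second row the units, matching the layout the schema below expects:

```python
# Write the vibrational-analysis results shown above to
# result-vibrational-analysis-DFT.csv.
import csv

rows = [
    ["electron_density", "oh_stretch_frequency"],  # column names
    ["e/A^3", "1/cm"],                             # units row
    ["0.0102", "3601."],
    ["0.0125", "3554."],
    ["0.0140", "3507."],
    ["0.0158", "3473."],
    ["0.0169", "3448."],
    ["0.0184", "3392."],
    ["0.0195", "3355."],
    ["0.0208", "3321."],
    ["0.0223", "3289."],
    ["0.0236", "3267."],
    ["0.0248", "3230."],
    ["0.0261", "3194."],
]

with open("result-vibrational-analysis-DFT.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```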
NOMAD has a variety of tools, including plotting functionalities, that can be utilized when defining a custom schema. Let's create an ELN entry that will plot the results of the vibrational analysis.
Attention
Understanding the details of the customized plotting schema below is beyond the scope of this tutorial, and this is not necessarily the advised route for the most robust plotting customization. The important takeaway is that you can, in principle, create your own plotting schema. In this case, we have done so to create a very basic plot within an entry.
Create a file `vibrational_plot_schema.archive.yaml` with the following content:
"definitions":
"name": This is a parser for vibrational analysis data in the .csv format
"sections":
"Vibrational_Analysis":
"base_sections":
- nomad.datamodel.data.EntryData
- nomad.parsing.tabular.TableData
- nomad.datamodel.metainfo.plot.PlotSection
"quantities":
"data_file":
"type": str
"descritpion": Upload your .csv data file
"m_annotations":
"eln":
"component": FileEditQuantity
"browser":
"adaptor": RawFileAdaptor
"tabular_parser":
"parsing_options":
"comment": "#"
"skiprows": [1]
"mapping_options":
- "mapping_mode": column
"file_mode": current_entry
"sections":
- "#root"
"Electron_Density":
"type": np.float64
"shape": ["*"]
"m_annotations":
"tabular":
"name": electron_density
"OH_Stretch_Frequency":
"type": np.float64
"shape": ["*"]
"m_annotations":
"tabular":
"name": oh_stretch_frequency
"m_annotations":
"plotly_graph_object":
"data":
"x": "#Electron_Density"
"y": "#OH_Stretch_Frequency"
"layout":
"title": Vibrational Analysis
Here we will not describe in detail the plotting annotations. Rather, this serves as a simple demonstration that custom plotting is possible. In practice, there are various routes for creating custom visualizations. See NOMAD Docs > Reference > Annotations for more information.
To create an entry according to this schema, create the file `vibrational_analysis.archive.yaml` with the following contents:
"data":
"m_def": "../upload/raw/vibrational_plot_schema.archive.yaml#Vibrational_Analysis"
"data_file": "result-vibrational-analysis-DFT.csv"
Alternatively, you can download all three files here:
Download Vibrational Analysis Files
Now we can once again use either the GUI or the API to upload, edit the metadata (add a title and link to the dataset), and publish. We will not repeat the steps here, but we encourage you to repeat the API steps on your own. You can also download the prefilled `Vibrational_Analysis.ipynb` to perform these steps:
Download Vibrational Analysis Upload and Publish Notebook
After you have completed the publishing, save the `entry_id` to your `PIDs.json`, as we will need it in the final section of the tutorial below:
{
"upload_ids": {
"md-workflow": "<your md workflow upload id from Part 1>",
"DFT": ["<your list of dft upload ids from above>"],
"setup-workflow": "<your setup workflow upload id here>",
"analysis": "<copy the vibrational analysis upload id here>"
},
"entry_ids": {
"md-workflow": "<your md workflow entry id from Part 1>",
"DFT": ["<your list of dft entry ids from above>"],
"setup-workflow": "<your setup workflow entry id here>",
"parameters": "<your workflow parameters entry id here>",
"analysis": "<copy the vibrational analysis entry id here>"
},
"dataset_id": "<your dataset id>"
}
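Before moving on, it can save debugging time to verify that your `PIDs.json` contains every key the workflow-generation steps below will look up. The following helper is a small sketch (not part of `nomad-utility-workflows`); the key names follow the file above:

```python
# Check a PIDs dictionary for the keys used later in this part.
# Returns human-readable messages describing anything missing.
REQUIRED_ENTRY_IDS = {"md-workflow", "DFT", "setup-workflow", "parameters", "analysis"}
REQUIRED_UPLOAD_IDS = {"md-workflow", "DFT", "setup-workflow", "analysis"}


def missing_pids(pids: dict) -> list[str]:
    problems = []
    for key in sorted(REQUIRED_ENTRY_IDS - set(pids.get("entry_ids", {}))):
        problems.append(f"entry_ids is missing '{key}'")
    for key in sorted(REQUIRED_UPLOAD_IDS - set(pids.get("upload_ids", {}))):
        problems.append(f"upload_ids is missing '{key}'")
    if "dataset_id" not in pids:
        problems.append("dataset_id is missing")
    return problems
```

Load your file with `json.load` and call `missing_pids` on the result; an empty list means you have everything the next section needs.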
Creating the overarching project workflow
Now that all the individual tasks and sub-workflows for the project are stored in the NOMAD repository, we need to create an overarching workflow to connect these components. The final workflow graph should look as follows:
As in Part 3, we could create the necessary `archive.yaml` manually. However, there are many cases where this can be quite tedious and requires detailed knowledge of the NOMAD schema. For that reason, `nomad-utility-workflows` includes tools for automating the generation of this file. In the following, we will demonstrate the basic functionalities of these tools. For more details, see the `nomad-utility-workflows` Docs.
Creating a graph representation of the project workflow
Create a new notebook `Generate_Workflow_Yaml.ipynb` to work step by step, or download the prefilled notebook:
Download Generate_Workflow_Yaml.ipynb
Make the imports:
import json
import gravis as gv
import networkx as nx
from nomad_utility_workflows.utils.workflows import (
NodeAttributes,
NodeAttributesUniverse,
build_nomad_workflow,
nodes_to_graph,
)
Load the saved PIDs:
with open('PIDs.json') as f:
pids_dict = json.load(f)
entry_ids = pids_dict.get('entry_ids')
upload_ids = pids_dict.get('upload_ids')
Create a dictionary with inputs, outputs, and tasks as follows:
node_attributes = {
0: NodeAttributes(
name='Workflow Parameters',
type='input',
path_info={
'entry_id': entry_ids.get('parameters'),
'upload_id': upload_ids.get('setup-workflow'),
'mainfile_path': 'workflow_parameters.archive.yaml',
'archive_path': 'data',
},
),
1: NodeAttributes(
name='MD Setup',
type='workflow',
path_info={
'entry_id': entry_ids.get('setup-workflow'),
'upload_id': upload_ids.get('setup-workflow'),
'mainfile_path': 'setup_workflow.archive.yaml',
},
in_edge_nodes=[0],
),
2: NodeAttributes(
name='MD Equilibration',
type='workflow',
path_info={
'entry_id': entry_ids.get('md-workflow'),
'upload_id': upload_ids.get('md-workflow'),
'mainfile_path': 'workflow.archive.yaml',
},
in_edge_nodes=[1],
),
3: NodeAttributes(
name='DFT-1',
type='workflow',
entry_type='simulation',
path_info={
'entry_id': entry_ids.get('DFT')[0],
'upload_id': upload_ids.get('DFT')[0],
'mainfile_path': 'aims.out'
},
in_edge_nodes=[2],
out_edge_nodes=[6],
),
4: NodeAttributes(
name='DFT-2',
type='task',
entry_type='simulation',
path_info={
'entry_id': entry_ids.get('DFT')[1],
'upload_id': upload_ids.get('DFT')[1],
'mainfile_path': 'aims.out'
},
in_edge_nodes=[2],
out_edge_nodes=[6],
),
5: NodeAttributes(
name='DFT-3',
type='task',
entry_type='simulation',
path_info={
'entry_id': entry_ids.get('DFT')[2],
'upload_id': upload_ids.get('DFT')[2],
'mainfile_path': 'aims.out'
},
in_edge_nodes=[2],
out_edge_nodes=[6],
),
6: NodeAttributes(
name='Vibrational Analysis',
type='output',
path_info={
'entry_id': entry_ids.get('analysis'),
'upload_id': upload_ids.get('analysis'),
'mainfile_path': 'vibrational_analysis.archive.yaml',
'archive_path': 'data',
},
),
}
Validate your dictionary by creating a `NodeAttributesUniverse` object:
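The exact constructor signature may vary between versions of `nomad-utility-workflows`, so treat the following as a sketch and check the library's docs; the assumption here is that the universe object is built directly from the node-attribute dictionary and raises a validation error if any attributes are inconsistent:

```python
# Hypothetical usage — confirm the exact field name accepted by
# NodeAttributesUniverse in the nomad-utility-workflows docs.
node_attributes_universe = NodeAttributesUniverse(nodes=node_attributes)
```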
This dictionary of node attributes directly informs the creation of the workflow YAML without explicitly referencing specific sections of the NOMAD schema. Specifying the mainfile recognized by the NOMAD parsers is required. In this case, we also include the appropriate entry IDs to link together the entries that we have already created. Detailed information about all possible attributes can be found in the `nomad-utility-workflows` Docs > Explanation > Workflows.
Now, to create a graph of your workflow simply run:
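Using the `nodes_to_graph()` function imported above (the call sketched here assumes it accepts the node-attribute dictionary directly, which matches the import list and the variable name used in the visualization step below):

```python
# Build a networkx graph of the project workflow from the node attributes.
workflow_graph_input = nodes_to_graph(node_attributes)
```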
The result can be visualized with:
gv.d3(
workflow_graph_input,
node_label_data_source='name',
edge_label_data_source='name',
zoom_factor=1.5,
node_hover_tooltip=True,
)
Generate the workflow YAML
First, designate the full path and filename for your YAML, and a name for your workflow:
workflow_metadata = {
'destination_filename': './project_workflow.archive.yaml',
'workflow_name': 'NOMAD Tutorial Workflows Project',
}
The workflow name will show up on top of the workflow graph visualization on the overview page of the workflow entry that we will create.
We can now use the generated graph to create the workflow YAML using the `build_nomad_workflow()` function:
workflow_graph_output = build_nomad_workflow(
workflow_metadata=workflow_metadata,
workflow_graph=nx.DiGraph(workflow_graph_input),
write_to_yaml=True,
)
Another workflow graph is returned by this function:
gv.d3(
workflow_graph_output,
node_label_data_source='name',
edge_label_data_source='name',
zoom_factor=1.5,
node_hover_tooltip=True,
)
We see that our output graph looks significantly different from the input. That's because `nomad-utility-workflows` automatically adds some default inputs/outputs to ensure the proper node connections within the workflow visualizer. For nodes without an `entry_type`, these default connections work by simply adding inputs/outputs that point to the mainfile of one of the nodes connected by an edge in the graph.
Example `project_workflow.archive.yaml`:
'workflow2':
'name': 'NOMAD Tutorial Workflows Project'
'inputs':
- 'name': 'Workflow Parameters'
'section': '../uploads/ME2oYBdiQUW4CGcG0YsGuw/archive/ujuIHCdj7StVCxbvYP7NjMvG4v22#/data'
'outputs':
- 'name': 'Vibrational Analysis'
'section': '../uploads/yyqHpBFOSxqhrNALIJswSw/archive/zv4Q_jnvhsGry4oEyj5AC2qD103H#/data'
- 'name': 'output system from DFT-1'
'section': '../uploads/JHsC4UFnRtCB9dQYR1BO6A/archive/d2ZJkTjL4LoxdFAVS8YT_jlZdzhm#/run/0/system/-1'
- 'name': 'output calculation from DFT-1'
'section': '../uploads/JHsC4UFnRtCB9dQYR1BO6A/archive/d2ZJkTjL4LoxdFAVS8YT_jlZdzhm#/run/0/calculation/-1'
- 'name': 'output system from DFT-2'
'section': '../uploads/exkqL5J8Q62ldMHbuE5tcQ/archive/zeiiGOoZL0dikRgYSvzuZZ3xPv9m#/run/0/system/-1'
- 'name': 'output calculation from DFT-2'
'section': '../uploads/exkqL5J8Q62ldMHbuE5tcQ/archive/zeiiGOoZL0dikRgYSvzuZZ3xPv9m#/run/0/calculation/-1'
- 'name': 'output system from DFT-3'
'section': '../uploads/0wUg7_GuSLiG3U1LcNlr2A/archive/8x9j0tEtx6LeGVEv7ByWD3k5VK1F#/run/0/system/-1'
- 'name': 'output calculation from DFT-3'
'section': '../uploads/0wUg7_GuSLiG3U1LcNlr2A/archive/8x9j0tEtx6LeGVEv7ByWD3k5VK1F#/run/0/calculation/-1'
'tasks':
- 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
'name': 'MD Setup'
'task': '../uploads/ME2oYBdiQUW4CGcG0YsGuw/archive/8EcTSRbvb8PiYaG_TwqAgOYiE8J7#/workflow2'
'inputs':
- 'name': 'input data from Workflow Parameters'
'section': '../uploads/ME2oYBdiQUW4CGcG0YsGuw/archive/ujuIHCdj7StVCxbvYP7NjMvG4v22#/data'
'outputs':
- 'name': 'output workflow2 from MD Setup'
'section': '../uploads/ME2oYBdiQUW4CGcG0YsGuw/archive/8EcTSRbvb8PiYaG_TwqAgOYiE8J7#/workflow2'
- 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
'name': 'MD Equilibration'
'task': '../uploads/7Ncu4YyXTTyTSfg0qpiziQ/archive/J3Vkhz2NAtSw4-KTnbHqUyPhhY3g#/workflow2'
'inputs':
- 'name': 'input workflow2 from MD Setup'
'section': '../uploads/ME2oYBdiQUW4CGcG0YsGuw/archive/8EcTSRbvb8PiYaG_TwqAgOYiE8J7#/workflow2'
'outputs':
- 'name': 'output workflow2 from MD Equilibration'
'section': '../uploads/7Ncu4YyXTTyTSfg0qpiziQ/archive/J3Vkhz2NAtSw4-KTnbHqUyPhhY3g#/workflow2'
- 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
'name': 'DFT-1'
'task': '../uploads/JHsC4UFnRtCB9dQYR1BO6A/archive/d2ZJkTjL4LoxdFAVS8YT_jlZdzhm#/workflow2'
'inputs':
- 'name': 'input workflow2 from MD Equilibration'
'section': '../uploads/7Ncu4YyXTTyTSfg0qpiziQ/archive/J3Vkhz2NAtSw4-KTnbHqUyPhhY3g#/workflow2'
- 'name': 'input system from DFT-1'
'section': '../uploads/JHsC4UFnRtCB9dQYR1BO6A/archive/d2ZJkTjL4LoxdFAVS8YT_jlZdzhm#/run/0/system/-1'
'outputs':
- 'name': 'output data from Vibrational Analysis'
'section': '../uploads/yyqHpBFOSxqhrNALIJswSw/archive/zv4Q_jnvhsGry4oEyj5AC2qD103H#/data'
- 'name': 'output system from DFT-1'
'section': '../uploads/JHsC4UFnRtCB9dQYR1BO6A/archive/d2ZJkTjL4LoxdFAVS8YT_jlZdzhm#/run/0/system/-1'
- 'name': 'output calculation from DFT-1'
'section': '../uploads/JHsC4UFnRtCB9dQYR1BO6A/archive/d2ZJkTjL4LoxdFAVS8YT_jlZdzhm#/run/0/calculation/-1'
- 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
'name': 'DFT-2'
'task': '../uploads/exkqL5J8Q62ldMHbuE5tcQ/archive/zeiiGOoZL0dikRgYSvzuZZ3xPv9m#/workflow2'
'inputs':
- 'name': 'input workflow2 from MD Equilibration'
'section': '../uploads/7Ncu4YyXTTyTSfg0qpiziQ/archive/J3Vkhz2NAtSw4-KTnbHqUyPhhY3g#/workflow2'
- 'name': 'input system from DFT-2'
'section': '../uploads/exkqL5J8Q62ldMHbuE5tcQ/archive/zeiiGOoZL0dikRgYSvzuZZ3xPv9m#/run/0/system/-1'
'outputs':
- 'name': 'output data from Vibrational Analysis'
'section': '../uploads/yyqHpBFOSxqhrNALIJswSw/archive/zv4Q_jnvhsGry4oEyj5AC2qD103H#/data'
- 'name': 'output system from DFT-2'
'section': '../uploads/exkqL5J8Q62ldMHbuE5tcQ/archive/zeiiGOoZL0dikRgYSvzuZZ3xPv9m#/run/0/system/-1'
- 'name': 'output calculation from DFT-2'
'section': '../uploads/exkqL5J8Q62ldMHbuE5tcQ/archive/zeiiGOoZL0dikRgYSvzuZZ3xPv9m#/run/0/calculation/-1'
- 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
'name': 'DFT-3'
'task': '../uploads/0wUg7_GuSLiG3U1LcNlr2A/archive/8x9j0tEtx6LeGVEv7ByWD3k5VK1F#/workflow2'
'inputs':
- 'name': 'input workflow2 from MD Equilibration'
'section': '../uploads/7Ncu4YyXTTyTSfg0qpiziQ/archive/J3Vkhz2NAtSw4-KTnbHqUyPhhY3g#/workflow2'
- 'name': 'input system from DFT-3'
'section': '../uploads/0wUg7_GuSLiG3U1LcNlr2A/archive/8x9j0tEtx6LeGVEv7ByWD3k5VK1F#/run/0/system/-1'
'outputs':
- 'name': 'output data from Vibrational Analysis'
'section': '../uploads/yyqHpBFOSxqhrNALIJswSw/archive/zv4Q_jnvhsGry4oEyj5AC2qD103H#/data'
- 'name': 'output system from DFT-3'
'section': '../uploads/0wUg7_GuSLiG3U1LcNlr2A/archive/8x9j0tEtx6LeGVEv7ByWD3k5VK1F#/run/0/system/-1'
- 'name': 'output calculation from DFT-3'
'section': '../uploads/0wUg7_GuSLiG3U1LcNlr2A/archive/8x9j0tEtx6LeGVEv7ByWD3k5VK1F#/run/0/calculation/-1'
Now upload, edit the metadata, and publish `project_workflow.archive.yaml`, following Part 3 > Uploading and Publishing. Browse the workflow graph of this entry to see how it links all of your uploads together. Your workflow graph should look like this:
Now, go to PUBLISH > Datasets and find the dataset that you created. Click the arrow to the right of this dataset and browse all the entries it contains to make sure all of your uploads are included. Then go back to the datasets page and click the "assign a DOI" icon to publish your dataset.
That's it! You now have a persistent identifier to add to your publication in order to reference your data. Once your manuscript is accepted and receives its own DOI, you can cross-reference it by once again editing the metadata of each upload to include a reference to your paper.