Part 3: Creating Custom Entries in NOMAD¶
🎯 What You Will Learn¶
- How to store data that is not supported by existing NOMAD parsers
- How to use the NOMAD ELN (Electronic Lab Notebook) interface
- How to annotate and reference files as part of a workflow
- How to connect entries into a workflow graph
🧰 Workflow Tasks Executed Manually or via Custom Scripts¶
Challenge: You need to include the simulation setup procedure with sufficient details for reproducibility, but the setup steps were performed either manually or with custom scripts, such that they will not be automatically recognized by NOMAD.
Your Solution: Use NOMAD’s ELN and custom schema functionalities!
Attention
There are some steps in this section that can be performed programmatically or "manually" in the GUI. The programmable route is intended to be performed in a single jupyter session (pre-filled example notebook provided below).
Simulation setup steps¶
Imagine that to set up the MD simulations uploaded in part 1, you had to create structure and topology files. You ran 2 python scripts for this, workflow_script_1.py
and workflow_script_2.py
. The first script creates the simulation box (stored in box.gro
) and inserts the water molecules (stored in water.gro
). The second script creates the force field file (water.top
).
You can download these 5 files and save them for later:
Download Simulation Setup Input/Output Files
The entire setup workflow can be represented as:
This is the exact workflow graph that we aim to generate in NOMAD within this part of the tutorial.
Electronic lab notebook (ELN) entries in NOMAD¶
Let's explore the basic functionalities of NOMAD ELNs. You can create a basic ELN entry from Your uploads
page by clicking CREATE A NEW UPLOAD > CREATE FROM SCHEMA
and selecting Basic ELN
under the Built-in Schema
drop-down menu, as demonstrated in this video:
Upon entry creation, you will be taken to the Data
tab, where you can fill in or edit the predefined ELN quantities in the user-editable ELN interface. Type a dummy description for this entry and then press the icon in the upper right. Now, navigate to the Overview
page to see your changes there.
The editable quantities that you found in your ELN entry (e.g., short name, tags, ID, description) appear because they are defined within the Basic ELN
schema that you selected in NOMAD. NOMAD provides a tool for browsing all such schemas. Go to ANALYZE > The NOMAD MetaInfo
, then select nomad > Basic ELN
to view all the quantity definitions and descriptions within this entry class:
Creating an ELN entry from YAML¶
Analogous to the simulation code parsers, NOMAD has a parser for its native schema — the NOMAD MetaInfo. This parser is automatically executed for files named <file_name>.archive.yaml
. In this way, users can create ELN entries by uploading a yaml file populated according to NOMAD's schema.
For example, we can create a basic ELN entry by creating and uploading a file, e.g. basic_eln_entry.archive.yaml
, with the contents:
data:
m_def: "nomad.datamodel.metainfo.eln.ElnBaseSection"
name: "ELN entry from YAML"
description: "A test ELN entry..."
The data
section is created and defined as type ElnBaseSection
, meaning that we can populate all the quantities (e.g., name and description) living in this section (as seen in the MetaInfo browser above).
Uploading this yaml to the test deployment results in an entry with the overview page:
The ELNFileManager
¶
ELNFileManager
is a built-in schema for referencing and annotating files within an ELN entry. You can create an ELNFileManager
either from the GUI or via the YAML approach, in the same ways as described above.
We can now use these definitions to create an entry file for the step of creating the force field, and link to the output force field file from this step in the workflow:
create_force_field.archive.yaml
data:
m_def: 'nomad.datamodel.metainfo.eln.ElnFileManager'
name: 'Create force field'
description: 'The force field is defined for input to the MD simulation engine.'
Files:
- file: 'Custom_ELN_Entries/water.top'
description: 'The force field file for simulation input.'
Uploading to NOMAD should result in the following entry display:
You can now create analogous files create_box.archive.yaml
, insert_water.archive.yaml
, workflow_parameters.archive.yaml
, workflow_scripts.archive.yaml
:
create_box.archive.yaml
insert_water.archive.yaml
workflow_parameters.archive.yaml
workflow_scripts.archive.yaml
data:
m_def: 'nomad.datamodel.metainfo.eln.ElnFileManager'
name: 'Workflow Scripts'
description: 'All the scripts run during setup of the MD simulation.'
Files:
- file: 'Custom_ELN_Entries/workflow_script_1.py'
description: 'Creates the simulation box and inserts water molecules.'
- file: 'Custom_ELN_Entries/workflow_script_2.py'
description: 'Creates the appropriate force field files for the simulation engine.'
Customizing the schema¶
For standardization and search capabilities, it is best practice to use existing classes in the MetaInfo. However, NOMAD also allows users to customize the schema to their own specific needs. For example, you can customize one of the existing ELN schemas by adding specific sections and quantities relevant for your use case.
Customization of schemas is beyond the scope of the present tutorial, however, the tip box below will direct you to further resources if you are interested in learning more.
Customizing the schema
Schema customization can be achieved via 2 approaches:
-
By including a
YAML
-based schema alongside yourYAML
entry file: This is a quick and dirty way to customize your NOMAD entries, and only provides partial support in terms of, e.g., search capabilities. See NOMAD Docs > How to write a YAML schema package. -
By creating a schema plugin: This more robust and powerful approach integrates your new schema into NOMAD via python class definitions See NOMAD Docs > How to get started with plugins. The following tools are available to assist in plugin creation and development: Plugin Template; NOMAD Distro Template.
Creating a custom workflow in NOMAD¶
NOMAD allows users to connect entries into a workflow, i.e., a directed graph structure. This is achieved using the same parsing functionality as demonstrated with the custom schemas above. In this case, we simply populate the workflow2
section instead of the data
section. When uploaded to NOMAD, a new workflow entry will be created, with references to each of the workflow tasks, and also an interactive workflow graph for easy navigation of the entire workflow.
Let's construct this workflow yaml piece by piece, starting with the section definition and global inputs/outputs:
"workflow2":
"name": "MD Setup workflow"
"inputs":
- "name": "workflow parameters"
"section": "<path_to_mainfile>/workflow_parameters.archive.yaml#/data"
- "name": "workflow scripts"
"section": "<path_to_mainfile>/workflow_scripts.archive.yaml#/data/Files"
"outputs":
- "name": "structure file"
"section": "<path_to_mainfile>/insert_water.archive.yaml#/data/Files/0/file"
- "name": "force field file"
"section": "<path_to_mainfile>/create_force_field.archive.yaml#/data/Files/0/file"
This example denotes full path to each yaml file with placeholders like <path_to_mainfile> = ../upload/archive/mainfile/Custom_ELN_Entries/
. As we already saw above, the ../upload/
syntax is used to access files that were uploaded together. The archive/mainfile
directory can be used to access all the mainfiles (i.e., files automatically recognized by NOMAD). Custom_ELN_Entries/
is the user-defined folder in which the upload is contained.
This workflow takes as input the entire "workflow parameters" entry and a list of workflow scripts, and outputs the structure and force field files.
We now need to define each task, which contains its own inputs and outputs, e.g., the task that creates the force field file:
"workflow2":
... ### I/Os
"tasks":
... ### Other tasks
- "m_def": "nomad.datamodel.metainfo.workflow.TaskReference"
"name": "create force field"
"task": "<path_to_mainfile>/create_force_field.archive.yaml#/data"
"inputs":
- "name": "workflow parameters"
"section": "<path_to_mainfile>/workflow_parameters.archive.yaml#/data"
- "name": "workflow script 2"
"section": "<path_to_mainfile>/workflow_scripts.archive.yaml#/data/Files/1/file"
"outputs":
- "name": "force field file"
"section": "<path_to_mainfile>/create_force_field.archive.yaml#/data/Files/0/file"
This task is linked to the entry defined in create_force_field.archive.yaml
. It takes as input: 1. the entire workflow parameters entry, defined in workflow_parameters.archive.yaml
, and 2. The second file stored in the files list within the workflow scripts entry, defined by workflow_scripts.archive.yaml
. The output of this task is the force field file, which is the first file stored in the file list of the create for field entry.
You can now add the "create box" and "insert water" tasks to create the final workflow file:
setup_workflow.archive.yaml
"workflow2":
"name": "MD Setup workflow"
"inputs":
- "name": "workflow parameters"
"section": "<path_to_mainfile>/workflow_parameters.archive.yaml#/data"
- "name": "workflow scripts"
"section": "<path_to_mainfile>/workflow_scripts.archive.yaml#/data/Files"
"outputs":
- "name": "structure file"
"section": "<path_to_mainfile>/insert_water.archive.yaml#/data/Files/0/file"
- "name": "force field file"
"section": "<path_to_mainfile>/create_force_field.archive.yaml#/data/Files/0/file"
"tasks":
- "m_def": "nomad.datamodel.metainfo.workflow.TaskReference"
"name": "create box"
"task": "<path_to_mainfile>/create_box.archive.yaml#/data"
"inputs":
- "name": "workflow parameters"
"section": "<path_to_mainfile>/workflow_parameters.archive.yaml#/data"
- "name": "workflow script 1"
"section": "<path_to_mainfile>/workflow_scripts.archive.yaml#/data/Files/0/file"
"outputs":
- "name": "initial box"
"section": "<path_to_mainfile>/create_box.archive.yaml#/data/Files/0/file"
- "m_def": "nomad.datamodel.metainfo.workflow.TaskReference"
"name": "insert water"
"task": "<path_to_mainfile>/insert_water.archive.yaml#/data"
"inputs":
- "name": "initial box"
"section": "<path_to_mainfile>/create_box.archive.yaml#/data/Files/0/file"
- "name": "workflow script 1"
"section": "<path_to_mainfile>/workflow_scripts.archive.yaml#/data/Files/0/file"
"outputs":
- "name": "structure file"
"section": "<path_to_mainfile>/insert_water.archive.yaml#/data/Files/0/file"
- "m_def": "nomad.datamodel.metainfo.workflow.TaskReference"
"name": "create force field"
"task": "<path_to_mainfile>/create_force_field.archive.yaml#/data"
"inputs":
- "name": "workflow parameters"
"section": "<path_to_mainfile>/workflow_parameters.archive.yaml#/data"
- "name": "workflow script 2"
"section": "<path_to_mainfile>/workflow_scripts.archive.yaml#/data/Files/1/file"
"outputs":
- "name": "force field file"
"section": "<path_to_mainfile>/create_force_field.archive.yaml#/data/Files/0/file"
Create a new folder called Custom_ELN_Entries
and place in it all of the completed files.
Don't forget to:
- replace
<path_to_mainfile>
with../upload/archive/mainfile/Custom_ELN_Entries/
in the last created filesetup_workflow.archive.yaml
- include the 5 files previously downloaded (
workflow_script_1.py
,workflow_script_2.py
,box.gro
,water.gro
,water.top
).
Alternatively, you can download the complete folder here:
Download Custom_ELN_Entries folder
Uploading and publishing¶
We now need to upload these files, edit the metadata, and publish the upload. You have the choice to use either the GUI, as demonstrated in Part 1, or the API, as demonstrated in Part 2.
Using the GUI¶
If you use the GUI upload, you will need to find and use the Edit Metadata
button on the uploads page in order to add the dataset_id
manually to link the upload with your dataset.
Once you have published the upload, continue with Saving the PIDs below.
Using the API¶
Create a new notebook Custom_ELN_Entries.ipynb
to try the steps below on your own or download the prefilled notebook:
Download Custom_ELN_Entries.ipynb
- Make the imports:
Solution
- Zip the folder for upload:
- Upload to NOMAD and check for completion of processing:
Solution
fnm = 'Custom_ELN_Entries.zip'
# define the timing parameters
max_wait_time = 60 # 60 seconds
interval = 5 # 5 seconds
# make the upload
eln_entries_upload_id = upload_files_to_nomad(filename=fnm, url='test')
# wait until the upload is processed successfully before continuing
elapsed_time = 0
while elapsed_time < max_wait_time:
nomad_upload = get_upload_by_id(eln_entries_upload_id, url='test')
# Check if the upload is complete
if nomad_upload.process_status == 'SUCCESS':
break
# Wait the specified interval before the next call
time.sleep(interval)
elapsed_time += interval
else:
raise TimeoutError(f'Maximum wait time of {max_wait_time/60.} minutes exceeded. Upload with id {eln_entries_upload_id} is not complete.')
print(eln_entries_upload_id)
- Add a title and link to your dataset:
Solution
dataset_id = '<your dataset id>'
metadata_new = {'upload_name': f'Test Upload - ELN Entries', 'datasets': dataset_id}
_ = edit_upload_metadata(eln_entries_upload_id, url='test', upload_metadata=metadata_new)
time.sleep(10)
nomad_upload = get_upload_by_id(eln_entries_upload_id, url='test')
print(nomad_upload.process_status == 'SUCCESS')
print(nomad_upload.process_running is False)
- Go to NOMAD and inspect your upload. If everything looks correct, go ahead and publish the upload:
Solution
# define the timing parameters
max_wait_time = 30 # 30 seconds
interval = 5 # 5 seconds
published_upload = publish_upload(eln_entries_upload_id, url='test')
# wait until the upload is processed successfully before continuing
elapsed_time = 0
while elapsed_time < max_wait_time:
nomad_upload = get_upload_by_id(eln_entries_upload_id, url='test')
# Check if the edit upload is complete
if nomad_upload.process_status == 'SUCCESS':
break
# Wait the specified interval before the next call
time.sleep(interval)
elapsed_time += interval
else:
raise TimeoutError(f'Maximum wait time of {max_wait_time/60.} minutes exceeded. Publish Upload with id {eln_entries_upload_id} is not complete.')
Saving the PIDs¶
For Part 4, we will need the entry ids for the setup workflow entry (setup_workflow.archive.yaml
) and the workflow parameters entry (workflow_parameters.archive.yaml
), that is the input for the md setup workflow. Find the proper entry ids using the GUI or the get_entries_of_upload()
method as in Part 2. Copy the entry_id
for each into your PIDs.json
file:
{
"upload_ids": {
"md-workflow": "<your md workflow upload id from Part 1>",
"DFT": ["<your list of dft upload ids from above>"],
"setup-workflow": "<copy the setup workflow upload id here>",
"analysis": ""
},
"entry_ids": {
"md-workflow": "<your md workflow entry id from Part 1>",
"DFT": ["<your list of dft entry ids from above>"],
"setup-workflow": "<copy the setup workflow entry id here>",
"parameters": "<copy the workflow parameters entry id here>",
"analysis": ""
},
"dataset_id": "<your dataset id>"
}