Querying Archive
In the previous page, we explained how to fetch data from different data sources (mainly MongoDB documents).
These documents are mostly flat (with only a few levels of nesting) and relatively easy to query and fetch.
The same idea can be extended to fetching archives, which are stored on the file system.
Accessing Archives
An archive is the processed data of an entry, stored on the file system as a binary file.
Each archive therefore corresponds to exactly one entry, and the entry ID can be used as the unique identifier to access it.
In the graph system, the archive is linked to the corresponding entry via the special token archive.
Thus, to access the archive of an entry with ID example_entry_id, the request places the archive token under example_entry_id in the entries branch and attaches a request configuration, m_request, to it.
plain directive
The plain directive means 'just return the data as it is'.
We will introduce other directives later.
Such a query returns the contents of the target archive.
Replacing example_entry_id with a valid entry ID makes it ready to be executed; the following query does so for the entry x36WdKPMctUOkjXMyV8oQq2zWcSx.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"m_request":{ "directive":"plain" }
}
}
}
}
a valid example
The query above is a valid request: sent to the graph API (for example with curl), it fetches the archive of the randomly chosen entry x36WdKPMctUOkjXMyV8oQq2zWcSx.
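To send such a query from a script, a minimal Python sketch using only the standard library is shown below. It assumes that the graph API accepts the query as a JSON body in a POST request; the URL in GRAPH_URL is an assumption about the public deployment and should be checked against the API documentation, and archive_query and send are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint of the graph API; check it against your deployment's API docs.
GRAPH_URL = "https://nomad-lab.eu/prod/v1/api/v1/graph/query"


def archive_query(entry_id: str, directive: str = "plain") -> dict:
    """Build the request body that fetches the whole archive of one entry."""
    return {"entries": {entry_id: {"archive": {"m_request": {"directive": directive}}}}}


def send(query: dict, url: str = GRAPH_URL, timeout: float = 60.0) -> dict:
    """POST the query as JSON and decode the JSON response (no authentication)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    body = archive_query("x36WdKPMctUOkjXMyV8oQq2zWcSx")
    print(json.dumps(body, indent=2))
    # archive = send(body)  # performs the actual network request
```

Keeping the body builder separate from the network call makes the request easy to inspect, or to paste into a curl command.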
Nested Fetching
The archive is JSON compatible, which means it is effectively a JSON object with a tree-like structure.
Thus, one can apply exactly the same logic and express the intention of the request with a matching tree-like structure.
For example, to fetch n_quantities under metadata in the archive, the request would look like this.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"n_quantities":{ "m_request":{ "directive":"plain" } }
}
}
}
}
}
The following is the response to the above request. Note that the response has the same structure as the request and contains the intended data.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"n_quantities":12427
}
}
}
}
}
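Because the request and the response share the same tree structure, one small helper can build the request from a path and another can walk the response along that same path. A sketch in Python, where nested_request and extract are hypothetical names and the path is given as a list of archive keys:

```python
from __future__ import annotations

from typing import Any


def nested_request(entry_id: str, path: list[str], directive: str = "plain") -> dict:
    """Build a request that fetches the node at `path` inside an entry's archive."""
    node: dict = {"m_request": {"directive": directive}}
    # Wrap the request configuration in one dict per key, innermost key first.
    for key in reversed(path):
        node = {key: node}
    return {"entries": {entry_id: {"archive": node}}}


def extract(response: dict, entry_id: str, path: list[str]) -> Any:
    """Walk the response along the same keys that were used in the request."""
    node: Any = response["entries"][entry_id]["archive"]
    for key in path:
        node = node[key]
    return node


if __name__ == "__main__":
    print(nested_request("x36WdKPMctUOkjXMyV8oQq2zWcSx", ["metadata", "n_quantities"]))
```

Called with the entry ID and ["metadata", "n_quantities"], nested_request reproduces the request above, and extract applied to the response returns 12427.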
Advanced Customization
List Slicing
If the target data is a list, a slice of it can be extracted by using the index field in the request configuration.
The following request fetches the second element (index 1, as indices start at 0) of the processing_logs list in the archive of the entry with ID x36WdKPMctUOkjXMyV8oQq2zWcSx.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"processing_logs":{
"m_request":{ "directive":"plain", "index":[ 1 ] }
}
}
}
}
}
The requested data is returned at its original position in the list.
Since the first element is not requested, it is null in the response.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"processing_logs":[
null,
{
"event":"Reading force field from tpr not yet supported for Gromacs 2024. Interactions will not be stored",
"proc":"Entry",
"process":"process_entry",
"process_worker_id":"RhqUJg02RQ-06EReb8BWZA",
"parser":"atomisticparsers:gromacs_parser_entry_point",
"step":"atomisticparsers:gromacs_parser_entry_point",
"logger":"nomad.processing",
"timestamp":"2025-05-27 09:39.20",
"level":"WARNING"
}
]
}
}
}
}
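Since unrequested positions come back as null, client code usually needs to skip them when reading a sliced list. A minimal Python sketch with the hypothetical helper requested_items; it assumes that null only marks positions that were not requested, so an element whose real value is null would be skipped as well.

```python
from __future__ import annotations

from typing import Any


def requested_items(padded: list[Any]) -> list[tuple[int, Any]]:
    """Return (index, value) pairs for the positions that were actually filled in.

    Unrequested positions arrive as None (JSON null) and are skipped; an element
    whose real value is null would be skipped as well.
    """
    return [(i, item) for i, item in enumerate(padded) if item is not None]


if __name__ == "__main__":
    logs = [None, {"level": "WARNING", "proc": "Entry"}]  # shape of the response above
    for index, log in requested_items(logs):
        print(index, log["level"])  # 1 WARNING
```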
Apart from using the index field, one can alternatively use the indexing syntax in the key.
For example, the above request can be equivalently written as follows.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"processing_logs[1]":{ "m_request":{ "directive":"plain" } }
}
}
}
}
Because the index is part of the key itself, this format can be combined flexibly with nesting.
range slicing
It is possible to assign both start and end indices to the index field.
For example, index: [1, 3] returns the second to the fourth elements (start and end both inclusive).
Using the indexing syntax in the key, the equivalent is key[1:3].
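The two notations can be converted into each other mechanically. The Python sketch below (hypothetical helpers indexed_key and python_slice, non-negative indices assumed) builds the key form from an index list and shows the matching Python slice. Because the end index in the request is inclusive, unlike Python's slice notation, the Python slice needs end + 1.

```python
from __future__ import annotations


def indexed_key(key: str, index: list[int]) -> str:
    """Key-syntax form of an index field: [i] -> key[i], [start, end] -> key[start:end]."""
    if any(i < 0 for i in index):
        raise ValueError("this sketch only handles non-negative indices")
    if len(index) == 1:
        return f"{key}[{index[0]}]"
    if len(index) == 2:
        start, end = index
        return f"{key}[{start}:{end}]"
    raise ValueError("index must contain one or two integers")


def python_slice(index: list[int]) -> slice:
    """Python slice selecting the same elements; the request's end index is inclusive."""
    if not 1 <= len(index) <= 2 or any(i < 0 for i in index):
        raise ValueError("expected one or two non-negative indices")
    return slice(index[0], index[-1] + 1)


if __name__ == "__main__":
    print(indexed_key("processing_logs", [1]))     # processing_logs[1]
    print(indexed_key("processing_logs", [1, 3]))  # processing_logs[1:3]
    print(list("abcde")[python_slice([1, 3])])     # ['b', 'c', 'd']
```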
Limiting Depth
Sometimes, it is only necessary to know what the archive contains, without fetching all the data.
In such cases, one can limit the depth of the request by using the depth field in the request configuration.
The following request fetches the archive of the entry x36WdKPMctUOkjXMyV8oQq2zWcSx with a depth limit of 1.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"m_request":{
"directive":"plain",
"depth":1
}
}
}
}
}
The response will contain only the top-level fields of the archive, without any nested data.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"processing_logs":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/processing_logs",
"run":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run",
"workflow2":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/workflow2",
"metadata":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/metadata",
"results":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/results"
}
}
}
}
The value of each field is replaced by an internal reference string, which indicates that the data is available but not fetched.
There is one exception: a primitive value (such as a string, number, or boolean) is always returned as is, because generating internal reference strings for primitive values makes little sense and often has a negative impact on performance.
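When exploring an archive with a depth limit, it helps to list which fields came back as internal reference strings. The Python sketch below assumes, based on the response above, that such strings start with __INTERNAL__: and that the part after # is the path inside the archive; unfetched_fields and archive_path are hypothetical helpers, and a regular string value that happened to start with the same prefix would be misclassified.

```python
from __future__ import annotations

INTERNAL_PREFIX = "__INTERNAL__:"  # prefix observed in the response above


def unfetched_fields(node: dict) -> dict[str, str]:
    """Return the fields whose values are internal reference strings (i.e. not fetched)."""
    return {
        key: value
        for key, value in node.items()
        if isinstance(value, str) and value.startswith(INTERNAL_PREFIX)
    }


def archive_path(reference: str) -> str:
    """Return the path inside the archive, i.e. the part after the first '#'."""
    _, sep, path = reference.partition("#")
    if not sep:
        raise ValueError(f"no '#' in reference: {reference!r}")
    return path


if __name__ == "__main__":
    archive = {
        "metadata": "__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/metadata",
        "example_count": 3,  # hypothetical primitive field, returned as is
    }
    for key, ref in unfetched_fields(archive).items():
        print(key, "->", archive_path(ref))  # metadata -> /metadata
```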
Limiting Container Size
Some archives may contain large lists or dictionaries, and not all of them may be needed.
In such cases, one can limit the size of containers by using the max_list_size and max_dict_size fields in the request configuration.
The following request fetches metadata/optimade with a maximum list size of 3.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"optimade":{
"m_request":{
"directive":"plain",
"max_list_size":3
}
}
}
}
}
}
}
In the response, lists with at most 3 elements are returned in full, while longer lists (here, cartesian_site_positions and species_at_sites) are replaced by internal reference strings.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"optimade":{
"elements":[
"K",
"S",
"W"
],
"nelements":3,
"elements_ratios":[
0.007425742574257425,
0.0024752475247524753,
0.9900990099009901
],
"chemical_formula_descriptive":"K3SW400",
"chemical_formula_reduced":"K3SW400",
"chemical_formula_hill":"K3SW400",
"chemical_formula_anonymous":"A400B3C",
"dimension_types":[
1,
1,
1
],
"lattice_vectors":[
[
35.59059935653863,
0,
0
],
[
0,
35.59059935653863,
0
],
[
0,
0,
38.11340132386931
]
],
"cartesian_site_positions":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/metadata/optimade/cartesian_site_positions",
"nsites":404,
"species_at_sites":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/metadata/optimade/species_at_sites",
"structure_features":[
],
"species":[
{
"name":"W",
"chemical_symbols":[
"W"
],
"concentration":[
1
]
},
{
"name":"S",
"chemical_symbols":[
"S"
],
"concentration":[
1
]
},
{
"name":"K",
"chemical_symbols":[
"K"
],
"concentration":[
1
]
}
]
}
}
}
}
}
}
The max_dict_size field works similarly for dictionaries: if a dictionary has more than the specified number of keys, it is replaced by an internal reference string.
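The size rule can be illustrated with a small local model applied to a plain Python structure. This is a toy model of the behaviour described above, not the server implementation: the placeholder format <not fetched: path> is hypothetical (the server returns internal reference strings instead), and exempting the requested node itself while checking every container below it is an assumption.

```python
from __future__ import annotations

from typing import Any, Optional


def limit_containers(node: Any, path: str = "", max_list_size: Optional[int] = None,
                     max_dict_size: Optional[int] = None) -> Any:
    """Toy model: replace oversized lists/dicts below `node` with placeholder strings."""

    def walk(value: Any, value_path: str, is_root: bool) -> Any:
        if isinstance(value, list):
            if not is_root and max_list_size is not None and len(value) > max_list_size:
                return f"<not fetched: {value_path}>"
            return [walk(item, f"{value_path}/{i}", False) for i, item in enumerate(value)]
        if isinstance(value, dict):
            if not is_root and max_dict_size is not None and len(value) > max_dict_size:
                return f"<not fetched: {value_path}>"
            return {key: walk(item, f"{value_path}/{key}", False) for key, item in value.items()}
        return value  # primitives are always kept

    return walk(node, path, True)


if __name__ == "__main__":
    optimade = {
        "elements": ["K", "S", "W"],
        "cartesian_site_positions": [[0.0, 0.0, 0.0]] * 404,  # stand-in data
        "nsites": 404,
    }
    print(limit_containers(optimade, "metadata/optimade", max_list_size=3))
```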
Filtering Unknown Keys
By providing either the include or the exclude field in the request configuration, one can filter the keys of the archive.
Both fields accept a list of key patterns to include or exclude, respectively.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"optimade":{
"m_request":{
"directive":"plain",
"include":[ "*element*" ]
}
}
}
}
}
}
}
The corresponding response looks like this.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"optimade":{
"elements":[ "K", "S", "W" ],
"nelements":3,
"elements_ratios":[
0.007425742574257425,
0.0024752475247524753,
0.9900990099009901
]
}
}
}
}
}
}
Note that the entries are interpreted as glob patterns.
Thus, if the include field is set to *elements, it includes all keys that end with elements.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"optimade":{
"m_request":{
"directive":"plain",
"include":[ "*elements" ]
}
}
}
}
}
}
}
The corresponding response looks like this.
Note that the field elements_ratios is no longer included.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"metadata":{
"optimade":{
"elements":[ "K", "S", "W" ],
"nelements":3
}
}
}
}
}
}
Note
Only one of the fields include and exclude can be used in a single request configuration.
Neither field is passed on to deeper levels of the archive.
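The filtering rules above can be modelled locally with Python's fnmatch module. This sketch (hypothetical helper filter_keys) applies the patterns only to the keys of the node it is given, mirroring the note that the fields are not passed to deeper levels, and rejects configurations that set both fields. Case-sensitive matching via fnmatchcase is an assumption about the server's behaviour.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Any, Optional


def filter_keys(node: dict[str, Any], include: Optional[list[str]] = None,
                exclude: Optional[list[str]] = None) -> dict[str, Any]:
    """Keep (include) or drop (exclude) the keys of `node` that match any glob pattern.

    Only the keys at this level are filtered; nested values are returned untouched.
    """
    if include is not None and exclude is not None:
        raise ValueError("only one of include and exclude may be given")
    if include is not None:
        return {k: v for k, v in node.items() if any(fnmatchcase(k, p) for p in include)}
    if exclude is not None:
        return {k: v for k, v in node.items() if not any(fnmatchcase(k, p) for p in exclude)}
    return dict(node)


if __name__ == "__main__":
    optimade = {
        "elements": ["K", "S", "W"],
        "nelements": 3,
        "elements_ratios": [0.007425742574257425, 0.0024752475247524753, 0.9900990099009901],
        "nsites": 404,
    }
    print(list(filter_keys(optimade, include=["*element*"])))  # ['elements', 'nelements', 'elements_ratios']
    print(list(filter_keys(optimade, include=["*elements"])))  # ['elements', 'nelements']
```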
Resolving References
Archives may contain references that point to other locations in the same archive, or even to other entries.
Conceptually, they are similar to soft (symbolic) links in file systems.
With the default request configuration, references are returned as they are.
The following request fetches the third element of calculations_ref under the path workflow2/results.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"workflow2":{
"results":{
"calculations_ref[2]":{ "m_request":{ "directive":"plain" } }
}
}
}
}
}
}
The reference string #/run/0/calculation/2 is returned, but the data it points to is not fetched.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"workflow2":{
"results":{
"calculations_ref":[
null,
null,
"#/run/0/calculation/2"
]
}
}
}
}
}
}
various formats of references
The format of the reference string may vary, depending on whether it points to the same entry or to another entry.
To resolve the reference, use the resolved directive instead of the default plain directive.
Note that the following request also limits lists to at most 2 elements so that the response is short enough to be presented here.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"workflow2":{
"results":{
"calculations_ref[2]":{
"m_request":{
"directive":"resolved",
"max_list_size":2
}
}
}
}
}
}
}
}
The first change is that the reference string is normalized to uploads/RzWMitKESo2dQmuE6uQB-Q/entries/x36WdKPMctUOkjXMyV8oQq2zWcSx/archive/run/0/calculation/2.
By default, the 'extra' data requested (here, the resolved data) is added to the response under a fixed path: uploads/<upload_id>/entries/<entry_id>/archive/<path_to_data>.
This path is always used, even when the reference points into the same entry.
It is a valid path that can be used to locate the data within the same response.
The motivation is to produce a response that is as self-contained as possible.
The second thing to note is that the target calculation contains a further reference, method_ref, which points to the method used for the calculation.
This reference is also resolved, and the corresponding data is fetched and included in the response.
In fact, all references are resolved recursively, so that the response contains all the data reachable from the original reference.
{
"uploads":{
"RzWMitKESo2dQmuE6uQB-Q":{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"run":[
{
"calculation":[
null,
null,
{
"method_ref":"uploads/RzWMitKESo2dQmuE6uQB-Q/entries/x36WdKPMctUOkjXMyV8oQq2zWcSx/archive/run/0/method/0",
"volume":4.8326900482177747e-26,
"density":996.3873291015625,
"pressure":-26484371.948242188,
"pressure_tensor":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/calculation/2/pressure_tensor",
"virial_tensor":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/calculation/2/virial_tensor",
"enthalpy":-1.473205588040599e-17,
"temperature":300.7535095214844,
"step":6000,
"time":1.2e-10,
"energy":{
"total":{
"value":-1.4736888308472856e-17
},
"electrostatic":{
"value":0,
"short_range":0
},
"van_der_waals":{
"value":-1.724283145003578e-17,
"short_range":-1.724283145003578e-17
},
"kinetic":{
"value":2.5059431415629207e-18
},
"potential":{
"value":-1.724283145003578e-17
},
"pressure_volume_work":{
"value":4.832690154891482e-21
}
},
"x_gromacs_thermodynamics_contributions":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/calculation/2/x_gromacs_thermodynamics_contributions"
}
],
"method":[
{
"force_field":{
"model":[
{
"contributions":[
{
"type":"bond",
"n_interactions":5,
"n_atoms":2,
"atom_labels":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/method/0/force_field/model/0/contributions/0/atom_labels",
"atom_indices":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/method/0/force_field/model/0/contributions/0/atom_indices"
}
]
}
],
"force_calculations":{
"vdw_cutoff":1.2e-9,
"coulomb_type":"reaction_field",
"coulomb_cutoff":1.2,
"neighbor_searching":{
"neighbor_update_frequency":40,
"neighbor_update_cutoff":1.4e-9
}
}
},
"atom_parameters":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/method/0/atom_parameters"
}
]
}
]
}
}
}
}
},
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"workflow2":{
"results":{
"calculations_ref":[
null,
null,
"uploads/RzWMitKESo2dQmuE6uQB-Q/entries/x36WdKPMctUOkjXMyV8oQq2zWcSx/archive/run/0/calculation/2"
]
}
}
}
}
}
}
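Because resolved data lives under the fixed path, a reference in the response can be followed by splitting it on / and walking the response. A Python sketch with hypothetical helpers normalize and follow; normalize only handles same-entry references of the form #/..., since the other reference formats mentioned above are not covered here, and U and E stand in for real upload and entry IDs.

```python
from __future__ import annotations

from typing import Any


def normalize(reference: str, upload_id: str, entry_id: str) -> str:
    """Map a same-entry reference such as '#/run/0/calculation/2' to the fixed response path."""
    if not reference.startswith("#/"):
        raise ValueError("this sketch only handles same-entry references of the form '#/...'")
    return f"uploads/{upload_id}/entries/{entry_id}/archive/{reference[2:]}"


def follow(response: dict, path: str) -> Any:
    """Walk a slash-separated path through the dicts and lists of a response."""
    node: Any = response
    for segment in path.split("/"):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    return node


if __name__ == "__main__":
    # Trimmed-down response with placeholder IDs "U" (upload) and "E" (entry).
    response = {"uploads": {"U": {"entries": {"E": {"archive": {"run": [{
        "calculation": [None, None, {"step": 6000,
                                     "method_ref": "uploads/U/entries/E/archive/run/0/method/0"}],
        "method": [{"force_field": {}}],
    }]}}}}}}
    calc = follow(response, normalize("#/run/0/calculation/2", "U", "E"))
    print(calc["step"])                          # 6000
    print(follow(response, calc["method_ref"]))  # {'force_field': {}}
```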
Controlling Reference Resolution
However, it is not always desirable to resolve all references.
The resolve_depth field in the request configuration controls how deep references are resolved.
For example, the following request resolves only one level of references.
{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"workflow2":{
"results":{
"calculations_ref[2]":{
"m_request":{
"directive":"resolved",
"max_list_size":2,
"resolve_depth":1
}
}
}
}
}
}
}
}
As can be seen in the response, the first method is no longer resolved: method_ref keeps its original form #/run/0/method/0 and the method data is not included, since it sits at the second level of references.
Every resolution (redirection) counts as one level.
{
"uploads":{
"RzWMitKESo2dQmuE6uQB-Q":{
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"run":[
{
"calculation":[
null,
null,
{
"method_ref":"#/run/0/method/0",
"volume":4.8326900482177747e-26,
"density":996.3873291015625,
"pressure":-26484371.948242188,
"pressure_tensor":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/calculation/2/pressure_tensor",
"virial_tensor":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/calculation/2/virial_tensor",
"enthalpy":-1.473205588040599e-17,
"temperature":300.7535095214844,
"step":6000,
"time":1.2e-10,
"energy":{
"total":{
"value":-1.4736888308472856e-17
},
"electrostatic":{
"value":0,
"short_range":0
},
"van_der_waals":{
"value":-1.724283145003578e-17,
"short_range":-1.724283145003578e-17
},
"kinetic":{
"value":2.5059431415629207e-18
},
"potential":{
"value":-1.724283145003578e-17
},
"pressure_volume_work":{
"value":4.832690154891482e-21
}
},
"x_gromacs_thermodynamics_contributions":"__INTERNAL__:../uploads/RzWMitKESo2dQmuE6uQB-Q/archive/x36WdKPMctUOkjXMyV8oQq2zWcSx#/run/0/calculation/2/x_gromacs_thermodynamics_contributions"
}
]
}
]
}
}
}
}
},
"entries":{
"x36WdKPMctUOkjXMyV8oQq2zWcSx":{
"archive":{
"workflow2":{
"results":{
"calculations_ref":[
null,
null,
"uploads/RzWMitKESo2dQmuE6uQB-Q/entries/x36WdKPMctUOkjXMyV8oQq2zWcSx/archive/run/0/calculation/2"
]
}
}
}
}
}
}
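The effect of resolve_depth can be illustrated with a toy resolver over a local archive dictionary. In this sketch (hypothetical helpers get_path, find_refs, and resolve), every followed reference counts as one level, as described above; only same-entry references of the form #/... are recognized, and already resolved paths are skipped so that reference cycles terminate. It models the described behaviour and is not the server implementation.

```python
from __future__ import annotations

from typing import Any, Iterator, Optional


def get_path(archive: dict, path: str) -> Any:
    """Look up a slash-separated path such as 'run/0/calculation/2'."""
    node: Any = archive
    for segment in path.split("/"):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    return node


def find_refs(node: Any) -> Iterator[str]:
    """Yield every same-entry reference string ('#/...') found anywhere in node."""
    if isinstance(node, str):
        if node.startswith("#/"):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from find_refs(value)
    elif isinstance(node, list):
        for value in node:
            yield from find_refs(value)


def resolve(archive: dict, refs: list[str], resolve_depth: Optional[int] = None) -> list[str]:
    """Return the archive paths that get resolved; None means no depth limit."""
    resolved: list[str] = []
    frontier = list(refs)
    level = 0
    while frontier and (resolve_depth is None or level < resolve_depth):
        level += 1  # each round of redirections is one level
        next_frontier: list[str] = []
        for ref in frontier:
            path = ref[2:]  # drop the leading '#/'
            if path in resolved:  # guards against reference cycles
                continue
            resolved.append(path)
            next_frontier.extend(find_refs(get_path(archive, path)))
        frontier = next_frontier
    return resolved


if __name__ == "__main__":
    archive = {"run": [{
        "calculation": [{}, {}, {"method_ref": "#/run/0/method/0", "step": 6000}],
        "method": [{"force_field": {}}],
    }]}
    print(resolve(archive, ["#/run/0/calculation/2"], resolve_depth=1))  # ['run/0/calculation/2']
    print(resolve(archive, ["#/run/0/calculation/2"]))  # ['run/0/calculation/2', 'run/0/method/0']
```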