How to work with the H5MD-NOMAD schema¶
Writing an HDF5 file according to H5MD-NOMAD with python¶
You can write to an HDF5 file via a python interface, using the h5py package. This section walks you through the creation of each section of the H5MD-NOMAD schema, using practical examples to help you get started.
Imports¶
import numpy as np
import json
import h5py
import parmed as chem
import MDAnalysis as mda
from pint import UnitRegistry
ureg = UnitRegistry()
h5py : module for reading and writing HDF5 files.
UnitRegistry
: object from the pint package that provides assistance for working with units. We suggest using this package for easiest compatibility with NOMAD. If you have nomad-lab installed, you can alternatively import ureg with from nomad.units import ureg.
MDAnalysis : a library to analyze trajectories from molecular dynamics simulations stored in various formats.
ParmEd : a tool for aiding in investigations of biomolecular systems using popular molecular simulation packages.
Example Data¶
For concreteness, we consider a fictitious set of "vanilla" molecular dynamics simulations, run with the OpenMM software. The following definitions set the dimensionality, periodicity, and the units for this simulation.
dimension = 3
periodicity = [True, True, True]
time_unit = 1.0 * ureg.picosecond
length_unit = 1.0 * ureg.angstrom
energy_unit = 1000. * ureg.joule
mass_unit = 1.0 * ureg.amu
charge_unit = 1.0 * ureg.e
temperature_unit = 1.0 * ureg.K
custom_unit = 1.0 * ureg.newton / length_unit**2
acceleration_unit = 1.0 * length_unit / time_unit**2
In this example, we will assume that the relevant simulation data is compatible with MDAnalysis, such that a universe containing the trajectory and topology information can be created.
Note
Knowledge of the MDAnalysis package is not necessary for understanding this example. The dimensions of the supplied quantities will be made clear in each case.
Create a universe by supplying a pdb structure file and corresponding dcd trajectory file (MDAnalysis supports many different file formats):
universe = mda.Universe('initial_structure.pdb', 'trajectory.dcd')
n_frames = len(universe.trajectory)
n_atoms = universe.trajectory[0].n_atoms
parmed and then import it to MDanalysis:
pdb = app.PDBFile('initial_structure.pdb')
forcefield = app.ForceField('force_field.xml')
system = forcefield.createSystem(pdb.topology)
struct = chem.openmm.load_topology(pdb.topology, system)
universe_toponly = mda.Universe(struct)
H5MD Group¶
Create an HDF5 file called test_h5md-nomad.h5 and create the group h5md under root:
Add the h5md version (1.0.x in this case) as an attribute of the h5md group:
Create the author group and add the associated metadata:
author = h5md.create_group('author')
author.attrs['name'] = 'author name'
author.attrs['email'] = 'author-name@example-domain.edu'
Create the program group and add the associated metadata:
program = h5md.create_group('program')
program.attrs['name'] = 'OpenMM'
program.attrs['version'] = '7.7.0'
Create the creator group and add the associated metadata:
program = h5md.create_group('creator')
program.attrs['name'] = h5py.__name__
program.attrs['version'] = str(h5py.__version__)
Particles Group¶
Create the particles group and the underlying all group to hold the relevant particle data:
Get the steps, times, positions, and lattice vectors (i.e., box dimensions) from the MDA universe:
# quantities extracted from MDAnalysis
steps = []
times = []
positions = []
lattice_vectors = []
for i_frame, frame in enumerate(universe.trajectory):
times.append(frame.time)
steps.append(frame.frame)
positions.append(frame.positions)
lattice_vectors.append(frame.triclinic_dimensions)
Set the positions and corresponding metadata:
position_group_all = particles_group_all.create_group('position')
position_group_all['step'] = steps # shape = (n_frames)
position_group_all['time'] = times # shape = (n_frames)
position_group_all['time'].attrs['unit'] = str(time_unit.units)
position_group_all['time'].attrs['unit_factor'] = time_unit.magnitude
position_group_all['value'] = positions # shape = (n_frames, n_atoms, dimension)
position_group_all['value'].attrs['unit'] = str(length_unit.units)
position_group_all['value'].attrs['unit_factor'] = length_unit.magnitude
Set the particle-specific metadata:
particles_group_all['species_label'] = universe_toponly.atoms.types # shape = (n_atoms)
particles_group_all['force_field_label'] = universe_toponly.atoms.names # shape = (n_atoms)
particles_group_all['mass'] = universe_toponly.atoms.masses # shape = (n_atoms)
particles_group_all['mass'].attrs['unit'] = str(mass_unit.units)
particles_group_all['mass'].attrs['unit_factor'] = mass_unit.magnitude
particles_group_all['charge'] = universe_toponly.atoms.charges # shape = (n_atoms)
particles_group_all['charge'].attrs['unit'] = str(charge_unit.units)
particles_group_all['charge'].attrs['unit_factor'] = charge_unit.magnitude
Create the box group under particles.all and write corresponding data:
box_group = particles_group_all.create_group('box')
box_group.attrs['dimension'] = dimension
box_group.attrs['boundary'] = periodicity
edges = box_group.create_group('edges')
edges['step'] = steps
edges['time'] = times
edges['time'].attrs['unit'] = str(time_unit.units)
edges['time'].attrs['unit_factor'] = time_unit.magnitude
edges['value'] = lattice_vectors
edges['value'].attrs['unit'] = str(length_unit.units)
edges['value'].attrs['unit_factor'] = length_unit.magnitude
Connectivity Group¶
Create the connectivity group under root and add the tuples of bonds, angles, and dihedrals:
connectivity = h5_write.create_group('connectivity')
connectivity['bonds'] = universe_toponly.bonds._bix # shape = (n_bonds, 2)
connectivity['angles'] = universe_toponly.angles._bix # shape = (n_angles, 3)
connectivity['dihedrals'] = universe_toponly.dihedrals._bix # shape = (n_dihedrals, 4)
connectivity['impropers'] = universe_toponly.impropers._bix # shape = (n_impropers, 4)
n_bonds, n_angles, n_dihedrals, and n_impropers represent the corresponding number of instances of each interaction within the force field.
You can read more about the creation of the hierarchical particles_group in Creating a topology.
Observables Group¶
For this section, we will consider sets of fabricated observable data for clarity. First, create the observables group under root:
There are 3 types of support observables:
Configurational Observables¶
Fabricated data:
temperatures = 300. * np.ones(n_frames)
potential_energies = 1.0 * np.ones(n_frames)
kinetic_energies = 2.0 * np.ones(n_frames)
Create a temperature group and populate the associated metadata:
temperature = observables.create_group('temperature')
temperature.attrs['type'] = types[0]
temperature['step'] = steps
temperature['time'] = times
temperature['time'].attrs['unit'] = str(time_unit.units)
temperature['time'].attrs['unit_factor'] = time_unit.magnitude
temperature['value'] = temperatures
temperature['value'].attrs['unit'] = str(temperature_unit.units)
temperature['value'].attrs['unit_factor'] = temperature_unit.magnitude
Create an energy group to hold various types of energies. Add :
energies = observables.create_group('energy')
potential_energy = energies.create_group('potential')
potential_energy.attrs['type'] = types[0]
potential_energy['step'] = steps
potential_energy['time'] = times
potential_energy['time'].attrs['unit'] = str(time_unit.units)
potential_energy['time'].attrs['unit_factor'] = time_unit.magnitude
potential_energy['value'] = potential_energies
potential_energy['value'].attrs['unit'] = str(energy_unit.units)
potential_energy['value'].attrs['unit_factor'] = energy_unit.magnitude
kinetic_energy = energies.create_group('kinetic')
kinetic_energy.attrs['type'] = types[0]
kinetic_energy['step'] = steps
kinetic_energy['time'] = times
kinetic_energy['time'].attrs['unit'] = str(time_unit.units)
kinetic_energy['time'].attrs['unit_factor'] = time_unit.magnitude
kinetic_energy['value'] = kinetic_energies
kinetic_energy['value'].attrs['unit'] = str(energy_unit.units)
kinetic_energy['value'].attrs['unit_factor'] = energy_unit.magnitude
Ensemble Average Observables¶
Fabricated data - the following represents radial distribution function (rdf) data calculated between molecule types X and Y, stored in rdf_MOLX-MOLY.xvg:
0.24 0.000152428
0.245 0.00457094
0.25 0.0573499
0.255 0.284764
0.26 0.842825
0.265 1.64705
0.27 2.37243
0.275 2.77916
0.28 2.80622
0.285 2.60082
0.29 2.27182
...
Store the rdf data in a dictionary along with some relevant metadata:
rdf_XX = np.loadtxt('rdf_MOLX-MOLX.xvg')
rdf_XY = np.loadtxt('rdf_MOLX-MOLY.xvg')
rdf_YY = np.loadtxt('rdf_MOLY-MOLY.xvg')
rdfs = {
'MOLX-MOLX': {
'n_bins': len(rdf_XX[:, 0]),
'bins': rdf_XX[:, 0],
'value': rdf_XX[:, 1],
'type': 'molecular',
'frame_start': 0,
'frame_end': n_frames-1
},
'MOLX-MOLY': {
'n_bins': len(rdf_XY[:, 0]),
'bins': rdf_XY[:, 0],
'value': rdf_XY[:, 1],
'type': 'molecular',
'frame_start': 0,
'frame_end': n_frames-1
},
'MOLY-MOLY': {
'n_bins': len(rdf_YY[:, 0]),
'bins': rdf_YY[:, 0],
'value': rdf_YY[:, 1],
'type': 'molecular',
'frame_start': 0,
'frame_end': n_frames-1
}
}
Now create the radial_distribution_functions group under observables and store each imported rdf:
radial_distribution_functions = observables.create_group('radial_distribution_functions')
for key in rdfs.keys():
rdf = radial_distribution_functions.create_group(key)
rdf.attrs['type'] = types[1]
rdf['type'] = rdfs[key]['type']
rdf['n_bins'] = rdfs[key]['n_bins']
rdf['bins'] = rdfs[key]['bins']
rdf['bins'].attrs['unit'] = str(length_unit.units)
rdf['bins'].attrs['unit_factor'] = length_unit.magnitude
rdf['value'] = rdfs[key]['value']
rdf['frame_start'] = rdfs[key]['frame_start']
rdf['frame_end'] = rdfs[key]['frame_end']
We can also store scalar ensemble average observables. Let's consider some fabricated diffusion constant data:
Ds = {
'MOLX': {
'value': 1.0,
'error_type': 'Pearson_correlation_coefficient',
'errors': 0.98
},
'MOLY': {
'value': 2.0,
'error_type': 'Pearson_correlation_coefficient',
'errors': 0.95
}
}
Create the diffusion constants group under observables and store the correspond (meta)data:
diffusion_constants = observables.create_group('diffusion_constants')
for key in Ds.keys():
diffusion_constant = diffusion_constants.create_group(key)
diffusion_constant.attrs['type'] = types[1]
diffusion_constant['value'] = Ds[key]['value']
diffusion_constant['value'].attrs['unit'] = str(diff_unit.units)
diffusion_constant['value'].attrs['unit_factor'] = diff_unit.magnitude
diffusion_constant['error_type'] = Ds[key]['error_type']
diffusion_constant['errors'] = Ds[key]['errors']
Time Correlation Observables¶
Fabricated data - the following represents mean squared displacement (msd) data calculated for molecule type X, stored in msd_MOLX.xvg:
0 0
2 0.0688769
4 0.135904
6 0.203573
8 0.271162
10 0.339284
12 0.410115
14 0.477376
16 0.545184
18 0.61283
...
Store the msd data in a dictionary along with some relevant metadata:
msd_X = np.loadtxt('msd_MOLX.xvg')
msd_Y = np.loadtxt('msd_MOLY.xvg')
msds = {
'MOLX': {
'n_times': len(msd_X[:, 0]),
'times': msd_X[:, 0],
'value': msd_X[:, 1],
'type': 'molecular',
'direction': 'xyz',
'error_type': 'standard_deviation',
'errors': np.zeros(len(msd_X[:, 0])),
},
'MOLY': {
'n_times': len(msd_Y[:, 0]),
'times': msd_Y[:, 0],
'value': msd_Y[:, 1],
'type': 'molecular',
'direction': 'xyz',
'error_type': 'standard_deviation',
'errors': np.zeros(len(msd_Y[:, 0])),
}
}
Now create the mean_squared_displacements group under observables and store each imported rdf:
mean_squared_displacements = observables.create_group('mean_squared_displacements')
msd_unit = length_unit * length_unit
diff_unit = msd_unit / time_unit
for key in msds.keys():
msd = mean_squared_displacements.create_group(key)
msd.attrs['type'] = types[2]
msd['type'] = msds[key]['type']
msd['direction'] = msds[key]['direction']
msd['error_type'] = msds[key]['error_type']
msd['n_times'] = msds[key]['n_times']
msd['times'] = msds[key]['times']
msd['times'].attrs['unit'] = str(time_unit.units)
msd['times'].attrs['unit_factor'] = time_unit.magnitude
msd['value'] = msds[key]['value']
msd['value'].attrs['unit'] = str(msd_unit.units)
msd['value'].attrs['unit_factor'] = msd_unit.magnitude
msd['errors'] = msds[key]['errors']
Parameter Group¶
Using the json templates for force calculations and molecular dynamics workflows, the (meta)data can be written to the H5MD-NOMAD file using the following code:
First, import the data extracted from the JSON templates:
with open('force_calculations_metainfo.json') as json_file:
force_calculation_parameters = json.load(json_file)
with open('workflow_metainfo.json') as json_file:
workflow_parameters = json.load(json_file)
Then, create the appropriate container groups:
parameters = h5_write.create_group('parameters')
force_calculations = parameters.create_group('force_calculations')
workflow = parameters.create_group('workflow')
Now, recursively write the (meta)data:
def get_parameters_recursive(parameter_group, parameter_dict):
# Store the parameters from parameter dict into an hdf5 file
for key, val in parameter_dict.items():
if type(val) == dict:
param = val.get('value')
if param is not None:
parameter_group[key] = param
unit = val.get('unit')
if unit is not None:
parameter_group[key].attrs['unit'] = unit
else: # This is a subsection
subsection = parameter_group.create_group(key)
subsection = get_parameters_recursive(subsection, val)
else:
if val is not None:
parameter_group[key] = val
return parameter_group
force_calculations = get_parameters_recursive(force_calculations, force_calculation_parameters)
workflow = get_parameters_recursive(workflow, workflow_parameters)
It's as simple as that! Now, we can upload our H5MD-NOMAD file directly to NOMAD and all the written (meta)data will be stored according to the standard NOMAD schema.
Accessing an H5MD-NOMAD file¶
The following functions are useful for accessing data from your H5MD-NOMAD file:
def apply_unit(quantity, unit, unit_factor):
from pint import UnitRegistry
ureg = UnitRegistry()
if quantity is None:
return
if unit:
unit = ureg(unit)
unit *= unit_factor
quantity *= unit
return quantity
def decode_hdf5_bytes(dataset):
if dataset is None:
return
elif type(dataset).__name__ == 'ndarray':
if dataset == []:
return dataset
dataset = np.array([val.decode("utf-8") for val in dataset]) if type(dataset[0]) == bytes else dataset
else:
dataset = dataset.decode("utf-8") if type(dataset) == bytes else dataset
return dataset
def hdf5_attr_getter(source, path, attribute, default=None):
'''
Extracts attribute from object based on path, and returns default if not defined.
'''
section_segments = path.split('.')
for section in section_segments:
try:
value = source.get(section)
source = value[-1] if isinstance(value, list) else value
except Exception:
return
value = source.attrs.get(attribute)
source = value[-1] if isinstance(value, list) else value
source = decode_hdf5_bytes(source) if source is not None else default
return source
def hdf5_getter(source, path, default=None):
'''
Extracts attribute from object based on path, and returns default if not defined.
'''
section_segments = path.split('.')
for section in section_segments:
try:
value = source.get(section)
unit = hdf5_attr_getter(source, section, 'unit')
unit_factor = hdf5_attr_getter(source, section, 'unit_factor', default=1.0)
source = value[-1] if isinstance(value, list) else value
except Exception:
return
if source is None:
source = default
elif type(source) == h5py.Dataset:
source = source[()]
source = apply_unit(source, unit, unit_factor)
return decode_hdf5_bytes(source)
Open your H5MD-NOMAD file with h5py:
Access a particular data set:
potential_energies = h5_read['observables']['energy']['potential']['value']
print(potential_energies[()])
Get the unit information for this quantity:
unit = potential_energies.attrs['unit']
unit_factor = potential_energies.attrs['unit_factor']
print(unit)
print(unit_factor)
results:
Alternatively, the above functions will return the dataset as python arrays, i.e., already applying [()] to the HDF5 element, and also apply the appropriate units where applicable:
potential_energies = hdf5_getter(h5_read, 'observables.energy.potential.value')
print(potential_energies)
result:
Creating a topology (particles_group)¶
This page demonstrates how to create a "standard" topology in H5MD-NOMAD. The demonstrated organization of molecules and monomers is identical to what other NOMAD parsers do to create a topology from native simulation files (e.g., outputs from GROMACS or LAMMPS). However, the user is free to deviate from this standard to create arbitrary organizations of particles, as described in Connectivity.
Standard topology structure for bonded force fields¶
topology
├── molecule_group_1
│ ├── molecule_1
│ │ ├── monomer_group_1
│ │ │ ├── monomer_1
│ │ │ │ └── metadata for monomer_1
│ │ │ ├── monomer_2
│ │ │ │ └── metadata for monomer_2
│ │ │ ├── ...
│ │ ├── monomer_group_2
│ │ └── ...
│ ├── molecule_2
│ │ ├── ...
│ └── ...
└── molecule_group_2
│ ├── molecule_1
│ │ └── metadata for molecule_1
│ ├── molecule_2
│ │ └── metadata for molecule_2
│ │ └── ...
│ └── ...
└── ...
molecule_group_1 and molecule_group_2 represent distinct molecule types. At the next level of the hierarchy, each molecule within this group is stored (i.e., molecule_1, molecule_2, etc.). In the above example, molecule_group_1 represents a polymer (or protein). Thus, below the molecule level, there is a "monomer group level". Similar to the molecule group, the monomer group organizes all monomers (of the parent molecule) that are of the same type. Thus, for molecule_1 of molecule_group_1, monomer_group_1 and monomer_group_2 represent distinct types of monomers existing within the polymer. Then, below monomer_group_1, each monomer within this group is stored. Finally, beneath these individual monomers, only the metadata for that monomer is stored (i.e., no further organization levels). Note however, that metadata can be (and is) stored at each level of the hierarchy, but is left out of the illustration for clarity. Notice also that molecule_group_2 is not a polymer. Thus, each molecule within this group stores only the corresponding metadata, and no further levels of organization.
Creating the standard hierarchy from an MDAnalysis universe¶
We start from the perspective of the Writing an HDF5 file according to H5MD-NOMAD with python section, with identical imports and assuming that an MDAnalysis universe is already instantiated from the raw simulation files. As in the previous example, the universe containing the topology information is called universe_topology.
The following functions will be useful for creating the topology:
def get_composition(children_names):
'''
Given a list of children, return a compositional formula as a function of
these children. The format is <child_1>(n_child_1)<child_2>(n_child_2)...
'''
children_count_tup = np.unique(children_names, return_counts=True)
formula = ''.join([f'{name}({count})' for name, count in zip(*children_count_tup)])
return formula
def get_molecules_from_bond_list(n_particles: int, bond_list: List[int], particle_types: List[str] = None, particles_typeid=None):
'''
Returns a dictionary with molecule info from the list of bonds
'''
import networkx
system_graph = networkx.empty_graph(n_particles)
system_graph.add_edges_from([(i[0], i[1]) for i in bond_list])
molecules = [system_graph.subgraph(c).copy() for c in networkx.connected_components(system_graph)]
mol_dict = []
for i_mol, mol in enumerate(molecules):
mol_dict.append({})
mol_dict[i_mol]['indices'] = np.array(mol.nodes())
mol_dict[i_mol]['bonds'] = np.array(mol.edges())
mol_dict[i_mol]['type'] = 'molecule'
mol_dict[i_mol]['is_molecule'] = True
if particles_typeid is None and len(particle_types) == n_particles:
mol_dict[i_mol]['names'] = [particle_types[int(x)] for x in sorted(np.array(mol_dict[i_mol]['indices']))]
if particle_types is not None and particles_typeid is not None:
mol_dict[i_mol]['names'] = [particle_types[particles_typeid[int(x)]] for x in sorted(np.array(mol_dict[i_mol]['indices']))]
mol_dict[i_mol]['formula'] = get_composition(mol_dict[i_mol]['names'])
return mol_dict
def is_same_molecule(mol_1: dict, mol_2: dict):
'''
Checks whether the 2 input molecule dictionaries represent the same
molecule type, i.e., same particle types and corresponding bond connections.
'''
if sorted(mol_1['names']) == sorted(mol_2['names']):
mol_1_shift = np.min(mol_1['indices'])
mol_2_shift = np.min(mol_2['indices'])
mol_1_bonds_shift = mol_1['bonds'] - mol_1_shift
mol_2_bonds_shift = mol_2['bonds'] - mol_2_shift
bond_list_1 = [sorted((mol_1['names'][i], mol_1['names'][j])) for i, j in mol_1_bonds_shift]
bond_list_2 = [sorted((mol_2['names'][i], mol_2['names'][j])) for i, j in mol_2_bonds_shift]
bond_list_names_1, bond_list_counts_1 = np.unique(bond_list_1, axis=0, return_counts=True)
bond_list_names_2, bond_list_counts_2 = np.unique(bond_list_2, axis=0, return_counts=True)
bond_list_dict_1 = {bond[0] + '-' + bond[1]: bond_list_counts_1[i_bond] for i_bond, bond in enumerate(bond_list_names_1)}
bond_list_dict_2 = {bond[0] + '-' + bond[1]: bond_list_counts_2[i_bond] for i_bond, bond in enumerate(bond_list_names_2)}
if bond_list_dict_1 == bond_list_dict_2:
return True
return False
return False
Then, we can create the topology structure from the MDAnalysis universe:
bond_list = universe_toponly.bonds._bix
molecules = get_molecules_from_bond_list(n_atoms, bond_list, particle_types=universe_toponly.atoms.types, particles_typeid=None)
# create the topology
mol_groups = []
mol_groups.append({})
mol_groups[0]['molecules'] = []
mol_groups[0]['molecules'].append(molecules[0])
mol_groups[0]['type'] = 'molecule_group'
mol_groups[0]['is_molecule'] = False
for mol in molecules[1:]:
flag_mol_group_exists = False
for i_mol_group in range(len(mol_groups)):
if is_same_molecule(mol, mol_groups[i_mol_group]['molecules'][0]):
mol_groups[i_mol_group]['molecules'].append(mol)
flag_mol_group_exists = True
break
if not flag_mol_group_exists:
mol_groups.append({})
mol_groups[-1]['molecules'] = []
mol_groups[-1]['molecules'].append(mol)
mol_groups[-1]['type'] = 'molecule_group'
mol_groups[-1]['is_molecule'] = False
for i_mol_group, mol_group in enumerate(mol_groups):
mol_groups[i_mol_group]['formula'] = molecule_labels[i_mol_group] + '(' + str(len(mol_group['molecules'])) + ')'
mol_groups[i_mol_group]['label'] = 'group_' + str(molecule_labels[i_mol_group])
mol_group_indices = []
for i_molecule, molecule in enumerate(mol_group['molecules']):
molecule['label'] = molecule_labels[i_mol_group]
mol_indices = molecule['indices']
mol_group_indices.append(mol_indices)
mol_resids = np.unique(universe_toponly.atoms.resindices[mol_indices])
if mol_resids.shape[0] == 1:
continue
res_dict = []
for i_resid, resid in enumerate(mol_resids):
res_dict.append({})
res_dict[i_resid]['indices'] = np.where( universe_toponly.atoms.resindices[mol_indices] == resid)[0]
res_dict[i_resid]['label'] = universe_toponly.atoms.resnames[res_dict[i_resid]['indices'][0]]
res_dict[i_resid]['formula'] = get_composition(universe_toponly.atoms.names[res_dict[i_resid]['indices']])
res_dict[i_resid]['is_molecule'] = False
res_dict[i_resid]['type'] = 'monomer'
res_groups = []
res_groups.append({})
res_groups[0]['residues'] = []
res_groups[0]['residues'].append(res_dict[0])
res_groups[0]['label'] = 'group_' + res_dict[0]['label']
res_groups[0]['type'] = 'monomer_group'
res_groups[0]['is_molecule'] = False
for res in res_dict[1:]:
flag_res_group_exists = False
for i_res_group in range(len(res_groups)):
if res['label'] == res_groups[i_res_group]['label']:
res_groups[i_res_group]['residues'].append(res)
flag_res_group_exists = True
break
if not flag_res_group_exists:
res_groups.append({})
res_groups[-1]['residues'] = []
res_groups[-1]['residues'].append(res)
res_groups[-1]['label'] = 'group_' + res['label']
res_groups[-1]['formula'] = get_composition(universe_toponly.atoms.names[res['indices']])
res_groups[-1]['type'] = 'monomer_group'
res_groups[-1]['is_molecule'] = False
molecule['formula'] = ''
for res_group in res_groups:
res_group['formula'] = res_group['residues'][0]['label'] + '(' + str(len(res_group['residues'])) + ')'
molecule['formula'] += res_group['formula']
res_group_indices = []
for res in res_group['residues']:
res_group_indices.append(res['indices'])
res_group['indices'] = np.concatenate(res_group_indices)
mol_group['indices'] = np.concatenate(mol_group_indices)
molecule['residue_groups'] = res_groups
Writing the topology to an H5MD-NOMAD file¶
Here we assume an H5MD-NOMAD file has already been created, as demonstrated on the Writing an HDF5 file according to H5MD-NOMAD with python section, and that the connectivity group was created under the root level.
Now, create the particles_group group under connectivity within our HDF5-NOMAD file:
topology_keys = ['type', 'formula', 'particles_group', 'label', 'is_molecule', 'indices']
custom_keys = ['molecules', 'residue_groups', 'residues']
topology = connectivity.create_group('particles_group')
for i_mol_group, mol_group in enumerate(mol_groups):
hdf5_mol_group = topology.create_group('group_' + molecule_labels[i_mol_group])
for mol_group_key in mol_group.keys():
if mol_group_key not in topology_keys + custom_keys:
continue
if mol_group_key != 'molecules':
hdf5_mol_group[mol_group_key] = mol_group[mol_group_key]
else:
hdf5_molecules = hdf5_mol_group.create_group('particles_group')
for i_molecule, molecule in enumerate(mol_group[mol_group_key]):
hdf5_mol = hdf5_molecules.create_group('molecule_' + str(i_molecule))
for mol_key in molecule.keys():
if mol_key not in topology_keys + custom_keys:
continue
if mol_key != 'residue_groups':
hdf5_mol[mol_key] = molecule[mol_key]
else:
hdf5_residue_groups = hdf5_mol.create_group('particles_group')
for i_res_group, res_group in enumerate(molecule[mol_key]):
hdf5_res_group = hdf5_residue_groups.create_group('residue_group_' + str(i_res_group))
for res_group_key in res_group.keys():
if res_group_key not in topology_keys + custom_keys:
continue
if res_group_key != 'residues':
hdf5_res_group[res_group_key] = res_group[res_group_key]
else:
hdf5_residues = hdf5_res_group.create_group('particles_group')
for i_res, res in enumerate(res_group[res_group_key]):
hdf5_res = hdf5_residues.create_group('residue_' + str(i_res))
for res_key in res.keys():
if res_key not in topology_keys:
continue
if res[res_key] is not None:
hdf5_res[res_key] = res[res_key]