NXapm_paraprobe_config_clusterer

Status:

application definition, extends NXobject

Description:

Configuration of a paraprobe-clusterer tool run in atom probe microscopy.

Symbols:

The symbols used in the schema to specify e.g. dimensions of arrays.

n_ivecmax: Maximum number of atoms per molecular ion. Should be 32 for paraprobe.

n_clust_algos: Number of clustering algorithms used.

n_ions: Number of different iontypes to distinguish during clustering.

Groups cited:

NXapm_input_ranging, NXapm_input_reconstruction, NXcg_cylinder_set, NXcg_ellipsoid_set, NXcg_face_list_data_structure, NXcg_hexahedron_set, NXcs_filter_boolean_mask, NXentry, NXmatch_filter, NXprocess, NXspatial_filter, NXsubsampling_filter

Structure:

ENTRY: (required) NXentry

@version: (required) NX_CHAR

Version specifier of this application definition.

definition: (required) NX_CHAR

Official NeXus NXDL schema with which this file was written.

Obligatory value: NXapm_paraprobe_config_clusterer

program: (required) NX_CHAR

Given name of the program/software/tool with which this NeXus (configuration) file was generated.

@version: (required) NX_CHAR

Ideally the program version plus build number, a commit hash, or a description of a persistent resource where the source code of the program and its build instructions can be found, so that the program can be configured in such a way that the result of this computational process can be recreated deterministically.

time_stamp: (required) NX_DATE_TIME

ISO 8601 formatted time code, including the local time zone offset to UTC, of when this configuration file was created.
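
A minimal sketch of how such a time stamp could be produced with the Python standard library. The variable name is illustrative only; the schema does not prescribe how the value is generated.

```python
from datetime import datetime

# Take the current local time and attach the local UTC offset (timezone-aware).
time_stamp = datetime.now().astimezone().isoformat(timespec="seconds")

# Prints something of the form YYYY-MM-DDThh:mm:ss+hh:mm; the actual value
# depends on the clock and the time zone of the machine.
print(time_stamp)
```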

analysis_identifier: (optional) NX_CHAR

Ideally, a (globally persistent) unique identifier for referring to this analysis.

analysis_description: (optional) NX_CHAR

Possibility for leaving a free-text description about this analysis.

number_of_processes: (required) NX_UINT {units=NX_UNITLESS}

How many tasks to perform?

cameca_to_nexus: (optional) NXprocess

This process maps results from cluster analyses performed with IVAS/APSuite into an interoperable representation. Specifically, in this process paraprobe-clusterer takes results of clustering methods from other tools of the APM community, like IVAS/APSuite. These results are usually reported in one of two ways. The first is an explicit list of reconstructed ion positions; in the case of IVAS, these positions are reported through a text file with a cluster label for each position.

Alternatively, the list of positions is reported, as is the case for AMETEK (IVAS/AP Suite), but the cluster labels are specified only implicitly, in the following way: the mass-to-charge-state ratio column of what is effectively a POS-formatted file is used to store a hypothetical mass-to-charge value which is a floating point representation of the cluster ID.

Another case can occur where all disjoint floating point values, i.e. here cluster labels, are reported, and a dictionary is then created which maps each value to a cluster ID.

In general, the cluster ID zero is reserved for marking ions that are not assigned to any cluster. Therefore, indices of disjoint clusters start at 1.
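
The dictionary-based case can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: the function name is hypothetical, labels are assumed to compare exactly as floating point values, and assigning IDs in ascending label order is an arbitrary choice. ID 0 is left unused so that it stays available for ions without a cluster.

```python
def build_cluster_id_dictionary(labels):
    """Map each distinct floating point label to an integer cluster ID >= 1."""
    distinct = sorted(set(labels))
    # Start counting at 1 because cluster ID 0 is reserved.
    return {value: idx for idx, value in enumerate(distinct, start=1)}


# Hypothetical labels, one per ion, as they might appear in a results file.
labels = [12.5, 3.0, 12.5, 7.25, 3.0]
label_to_id = build_cluster_id_dictionary(labels)
cluster_ids = [label_to_id[value] for value in labels]

print(label_to_id)
print(cluster_ids)
```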

recover_evaporation_id: (required) NX_BOOLEAN

Specifies whether the tool should try to recover, for each position, the closest matching position from dataset/dataset_name_reconstruction (within floating point accuracy). This can be useful, for instance, when users wish to recover the original evaporation ID, which IVAS/AP Suite drops when writing their .indexed. cluster results POS files.
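
The idea of matching positions can be sketched with a brute-force search. Everything here is an assumption made for illustration: the function name, the tolerance value, and the use of the row index in the reconstruction as the evaporation ID. A real implementation would likely use a spatial index rather than comparing all pairs.

```python
import math


def recover_evaporation_ids(reconstruction, clustered, tolerance=1.0e-4):
    """For each clustered position return the index of the closest
    reconstruction position, or None if none lies within the tolerance."""
    recovered = []
    for p in clustered:
        best_idx, best_dist = None, math.inf
        for idx, q in enumerate(reconstruction):
            d = math.dist(p, q)  # Euclidean distance between two points
            if d < best_dist:
                best_idx, best_dist = idx, d
        recovered.append(best_idx if best_dist <= tolerance else None)
    return recovered


reconstruction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
clustered = [(0.0, 2.0, 0.0), (1.0, 0.0, 0.00001), (5.0, 5.0, 5.0)]
recovered = recover_evaporation_ids(reconstruction, clustered)
print(recovered)
```

The last position has no counterpart within the tolerance, so no ID is recovered for it.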

dataset: (required) NXapm_input_reconstruction

filename: (required) NX_CHAR

@version: (required) NX_CHAR

dataset_name_reconstruction: (required) NX_CHAR

dataset_name_mass_to_charge: (required) NX_CHAR

AMETEK/Cameca results of cluster analyses, such as those from the maximum-separation (MS) method clustering algorithm by J. Hyde et al., are stored as an improper POS file: a matrix of floating point quadruplets, one quadruplet for each ion that was investigated. The first three values encode the position of the ion. The fourth value is an improper mass-to-charge-state-ratio value which encodes the integer identifier of the cluster as a floating point number.
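
A sketch of decoding such quadruplets is given below. It assumes the customary POS layout of big-endian 32-bit floats, which the text above does not state explicitly, and the function name is hypothetical. The example builds its input in memory instead of reading a file.

```python
import struct


def decode_cluster_pos(raw):
    """Split raw POS-like bytes into positions and integer cluster IDs."""
    positions, cluster_ids = [], []
    # ">4f" = four big-endian 32-bit floats per ion (assumed layout).
    for x, y, z, pseudo_mq in struct.iter_unpack(">4f", raw):
        positions.append((x, y, z))
        # The fourth value carries the cluster ID as a floating point number.
        cluster_ids.append(int(round(pseudo_mq)))
    return positions, cluster_ids


# Two synthetic ions: the first is not in any cluster (ID 0), the second is in cluster 2.
raw = struct.pack(">8f", 1.0, 2.0, 3.0, 0.0, 4.0, 5.0, 6.0, 2.0)
positions, cluster_ids = decode_cluster_pos(raw)
print(positions)
print(cluster_ids)
```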

cluster_analysis: (optional) NXprocess

This process performs a cluster analysis on a reconstructed dataset or a portion of the reconstruction.

ion_type_filter: (required) NX_CHAR

How iontypes should be interpreted/considered during the cluster analysis. Different options exist for how iontypes are interpreted (if considered at all), given that an iontype represents in general a (molecular) ion with different isotopes that each have their own multiplicity.

The value resolve_all will set an ion active in the analysis regardless of which iontype it is. The value resolve_unknown will set an ion active when it is of the UNKNOWNTYPE. The value resolve_ion will set an ion active if it is of the specific iontype, regardless of its elemental or isotopic details. The value resolve_element will set an ion active and, most importantly, count it as many times as the (molecular) ion contains atoms of elements in the whitelist ion_query_isotope_vector. The value resolve_isotope will set an ion active and, most importantly, count it as many times as the (molecular) ion contains isotopes in the whitelist ion_query_isotope_vector.

In effect, ion_query_isotope_vector acts as a whitelist to filter which ions are considered as source ions of the correlation statistics and how the multiplicity of each ion will be factorized.

This is relevant because in atom probe a molecular ion with more than one nuclide, say TiO, is counted such that, although there is a single TiO molecular ion at a position, the cluster has two members. This multiplicity affects the size of the feature and its chemical composition.

Obligatory value: resolve_element
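
The multiplicity counting for resolve_element can be illustrated with the TiO example. Representing an ion as a plain list of proton numbers is a simplification chosen for this sketch, and the function name is hypothetical.

```python
def element_multiplicity(ion_proton_numbers, whitelist_proton_numbers):
    """Count how many atoms of a (molecular) ion belong to whitelisted elements."""
    return sum(1 for z in ion_proton_numbers if z in whitelist_proton_numbers)


tio = [22, 8]  # one Ti atom (Z = 22) and one O atom (Z = 8)

both = element_multiplicity(tio, {22, 8})  # Ti and O whitelisted
only_ti = element_multiplicity(tio, {22})  # only Ti whitelisted
neither = element_multiplicity(tio, {6})   # only C whitelisted

print(both, only_ti, neither)
```

With both elements whitelisted, the single TiO ion contributes two members to a cluster; with an unrelated whitelist it contributes none.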

ion_query_isotope_vector: (required) NX_UINT (Rank: 2, Dimensions: [n_ions, n_ivecmax]) {units=NX_UNITLESS}

Matrix of isotope vectors, with as many rows as there are different candidate iontypes to be distinguished as possible source iontypes. In the simplest case, each row contains only the proton number of the element, with all other values set to zero. Combined with ion_query_type_source set to resolve_element, this will recover usual spatial correlation statistics like the 1NN C-C spatial statistics.
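
For the simplest case, such a matrix could be assembled as sketched below. Placing the proton number in the first column is an assumption of this sketch, as are the helper name and the chosen elements; n_ivecmax is taken as 32 following the Symbols section.

```python
N_IVECMAX = 32  # maximum number of atoms per molecular ion (see Symbols)


def isotope_vector_row(proton_number, n_ivecmax=N_IVECMAX):
    """Build one row: the proton number followed by zeros (assumed layout)."""
    row = [0] * n_ivecmax
    row[0] = proton_number
    return row


# Two candidate source iontypes: carbon (Z = 6) and titanium (Z = 22).
ion_query_isotope_vector = [isotope_vector_row(z) for z in (6, 22)]

n_ions = len(ion_query_isotope_vector)
print(n_ions, len(ion_query_isotope_vector[0]))
```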

dataset: (required) NXapm_input_reconstruction

filename: (required) NX_CHAR

@version: (required) NX_CHAR

dataset_name_reconstruction: (required) NX_CHAR

dataset_name_mass_to_charge: (required) NX_CHAR

iontypes: (required) NXapm_input_ranging

filename: (required) NX_CHAR

@version: (required) NX_CHAR

group_name_iontypes: (required) NX_CHAR

ion_to_edge_distances: (optional) NXprocess

The tool allows injecting precomputed distance information for each point/ion, which can be used for further post-processing and analysis.

filename: (required) NX_CHAR

Name of an HDF5 file which contains the ion distances.

@version: (required) NX_CHAR

Version identifier of the file, such as a secure hash which documents the binary state of the file, to add an additional layer of reproducibility by documenting which file specifically contains these data.
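
One way to obtain such a hash is sketched below. SHA-256 is an assumption, since the text only asks for a secure hash, and the helper name is hypothetical. The example hashes a temporary placeholder file so that it runs without an actual HDF5 file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder content standing in for the file with the ion distances.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"distance values placeholder")

version = sha256_of_file(tmp.name)
os.remove(tmp.name)
print(version)
```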

dataset_name: (required) NX_CHAR

Absolute HDF5 path to the dataset with distance values for each ion.

spatial_filter: (optional) NXspatial_filter

windowing_method: (required) NX_CHAR

Obligatory value: entire_dataset

CG_ELLIPSOID_SET: (optional) NXcg_ellipsoid_set

dimensionality: (required) NX_POSINT

cardinality: (required) NX_POSINT

identifier_offset: (required) NX_INT

center: (required) NX_NUMBER

half_axes_radii: (required) NX_NUMBER

orientation: (required) NX_NUMBER

CG_CYLINDER_SET: (optional) NXcg_cylinder_set

dimensionality: (required) NX_POSINT

cardinality: (required) NX_POSINT

identifier_offset: (required) NX_INT

center: (required) NX_NUMBER

height: (required) NX_NUMBER

radii: (required) NX_NUMBER

CG_HEXAHEDRON_SET: (optional) NXcg_hexahedron_set

dimensionality: (required) NX_POSINT

cardinality: (required) NX_POSINT

identifier_offset: (required) NX_INT

hexahedra: (required) NXcg_face_list_data_structure

CS_FILTER_BOOLEAN_MASK: (optional) NXcs_filter_boolean_mask

number_of_objects: (required) NX_UINT

bitdepth: (required) NX_UINT

mask: (required) NX_UINT

identifier: (required) NX_UINT

evaporation_id_filter: (optional) NXsubsampling_filter

iontype_filter: (optional) NXmatch_filter

hit_multiplicity_filter: (optional) NXmatch_filter

dbscan: (required) NXprocess

Settings for DBScan clustering algorithm. For original details about the algorithms and (performance-relevant) details consider:

For details about how the DBScan algorithm is the key behind the specific modification known as the maximum-separation method in the atom probe community, consider E. Jägle et al.

high_throughput_method: (required) NX_CHAR

Strategy for how runs are performed with different parameters:

  • For tuple as many runs are performed as parameter values.

  • For combinatorics individual parameter arrays are looped over.

As an example, we may define eps with ten entries and min_pts with three entries. If high_throughput_method is tuple, the analysis is invalid because there are not enough min_pts values for the ten eps values. By contrast, for combinatorics paraprobe-clusterer will run three individual min_pts runs for each eps value, resulting in a total of 30 analyses. As a further example, the DBScan analysis reported in M. Kühbach et al. would have defined eps as np.linspace(0.2, 5.0, num=241, endpoint=True), min_pts as one, and high_throughput_method set to combinatorics.

Any of these values: tuple | combinatorics
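
The two strategies can be sketched as follows. This is an interpretation under assumptions, not the tool's code: the function name is hypothetical, tuple is read as pairing the i-th eps with the i-th min_pts, the loop order for combinatorics is arbitrary, and "min_pts one" from the example above is read as a single value of 1.

```python
from itertools import product


def expand_runs(eps, min_pts, high_throughput_method):
    """Return the list of (eps, min_pts) parameter pairs to analyse."""
    if high_throughput_method == "tuple":
        if len(eps) != len(min_pts):
            raise ValueError("tuple requires arrays of equal length")
        return list(zip(eps, min_pts))
    if high_throughput_method == "combinatorics":
        return list(product(eps, min_pts))
    raise ValueError("unknown high_throughput_method")


# Ten eps values and three min_pts values, as in the text.
eps = [0.2 + 0.1 * i for i in range(10)]
min_pts = [1, 2, 3]
runs = expand_runs(eps, min_pts, "combinatorics")
print(len(runs))

# 241 eps values from 0.2 to 5.0 in steps of 0.02, with a single min_pts value.
eps_fine = [0.2 + 0.02 * i for i in range(241)]
runs_fine = expand_runs(eps_fine, [1], "combinatorics")
print(len(runs_fine))
```

Calling expand_runs(eps, min_pts, "tuple") with these ten and three values raises a ValueError, mirroring the invalid configuration described above.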

eps: (required) NX_FLOAT (Rank: 1, Dimensions: [i]) {units=NX_LENGTH}

Array of epsilon (eps) parameter values.

min_pts: (required) NX_UINT (Rank: 1, Dimensions: [j]) {units=NX_UNITLESS}

Array of minimum points (min_pts) parameter values.

optics: (required) NXprocess

Settings for the OPTICS clustering algorithm.

high_throughput_method: (required) NX_CHAR

Strategy for how runs are performed with different parameters:

  • For tuple as many runs are performed as parameter values.

  • For combinatorics individual parameter arrays are looped over.

See the explanation of the corresponding parameter for dbscan processes above for further details.

Any of these values: tuple | combinatorics

min_pts: (required) NX_UINT (Rank: 1, Dimensions: [i]) {units=NX_UNITLESS}

Array of minimum points (min_pts) parameter values.

max_eps: (required) NX_FLOAT (Rank: 1, Dimensions: [j]) {units=NX_LENGTH}

Array of maximum epsilon (eps) parameter values.

hdbscan: (required) NXprocess

Settings for the HDBSCAN clustering algorithm.

See also this documentation for details about the parameters. Here we use the terminology of the hdbscan documentation.

high_throughput_method: (required) NX_CHAR

Strategy for how runs are performed with different parameters:

  • For tuple as many runs are performed as parameter values.

  • For combinatorics individual parameter arrays are looped over.

See the explanation of the corresponding parameter for dbscan processes above for further details.

Any of these values: tuple | combinatorics

min_cluster_size: (required) NX_NUMBER (Rank: 1, Dimensions: [i]) {units=NX_ANY}

Array of min_cluster_size parameter values.

min_samples: (required) NX_NUMBER (Rank: 1, Dimensions: [j]) {units=NX_ANY}

Array of min_samples parameter values.

cluster_selection_epsilon: (required) NX_NUMBER (Rank: 1, Dimensions: [k]) {units=NX_ANY}

Array of cluster_selection_epsilon parameter values.

alpha: (required) NX_NUMBER (Rank: 1, Dimensions: [m]) {units=NX_ANY}

Array of alpha parameter values.

NXDL Source:

https://github.com/FAIRmat-Experimental/nexus_definitions/tree/fairmat/contributed_definitions/NXapm_paraprobe_config_clusterer.nxdl.xml