NXapm_paraprobe_config_clusterer
Status:
application definition, extends NXobject
Description:
Configuration of a paraprobe-clusterer tool run in atom probe microscopy.
Symbols:
The symbols used in the schema to specify e.g. dimensions of arrays.
n_ivecmax: Maximum number of atoms per molecular ion. Should be 32 for paraprobe.
n_clust_algos: Number of clustering algorithms used.
n_ions: Number of different iontypes to distinguish during clustering.
Groups cited:
NXapm_input_ranging, NXapm_input_reconstruction, NXcg_cylinder_set, NXcg_ellipsoid_set, NXcg_face_list_data_structure, NXcg_hexahedron_set, NXcs_filter_boolean_mask, NXentry, NXmatch_filter, NXprocess, NXspatial_filter, NXsubsampling_filter
Structure:
ENTRY: (required) NXentry
@version: (required) NX_CHAR
Version specifier of this application definition.
definition: (required) NX_CHAR
Official NeXus NXDL schema with which this file was written.
Obligatory value:
NXapm_paraprobe_config_clusterer
program: (required) NX_CHAR
Given name of the program/software/tool with which this NeXus (configuration) file was generated.
@version: (required) NX_CHAR
Ideally the program version plus build number, or a commit hash, or a description of persistent resources where the source code of the program and its build instructions can be found, so that the program can be configured in such a manner that the result of this computational process can be recreated in the same deterministic manner.
time_stamp: (required) NX_DATE_TIME
ISO 8601 formatted time code, including the local time zone offset to UTC, of when this configuration file was created.
analysis_identifier: (optional) NX_CHAR
Ideally, a (globally persistent) unique identifier for referring to this analysis.
analysis_description: (optional) NX_CHAR
Free-text description of this analysis.
number_of_processes: (required) NX_UINT {units=NX_UNITLESS}
How many tasks to perform?
cameca_to_nexus: (optional) NXprocess
This process maps results from cluster analyses performed with IVAS/AP Suite into an interoperable representation. Specifically, in this process paraprobe-clusterer takes results from clustering methods of other tools of the APM community, such as IVAS/AP Suite. These results are usually reported in one of two ways. The first is an explicit list of reconstructed ion positions: in the case of IVAS, these positions are reported through a text file with a cluster label for each position.
Alternatively, the list of positions is reported, as is the case for AMETEK (IVAS/AP Suite), but the cluster labels are specified only implicitly: the mass-to-charge-state ratio column of what is effectively a POS-formatted file stores a pseudo mass-to-charge value, which is a floating point representation of the cluster ID.
Another case can occur in which all disjoint floating point values, here the cluster labels, are reported, and a dictionary is then created that maps each value to a cluster ID.
In general, the cluster ID zero is reserved for marking ions that are not assigned to any cluster. Therefore, the indices of disjoint clusters start at 1.
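The dictionary-based case can be sketched in Python. This is a minimal illustration under stated assumptions, not paraprobe-clusterer's code: the helper names `build_label_dictionary` and `relabel` are hypothetical, and so is the parameter `unassigned_value`, which assumes that one specific floating point value (here 0.0) marks ions outside any cluster. That value maps to the reserved ID 0; all other distinct values receive consecutive IDs from 1 in ascending order.

```python
def build_label_dictionary(labels, unassigned_value=0.0):
    """Map each distinct floating point label to an integer cluster ID.

    Hypothetical helper: `unassigned_value` is assumed to mark ions that
    belong to no cluster and maps to the reserved ID 0; all other distinct
    values map to consecutive IDs starting at 1, in ascending order.
    """
    dictionary = {}
    next_id = 1
    for value in sorted(set(labels)):
        if value == unassigned_value:
            dictionary[value] = 0
        else:
            dictionary[value] = next_id
            next_id += 1
    return dictionary


def relabel(labels, dictionary):
    """Replace every floating point label by its integer cluster ID."""
    return [dictionary[value] for value in labels]


# Example: three distinct cluster labels plus unassigned ions.
labels = [0.0, 17.5, 3.25, 17.5, 0.0, 99.0]
mapping = build_label_dictionary(labels)
print(mapping)                   # {0.0: 0, 3.25: 1, 17.5: 2, 99.0: 3}
print(relabel(labels, mapping))  # [0, 2, 1, 2, 0, 3]
```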
recover_evaporation_id: (required) NX_BOOLEAN
Specifies whether the tool should try to recover, for each position, the closest matching position from dataset/dataset_name_reconstruction (within floating point accuracy). This can be useful, for instance, when users wish to recover the original evaporation ID, which IVAS/AP Suite drops when writing their .indexed. cluster result POS files.
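A minimal sketch of the recovery step, assuming that the evaporation ID of an ion equals its zero-based row index in the original reconstruction and that a hypothetical `tolerance` bounds the acceptable mismatch caused by floating point rounding. The function name `recover_evaporation_ids` is illustrative. The brute-force search is quadratic; a spatial index such as a k-d tree would be the practical choice for real datasets.

```python
import math


def recover_evaporation_ids(cluster_xyz, reconstruction_xyz, tolerance=1e-4):
    """Return, for each clustered position, the index (assumed to be the
    evaporation ID) of the closest position in the original reconstruction.

    Brute-force O(n * m) search for illustration only. Raises ValueError if
    the closest match is farther away than `tolerance`.
    """
    ids = []
    for p in cluster_xyz:
        best_id, best_d = min(
            ((i, math.dist(p, q)) for i, q in enumerate(reconstruction_xyz)),
            key=lambda pair: pair[1],
        )
        if best_d > tolerance:
            raise ValueError(f"no match within {tolerance} for position {p}")
        ids.append(best_id)
    return ids


reconstruction = [(0.0, 0.0, 1.0), (1.5, -2.0, 3.0), (0.25, 0.5, 7.0)]
# Cluster results lose the original order and may carry small rounding errors.
cluster_result = [(0.25000001, 0.5, 7.0), (0.0, 0.0, 1.0)]
print(recover_evaporation_ids(cluster_result, reconstruction))  # [2, 0]
```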
dataset: (required) NXapm_input_reconstruction
filename: (required) NX_CHAR
@version: (required) NX_CHAR
dataset_name_reconstruction: (required) NX_CHAR
dataset_name_mass_to_charge: (required) NX_CHAR
AMETEK/Cameca results of cluster analyses, such as those of the maximum-separation (MS) clustering method (J. Hyde et al.), are stored as an improper POS file: a matrix of floating point quadruplets, one per ion, with as many quadruplets as ions were investigated. The first three values encode the position of the ion. The fourth value is an improper mass-to-charge-state ratio, which encodes the integer identifier of the cluster as a floating point number.
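The layout described above can be decoded with Python's standard `struct` module. This sketch assumes the commonly documented POS layout of big-endian 32-bit floats, four per ion; the function name `read_improper_pos` and the rounding tolerance of 1e-3 are illustrative choices. The sketch rejects fourth values that are not close to an integer. Note also that a 32-bit float represents every integer exactly only up to 2**24 (16,777,216), which bounds the cluster IDs this encoding can carry without loss.

```python
import struct


def read_improper_pos(buffer):
    """Decode an improper POS byte buffer into positions and cluster IDs.

    Assumes big-endian 32-bit floats, four per ion (x, y, z, m/q), where
    the fourth value carries the cluster ID as a float.
    """
    if len(buffer) % 16 != 0:
        raise ValueError("POS buffer length must be a multiple of 16 bytes")
    n_ions = len(buffer) // 16
    values = struct.unpack(f">{4 * n_ions}f", buffer)
    positions, cluster_ids = [], []
    for k in range(n_ions):
        x, y, z, label = values[4 * k : 4 * k + 4]
        positions.append((x, y, z))
        cluster_id = round(label)  # nearest integer
        if abs(label - cluster_id) > 1e-3:
            raise ValueError(f"ion {k}: {label} does not encode an integer ID")
        cluster_ids.append(cluster_id)
    return positions, cluster_ids


# Round trip with two ions: one in cluster 3, one unassigned (ID 0).
raw = struct.pack(">8f", 1.0, 2.0, 3.0, 3.0, -0.5, 0.25, 10.0, 0.0)
positions, ids = read_improper_pos(raw)
print(positions)  # [(1.0, 2.0, 3.0), (-0.5, 0.25, 10.0)]
print(ids)        # [3, 0]
```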
cluster_analysis: (optional) NXprocess
This process performs a cluster analysis on a reconstructed dataset or a portion of the reconstruction.
ion_type_filter: (required) NX_CHAR
How iontypes should be interpreted or considered during the cluster analysis. Different options exist for how iontypes are interpreted (if considered at all), given that an iontype in general represents a (molecular) ion with different isotopes, each of which has its own multiplicity.
The value resolve_all will set an ion active in the analysis regardless of which iontype it is. The value resolve_unknown will set an ion active when it is of the UNKNOWNTYPE. The value resolve_ion will set an ion active if it is of the specific iontype, regardless of its elemental or isotopic details. The value resolve_element will set an ion active and, most importantly, count it as many times as the (molecular) ion contains atoms of elements in the whitelist ion_query_isotope_vector. The value resolve_isotope will set an ion active and, most importantly, count it as many times as the (molecular) ion contains isotopes in the whitelist ion_query_isotope_vector.
In effect, ion_query_isotope_vector acts as a whitelist that filters which ions are considered as source ions of the correlation statistics and how the multiplicity of each ion is factorized.
This is relevant because in atom probe a molecular ion with more than one nuclide, for example TiO, is counted such that, although there is only a single TiO molecular ion at that position, the cluster gains two members. This multiplicity affects the size and the chemical composition of the feature.
Obligatory value:
resolve_element
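The counting rule for resolve_element and resolve_isotope can be illustrated with the TiO example. This is a sketch under assumptions: each nuclide in an isotope vector is assumed to be encoded as Z + 256 * N (proton number Z, neutron number N) with 0 padding unused entries, and the whitelist is simplified to the set of non-zero entries of ion_query_isotope_vector. The helpers `nuclide_hash`, `proton_number`, and `multiplicity` are hypothetical names, and the other filter values are not modelled.

```python
# Assumption: a nuclide is encoded as Z + 256 * N (proton number Z, neutron
# number N); 0 pads unused entries of an isotope vector.
def nuclide_hash(z, n=0):
    return z + 256 * n


def proton_number(nuclide):
    return nuclide % 256


def multiplicity(ion_vector, whitelist, mode):
    """How often an ion is counted under resolve_element or resolve_isotope.

    `whitelist` is the set of non-zero entries of ion_query_isotope_vector.
    """
    atoms = [h for h in ion_vector if h != 0]
    if mode == "resolve_element":
        # Compare elements only: isotope details are ignored.
        wanted = {proton_number(h) for h in whitelist}
        return sum(1 for h in atoms if proton_number(h) in wanted)
    if mode == "resolve_isotope":
        # Compare exact nuclides.
        wanted = set(whitelist)
        return sum(1 for h in atoms if h in wanted)
    raise ValueError(f"multiplicity sketch does not cover mode {mode!r}")


ti48, o16 = nuclide_hash(22, 26), nuclide_hash(8, 8)
tio = [ti48, o16] + [0] * 30  # padded to n_ivecmax = 32
print(multiplicity(tio, {22, 8}, "resolve_element"))  # 2: Ti and O both count
print(multiplicity(tio, {22}, "resolve_element"))     # 1: only Ti counts
print(multiplicity(tio, {ti48}, "resolve_isotope"))   # 1: only 48Ti counts
```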
ion_query_isotope_vector: (required) NX_UINT (Rank: 2, Dimensions: [n_ions, n_ivecmax]) {units=NX_UNITLESS}
Matrix of isotope vectors, with as many rows as there are candidate iontypes to be distinguished as possible source iontypes. In the simplest case, each row contains only the proton number of the element, with all other values set to zero. Combined with ion_type_filter set to resolve_element, this will recover the usual spatial correlation statistics, such as the 1NN C-C spatial statistics.
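For the simplest case, the matrix can be assembled as below. The sketch assumes n_ivecmax = 32 (the value the Symbols section gives for paraprobe); `element_query_matrix` is a hypothetical helper, and the range check on proton numbers (1 to 118) is an illustrative guard.

```python
N_IVECMAX = 32  # maximum number of atoms per molecular ion (see Symbols)


def element_query_matrix(proton_numbers, n_ivecmax=N_IVECMAX):
    """Build ion_query_isotope_vector for the simplest case: one row per
    candidate, holding only the element's proton number, padded with zeros.
    The result has shape [n_ions, n_ivecmax]."""
    matrix = []
    for z in proton_numbers:
        if not 1 <= z <= 118:
            raise ValueError(f"invalid proton number {z}")
        matrix.append([z] + [0] * (n_ivecmax - 1))
    return matrix


# Query carbon only, as for C-C statistics: n_ions = 1.
query = element_query_matrix([6])
print(len(query), len(query[0]))  # 1 32
```

When written to the configuration file, the nested list would be stored as an unsigned integer dataset (NX_UINT) of shape [n_ions, n_ivecmax].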
dataset: (required) NXapm_input_reconstruction
iontypes: (required) NXapm_input_ranging
ion_to_edge_distances: (optional) NXprocess
The tool can ingest precomputed distance information for each point/ion, which can be used for further post-processing and analysis.
filename: (required) NX_CHAR
Name of an HDF5 file which contains the ion distances.
@version: (required) NX_CHAR
Version identifier of the file, such as a secure hash that documents the binary state of the file. This adds a layer of reproducibility by recording which specific file contains these data.
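One way to obtain such a version identifier is a SHA-256 digest of the file contents. The choice of SHA-256 and the helper name `file_sha256` are assumptions for illustration; the definition only asks for a secure hash or a similar identifier.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks so that large
    HDF5 files need not fit into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A call such as `file_sha256("distances.h5")` (hypothetical filename) would then yield the string to store in @version.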
dataset_name: (required) NX_CHAR
Absolute HDF5 path to the dataset with distance values for each ion.
spatial_filter: (optional) NXspatial_filter
windowing_method: (required) NX_CHAR
Obligatory value:
entire_dataset
CG_ELLIPSOID_SET: (optional) NXcg_ellipsoid_set
CG_CYLINDER_SET: (optional) NXcg_cylinder_set
CG_HEXAHEDRON_SET: (optional) NXcg_hexahedron_set
dimensionality: (required) NX_POSINT
cardinality: (required) NX_POSINT
identifier_offset: (required) NX_INT
hexahedra: (required) NXcg_face_list_data_structure
CS_FILTER_BOOLEAN_MASK: (optional) NXcs_filter_boolean_mask
evaporation_id_filter: (optional) NXsubsampling_filter
iontype_filter: (optional) NXmatch_filter
hit_multiplicity_filter: (optional) NXmatch_filter
dbscan: (required) NXprocess
Settings for the DBScan clustering algorithm. For the original details about the algorithm and (performance-relevant) implementation details consider:
For details about how the DBScan algorithm is the key behind the specific modification known in the atom probe community as the maximum-separation method, consider E. Jägle et al.
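To make the roles of eps and min_pts concrete, here is a textbook DBSCAN in Python with a brute-force neighbour search. It is an illustration only, not paraprobe-clusterer's implementation and not the maximum-separation variant. It assumes that min_pts counts the query point itself and, following the convention stated above for cluster IDs, labels unassigned (noise) points 0 and numbers clusters from 1.

```python
import math


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with a brute-force neighbour search.

    Returns one label per point: 0 = not assigned to any cluster (noise),
    clusters are numbered from 1. min_pts counts the point itself.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = 0  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster_id  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already in a cluster or just claimed as border: no expansion
            labels[j] = cluster_id
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: grow the cluster
                queue.extend(nbrs)
    return labels


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
          (10.0, 0.0, 0.0)]
print(dbscan(points, eps=0.5, min_pts=3))  # [1, 1, 1, 2, 2, 2, 0]
```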
high_throughput_method: (required) NX_CHAR
Strategy for how runs are performed with different parameter values:
For tuple, as many runs are performed as there are parameter values.
For combinatorics, the individual parameter arrays are looped over.
For example, we may define eps with ten entries and min_pts with three entries. If high_throughput_method is tuple, the analysis is invalid because there are too few min_pts values for the ten eps values. By contrast, for combinatorics paraprobe-clusterer will run three individual min_pts runs for each eps value, resulting in a total of 30 analyses. As another example, the DBScan analysis reported in M. Kühbach et al. would have defined an array of eps values np.linspace(0.2, 5.0, num=241, endpoint=True), min_pts set to one, and high_throughput_method set to combinatorics.
Any of these values:
tuple | combinatorics
eps: (required) NX_FLOAT (Rank: 1, Dimensions: [i]) {units=NX_LENGTH}
Array of epsilon (eps) parameter values.
min_pts: (required) NX_UINT (Rank: 1, Dimensions: [j]) {units=NX_UNITLESS}
Array of minimum points (min_pts) parameter values.
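The two high_throughput_method strategies can be sketched as a small parameter expansion. The helper `expand_runs` is hypothetical; it assumes that for combinatorics the arrays are looped over in the order given, with the last array varying fastest, which yields three min_pts runs per eps value as in the example above. The 241-value eps scan is built in plain Python to mirror the np.linspace call.

```python
import itertools


def expand_runs(method, **parameters):
    """Turn named parameter arrays into a list of per-run settings.

    tuple: the i-th run uses the i-th entry of every array (all arrays must
    have equal length). combinatorics: every combination is run, looping over
    the arrays in the order given (last array varies fastest).
    """
    names = list(parameters)
    arrays = [list(parameters[name]) for name in names]
    if method == "tuple":
        if len({len(a) for a in arrays}) != 1:
            raise ValueError("tuple requires parameter arrays of equal length")
        combos = zip(*arrays)
    elif method == "combinatorics":
        combos = itertools.product(*arrays)
    else:
        raise ValueError(f"unknown high_throughput_method {method!r}")
    return [dict(zip(names, combo)) for combo in combos]


# Ten eps values and three min_pts values, as in the text.
eps = [0.5 * (k + 1) for k in range(10)]
min_pts = [1, 5, 10]
print(len(expand_runs("combinatorics", eps=eps, min_pts=min_pts)))  # 30

# The Kühbach et al. setting: 241 eps values from 0.2 to 5.0, min_pts = 1.
eps_scan = [0.2 + k * (5.0 - 0.2) / 240 for k in range(241)]
runs = expand_runs("combinatorics", eps=eps_scan, min_pts=[1])
print(len(runs), runs[0], runs[-1])
```

Calling `expand_runs("tuple", eps=eps, min_pts=min_pts)` raises a ValueError, matching the statement that such a configuration is invalid.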
optics: (required) NXprocess
Settings for the OPTICS clustering algorithm.
high_throughput_method: (required) NX_CHAR
Strategy for how runs are performed with different parameter values:
For tuple, as many runs are performed as there are parameter values.
For combinatorics, the individual parameter arrays are looped over.
See the explanation of the corresponding dbscan parameter above for further details.
Any of these values:
tuple | combinatorics
min_pts: (required) NX_UINT (Rank: 1, Dimensions: [i]) {units=NX_UNITLESS}
Array of minimum points (min_pts) parameter values.
max_eps: (required) NX_FLOAT (Rank: 1, Dimensions: [j]) {units=NX_LENGTH}
Array of maximum epsilon (eps) parameter values.
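The following sketch shows where min_pts and max_eps enter the textbook OPTICS ordering: max_eps bounds the neighbourhood search, and min_pts sets the core distance. It is a brute-force illustration, not paraprobe-clusterer's implementation; it assumes min_pts counts the point itself, reports undefined reachability as math.inf, and omits the subsequent extraction of clusters from the reachability values.

```python
import heapq
import math


def optics(points, min_pts, max_eps):
    """Return the OPTICS ordering and reachability distances (math.inf where
    undefined). Brute-force neighbour search; min_pts counts the point itself."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= max_eps]

    def core_distance(i, nbrs):
        if len(nbrs) < min_pts:
            return None  # not a core point within max_eps
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def expand(i, seeds):
        # Emit point i, then push its unprocessed neighbours with updated reachability.
        processed[i] = True
        order.append(i)
        nbrs = neighbours(i)
        core = core_distance(i, nbrs)
        if core is None:
            return
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, math.dist(points[i], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        seeds = []
        expand(start, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if not processed[j]:  # skip stale heap entries
                expand(j, seeds)
    return order, reach


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
order, reach = optics(points, min_pts=2, max_eps=5.0)
print(order)  # [0, 1, 2, 3, 4, 5]
print(reach)  # [inf, 1.0, 1.0, inf, 1.0, 1.0]
```

The jump back to inf at index 3 marks the start of the second group; a cluster extraction step would cut the ordering at such jumps.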
hdbscan: (required) NXprocess
Settings for the HDBSCAN clustering algorithm.
McInnes et al. https://dx.doi.org/10.21105/joss.00205
scikit-learn-contrib hdbscan library https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html
See also this documentation for details about the parameters. Here we use the terminology of the hdbscan documentation.
high_throughput_method: (required) NX_CHAR
Strategy for how runs are performed with different parameter values:
For tuple, as many runs are performed as there are parameter values.
For combinatorics, the individual parameter arrays are looped over.
See the explanation of the corresponding dbscan parameter above for further details.
Any of these values:
tuple | combinatorics
min_cluster_size: (required) NX_NUMBER (Rank: 1, Dimensions: [i]) {units=NX_ANY}
Array of min_cluster_size parameter values.
min_samples: (required) NX_NUMBER (Rank: 1, Dimensions: [j]) {units=NX_ANY}
Array of min_samples parameter values.
cluster_selection_epsilon: (required) NX_NUMBER (Rank: 1, Dimensions: [k]) {units=NX_ANY}
Array of cluster_selection_epsilon parameter values.
alpha: (required) NX_NUMBER (Rank: 1, Dimensions: [m]) {units=NX_ANY}
Array of alpha parameter values.
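A minimal sketch of how one parameter set could be handed to the hdbscan library, whose HDBSCAN estimator accepts keyword arguments with these names. The helpers `hdbscan_kwargs` and `to_paraprobe_labels` are hypothetical, and so is the relabelling step: it assumes that hdbscan's noise label -1 should map to the reserved cluster ID 0 described for cameca_to_nexus, shifting all other labels up by one. The hdbscan call itself is left as a comment so the snippet runs without the package.

```python
def hdbscan_kwargs(min_cluster_size, min_samples, cluster_selection_epsilon, alpha):
    """Keyword arguments for one run, named like the parameters of the
    hdbscan library's HDBSCAN estimator."""
    return {
        "min_cluster_size": int(min_cluster_size),
        "min_samples": int(min_samples),
        "cluster_selection_epsilon": float(cluster_selection_epsilon),
        "alpha": float(alpha),
    }


def to_paraprobe_labels(labels):
    """Shift hdbscan labels (-1 = noise, clusters numbered from 0) so that
    0 marks ions not assigned to any cluster and clusters start at 1."""
    return [int(label) + 1 for label in labels]


settings = hdbscan_kwargs(10, 5, 0.0, 1.0)
print(settings)
print(to_paraprobe_labels([-1, 0, 0, 2, -1]))  # [0, 1, 1, 3, 0]

# With the optional hdbscan package installed, a run could look like:
# import hdbscan
# labels = hdbscan.HDBSCAN(**settings).fit_predict(xyz)  # xyz: (n, 3) array
# cluster_ids = to_paraprobe_labels(labels)
```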
Hypertext Anchors
List of hypertext anchors for all groups, fields, attributes, and links defined in this class.
/NXapm_paraprobe_config_clusterer/ENTRY/analysis_description-field
/NXapm_paraprobe_config_clusterer/ENTRY/analysis_identifier-field
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus-group
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus/dataset-group
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus/dataset/dataset_name_mass_to_charge-field
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus/dataset/dataset_name_reconstruction-field
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus/dataset/filename-field
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus/dataset/filename@version-attribute
/NXapm_paraprobe_config_clusterer/ENTRY/cameca_to_nexus/recover_evaporation_id-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dataset-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dataset/dataset_name_mass_to_charge-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dataset/dataset_name_reconstruction-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dataset/filename-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dataset/filename@version-attribute
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dbscan-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dbscan/eps-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dbscan/high_throughput_method-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/dbscan/min_pts-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/evaporation_id_filter-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hdbscan-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hdbscan/alpha-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hdbscan/cluster_selection_epsilon-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hdbscan/high_throughput_method-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hdbscan/min_cluster_size-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hdbscan/min_samples-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/hit_multiplicity_filter-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/ion_query_isotope_vector-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/ion_to_edge_distances-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/ion_to_edge_distances/dataset_name-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/ion_to_edge_distances/filename-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/ion_type_filter-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/iontype_filter-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/iontypes-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/iontypes/filename-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/iontypes/filename@version-attribute
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/iontypes/group_name_iontypes-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/optics-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/optics/high_throughput_method-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/optics/max_eps-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/optics/min_pts-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CG_CYLINDER_SET-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CG_CYLINDER_SET/center-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CG_CYLINDER_SET/height-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CG_CYLINDER_SET/radii-field
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CG_ELLIPSOID_SET-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CG_HEXAHEDRON_SET-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/CS_FILTER_BOOLEAN_MASK-group
/NXapm_paraprobe_config_clusterer/ENTRY/cluster_analysis/spatial_filter/windowing_method-field
/NXapm_paraprobe_config_clusterer/ENTRY/number_of_processes-field
/NXapm_paraprobe_config_clusterer/ENTRY/program@version-attribute