NXapm_paraprobe_results_clusterer

Status:

application definition, extends NXobject

Description:

Results of a paraprobe-clusterer tool run.

Symbols:

The symbols used in the schema to specify e.g. dimensions of arrays.

n_ions: The total number of ions in the reconstruction.

n_dict: The total number of entries in the restricted_identifier dictionary.

Groups cited:

NXcoordinate_system_set, NXcs_computer, NXcs_cpu, NXcs_filter_boolean_mask, NXcs_gpu, NXcs_io_obj, NXcs_io_sys, NXcs_mm_sys, NXcs_profiling_event, NXcs_profiling, NXentry, NXfabrication, NXprocess, NXsimilarity_grouping, NXtransformations, NXuser

Structure:

ENTRY: (required) NXentry

@version: (required) NX_CHAR

Version specifier of this application definition.

definition: (required) NX_CHAR

Official NeXus NXDL schema with which this file was written.

Obligatory value: NXapm_paraprobe_results_clusterer

program: (required) NX_CHAR

Given name of the program/software/tool with which this NeXus (configuration) file was generated.

@version: (required) NX_CHAR

Ideally the program version plus build number, a commit hash, or a description of persistent resources where the source code of the program and its build instructions can be found, so that the program can be configured such that the result of this computational process can be recreated deterministically.

analysis_identifier: (required) NX_CHAR

Ideally, a (globally persistent) unique identifier for referring to this analysis.

analysis_description: (optional) NX_CHAR

Possibility for leaving a free-text description about this analysis.

start_time: (required) NX_DATE_TIME

ISO 8601 formatted time code with local time zone offset to UTC information included when the analysis behind this results file was started, i.e. the paraprobe-tool executable started as a process.

end_time: (required) NX_DATE_TIME

ISO 8601 formatted time code with local time zone offset to UTC information included when the analysis behind this results file was completed and the paraprobe-tool executable exited as a process.
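A minimal sketch of how such a time code could be produced with Python's standard library. The helper name `local_timestamp` is hypothetical and not part of the schema or of the paraprobe-toolbox.

```python
from datetime import datetime


def local_timestamp() -> str:
    """Return the current local time as ISO 8601 with the UTC offset included."""
    # astimezone() without arguments attaches the system's local time zone,
    # so isoformat() appends the offset to UTC (e.g. "+02:00").
    return datetime.now().astimezone().isoformat(timespec="seconds")


start_time = local_timestamp()
# ... the analysis would run here ...
end_time = local_timestamp()
```

Both strings can be parsed back with `datetime.fromisoformat`, which keeps the offset information.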

config_filename: (required) NX_CHAR

The absolute path and name of the config file for this analysis.

@version: (required) NX_CHAR

A hash of the specific config file, at least as strong as SHA256, for tracking provenance.
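As an illustration, a SHA256 digest of a config file can be computed with Python's `hashlib`. This is a sketch only; the function name `sha256_of_file` is hypothetical, and how the paraprobe-toolbox itself computes the hash is not stated here.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is what one would store in the @version attribute of config_filename under this assumption.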

results_path: (optional) NX_CHAR

Path to the directory where the tool should store NeXus/HDF5 results of this analysis. If not specified, results will be stored in the current working directory.

status: (required) NX_CHAR

A statement whether the paraprobe-tool executable managed to process the analysis or failed prematurely.

This status is written to the results file after the end_time, at which point the executable must no longer compute analyses. The user should consider the results only when this status message is present and shows success. In all other cases, the executable may have terminated prematurely or another error may have occurred.

Any of these values: success | failure

USER: (optional) NXuser

If used, contact information and possibly further details of at least the person who performed this analysis.

name: (required) NX_CHAR

affiliation: (recommended) NX_CHAR

address: (optional) NX_CHAR

email: (recommended) NX_CHAR

orcid: (recommended) NX_CHAR

orcid_platform: (recommended) NX_CHAR

telephone_number: (optional) NX_CHAR

role: (recommended) NX_CHAR

social_media_name: (optional) NX_CHAR

social_media_platform: (optional) NX_CHAR

COORDINATE_SYSTEM_SET: (required) NXcoordinate_system_set

Details about the coordinate system conventions used. If nothing else is specified, we assume that there has to be at least one set of NXtransformations named paraprobe defined, which specifies the coordinate system in which all positions are defined.

TRANSFORMATIONS: (required) NXtransformations

The individual coordinate systems which should be used. Field names should be prefixed with the following controlled terms indicating which individual coordinate system is described:

  • paraprobe

  • lab

  • specimen

  • laser

  • leap

  • detector

  • recon

PROCESS: (optional) NXprocess

window: (required) NXcs_filter_boolean_mask

A bitmask which identifies which of the ions in the dataset were analyzed during this process.

number_of_ions: (required) NX_UINT {units=NX_UNITLESS}

Number of ions covered by the mask. The mask value for most may be 0.

bitdepth: (required) NX_UINT {units=NX_UNITLESS}

Number of bits assumed for the default datatype (e.g. 8 bits for a C-style uint8).

mask: (required) NX_UINT (Rank: 1, Dimensions: [n_ions]) {units=NX_UNITLESS}

The unsigned integer array representing the content of the mask. If padding is used, padded bits are set to 0. For convenience, the mask is always as large as the entire dataset, since it is stored compressed anyway. This way the mask can be decoded with numpy and mirrored against the evaporation_id array to immediately filter out all points that were used by the paraprobe-toolbox executable. If the number of ions in the dataset is not an integer multiple of the bitdepth, the length of the array is rounded up to the next integer (padding).
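The decoding described above can be sketched with numpy as follows. This is an illustration under stated assumptions, not the toolbox's implementation: the bit order within each word ("little" here) is an assumption that has to be checked against the actual file, and the helper names and toy data are hypothetical.

```python
import numpy as np


def pack_mask(selected: np.ndarray) -> np.ndarray:
    """Pack a boolean per-ion array into uint8 words (bitdepth 8)."""
    # packbits pads the last word with zero bits when the number of ions
    # is not a multiple of 8.
    # ASSUMPTION: little bit order within each word.
    return np.packbits(selected.astype(np.uint8), bitorder="little")


def unpack_mask(mask: np.ndarray, number_of_ions: int) -> np.ndarray:
    """Decode the packed mask and drop the padded bits."""
    bits = np.unpackbits(mask, bitorder="little")
    return bits[:number_of_ions].astype(bool)


# Hypothetical toy data: 11 ions, so the mask needs ceil(11 / 8) = 2 words.
evaporation_id = np.arange(1, 12)
selected = np.zeros(11, dtype=bool)
selected[[0, 3, 10]] = True

mask = pack_mask(selected)
decoded = unpack_mask(mask, number_of_ions=evaporation_id.size)
analyzed_ids = evaporation_id[decoded]  # identifiers of the ions that were analyzed
```

Truncating to number_of_ions after unpacking is what removes the padding bits, which is why that field is stored alongside the mask.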

cluster_analysis: (optional) NXprocess

The results of a cluster analysis. These typically include the label for each ion/point documenting to which feature (if any) an ion is assigned. Typically, each analysis/run yields only a single cluster. In cases of fuzzy clustering, an ion can be assigned to multiple clusters (possibly with different weights/probabilities).

dbscanID: (optional) NXsimilarity_grouping

Results of a DBScan clustering analysis.

eps: (required) NX_FLOAT {units=NX_LENGTH}

The epsilon (eps) parameter.

min_pts: (required) NX_UINT {units=NX_UNITLESS}

The minimum points (min_pts) parameter.

cardinality: (required) NX_UINT {units=NX_UNITLESS}

Number of members in the set which is partitioned into features. Specifically, this is the total number of targets filtered from the dataset. Cardinality here is not the total number of ions in the dataset.

identifier_offset: (required) NX_NUMBER {units=NX_UNITLESS}

Which identifier is the first to be used to label a cluster.

The value should be chosen in such a way that special values can be resolved:

  • identifier_offset-1 indicates an object belongs to no cluster.

  • identifier_offset-2 indicates an object belongs to the noise category.

Setting for instance identifier_offset to 1 recovers the commonly used case that objects of the noise category get the value -1 and unassigned points the value 0. Numerical identifiers have to be strictly increasing.
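The convention for the special values can be written down as a small sketch. The function names `special_identifiers` and `categorize` are hypothetical helpers for illustration only.

```python
def special_identifiers(identifier_offset: int) -> dict:
    """Return the reserved identifiers implied by identifier_offset."""
    return {
        "unassigned": identifier_offset - 1,  # object belongs to no cluster
        "noise": identifier_offset - 2,       # object belongs to the noise category
        "first_cluster": identifier_offset,   # first identifier that labels a cluster
    }


def categorize(label: int, identifier_offset: int) -> str:
    """Classify a single label as 'noise', 'unassigned', or 'cluster'."""
    special = special_identifiers(identifier_offset)
    if label == special["noise"]:
        return "noise"
    if label == special["unassigned"]:
        return "unassigned"
    if label >= identifier_offset:
        return "cluster"
    raise ValueError(f"label {label} is below the reserved range")
```

With identifier_offset set to 1, `categorize(-1, 1)` gives "noise" and `categorize(0, 1)` gives "unassigned", matching the commonly used case described above.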

targets: (required) NX_UINT (Rank: 1, Dimensions: [c]) {units=NX_UNITLESS}

The evaporation sequence identifiers which indicate which ions from the reconstruction were considered targets.

model_labels: (required) NX_INT (Rank: 1, Dimensions: [c]) {units=NX_UNITLESS}

The raw labels from the DBScan clustering backend process.

core_sample_indices: (required) NX_INT (Rank: 1, Dimensions: [n]) {units=NX_UNITLESS}

The raw array of core sample indices which specify which of the targets are core points.

numerical_label: (required) NX_UINT (Rank: 1, Dimensions: [c]) {units=NX_UNITLESS}

Array of numerical labels, one for each member in the set. For classical clustering algorithms this can for instance encode the cluster_identifier.

weight: (optional) NX_NUMBER (Rank: 1, Dimensions: [c]) {units=NX_UNITLESS}

The array of weights which specifies how surely/likely each member is associated with/assigned to the specific identifier given in the cluster_identifier array. For DBScan in atom probe tomography, this is the multiplicity of each ion with respect to the cluster, i.e. how many times the position of the ion should be accounted for because the ion is e.g. a molecular ion with several elements or isotopes of the requested type.

is_noise: (optional) NX_BOOLEAN (Rank: 1, Dimensions: [c])

Optional bitmask encoding whether members of the set are classified as noise or not.

is_core: (optional) NX_BOOLEAN (Rank: 1, Dimensions: [c])

Optional bitmask encoding whether members of the set are core points. For details on which feature/cluster an ion/point is a core point of, consider numerical_label.
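One plausible way to derive is_core and is_noise from the raw backend arrays is sketched below. Two assumptions are made that the schema does not state: core_sample_indices are zero-based positions into the targets array, and the backend marks noise with the label -1 (as scikit-learn's DBSCAN does). The function names and toy data are hypothetical.

```python
def core_mask(core_sample_indices, cardinality):
    """Return a per-target list of booleans that is True for core points."""
    is_core = [False] * cardinality
    # ASSUMPTION: indices are zero-based positions into the targets array.
    for index in core_sample_indices:
        is_core[index] = True
    return is_core


def noise_mask(model_labels, noise_label=-1):
    """Return a per-target list of booleans that is True for noise points."""
    # ASSUMPTION: the backend uses -1 as its noise label.
    return [label == noise_label for label in model_labels]


# Hypothetical toy data for six targets.
model_labels = [0, 0, -1, 1, 1, 0]
core_sample_indices = [0, 1, 4]

is_core = core_mask(core_sample_indices, cardinality=len(model_labels))
is_noise = noise_mask(model_labels)
```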

statistics: (required) NXprocess

In addition to the detailed storage of which members were grouped into which feature/group, summary statistics are stored under this group.

number_of_noise: (required) NX_UINT {units=NX_UNITLESS}

Total number of members in the set which are categorized as noise.

number_of_core: (required) NX_UINT {units=NX_UNITLESS}

Total number of members in the set which are categorized as a core point.

number_of_features: (required) NX_UINT {units=NX_UNITLESS}

Total number of clusters (excluding noise and unassigned).

feature_identifier: (required) NX_UINT (Rank: 1, Dimensions: [n_features]) {units=NX_UNITLESS}

Array of numerical identifiers, one for each feature (cluster).

feature_member_count: (required) NX_UINT (Rank: 1, Dimensions: [n_features]) {units=NX_UNITLESS}

Array with the number of members of each feature.
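The summary statistics can in principle be recomputed from numerical_label and identifier_offset. The following sketch assumes the special-value convention of identifier_offset (noise at identifier_offset-2, clusters at identifier_offset and above). The function name `summarize` and the toy labels are hypothetical; number_of_core is left out because it depends on is_core rather than on the labels.

```python
from collections import Counter


def summarize(numerical_label, identifier_offset):
    """Compute summary statistics from per-member labels."""
    noise_id = identifier_offset - 2
    # Only labels at or above identifier_offset denote actual clusters.
    counts = Counter(label for label in numerical_label if label >= identifier_offset)
    feature_identifier = sorted(counts)
    return {
        "number_of_noise": sum(1 for label in numerical_label if label == noise_id),
        "number_of_features": len(feature_identifier),
        "feature_identifier": feature_identifier,
        "feature_member_count": [counts[i] for i in feature_identifier],
    }


# Hypothetical toy labels with identifier_offset = 1: -1 is noise, 0 is unassigned.
stats = summarize([1, 1, -1, 2, 2, 1, 0], identifier_offset=1)
```

Sorting the identifiers keeps feature_identifier and feature_member_count aligned index by index.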

performance: (recommended) NXcs_profiling

current_working_directory: (required) NX_CHAR

command_line_call: (optional) NX_CHAR

start_time: (recommended) NX_DATE_TIME

end_time: (recommended) NX_DATE_TIME

total_elapsed_time: (required) NX_NUMBER

number_of_processes: (required) NX_POSINT

number_of_threads: (required) NX_POSINT

number_of_gpus: (required) NX_POSINT

CS_COMPUTER: (recommended) NXcs_computer

name: (recommended) NX_CHAR

operating_system: (required) NX_CHAR

@version: (required) NX_CHAR

uuid: (optional) NX_CHAR

CS_CPU: (optional) NXcs_cpu

name: (optional) NX_CHAR

FABRICATION: (recommended) NXfabrication

identifier: (optional) NX_CHAR

capabilities: (optional) NX_CHAR

CS_GPU: (optional) NXcs_gpu

name: (optional) NX_CHAR

FABRICATION: (recommended) NXfabrication

identifier: (optional) NX_CHAR

capabilities: (optional) NX_CHAR

CS_MM_SYS: (optional) NXcs_mm_sys

total_physical_memory: (required) NX_NUMBER

CS_IO_SYS: (optional) NXcs_io_sys

CS_IO_OBJ: (required) NXcs_io_obj

technology: (required) NX_CHAR

max_physical_capacity: (required) NX_NUMBER

name: (optional) NX_CHAR

FABRICATION: (recommended) NXfabrication

identifier: (optional) NX_CHAR

capabilities: (optional) NX_CHAR

CS_PROFILING_EVENT: (required) NXcs_profiling_event

start_time: (optional) NX_DATE_TIME

end_time: (optional) NX_DATE_TIME

description: (required) NX_CHAR

elapsed_time: (required) NX_NUMBER

number_of_processes: (required) NX_POSINT

Specify if it was different from the number_of_processes in the NXcs_profiling super class.

number_of_threads: (required) NX_POSINT

Specify if it was different from the number_of_threads in the NXcs_profiling super class.

number_of_gpus: (required) NX_POSINT

Specify if it was different from the number_of_gpus in the NXcs_profiling super class.

max_virtual_memory_snapshot: (recommended) NX_NUMBER

max_resident_memory_snapshot: (recommended) NX_NUMBER

NXDL Source:

https://github.com/FAIRmat-Experimental/nexus_definitions/tree/fairmat/contributed_definitions/NXapm_paraprobe_results_clusterer.nxdl.xml