2.3.3.3.192. NXsimilarity_grouping

Status:

base class, extends NXobject

Description:


Base class to store results obtained from applying a similarity grouping (clustering) algorithm.

Similarity grouping algorithms are segmentation or machine learning algorithms that partition the members of a set of objects (e.g. geometric primitives) into (sub-)groups, also known as features, of different kinds or types. Many such algorithms exist.

This base class stores the metadata and results of applying a similarity grouping algorithm to a set in which each object is categorized either as noise or as a member of a cluster. The algorithm assigns each similarity group (feature/cluster) at least one identifier (a numerical or categorical label) to distinguish the clusters from one another.
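
As a purely illustrative example of the kind of algorithm whose results this class stores, the sketch below implements single-linkage grouping of one-dimensional values with a distance threshold: after sorting, neighbours that are at most eps apart join the same group, and groups with fewer than min_size members are labelled as noise. The function name group_1d, its parameters, and the choice of -1 for noise and 1, 2, ... for clusters are assumptions made for this example, not part of the NeXus definition.

```python
from typing import List

# Hypothetical convention for this sketch: -1 marks noise, clusters get 1, 2, ...
NOISE_LABEL = -1


def group_1d(values: List[float], eps: float, min_size: int) -> List[int]:
    """Single-linkage grouping of 1D values with distance threshold eps.

    Returns one numerical label per input value, in input order.
    """
    # Visit the values in ascending order, remembering their original indices.
    order = sorted(range(len(values)), key=lambda i: values[i])
    runs: List[List[int]] = []
    for idx in order:
        # Extend the current run if this value is within eps of the previous one.
        if runs and values[idx] - values[runs[-1][-1]] <= eps:
            runs[-1].append(idx)
        else:
            runs.append([idx])

    labels = [NOISE_LABEL] * len(values)
    next_id = 1
    for run in runs:
        # Runs that are too small stay labelled as noise.
        if len(run) >= min_size:
            for idx in run:
                labels[idx] = next_id
            next_id += 1
    return labels


print(group_1d([0.0, 0.1, 0.2, 5.0, 5.1, 9.0], eps=0.5, min_size=2))
# → [1, 1, 1, 2, 2, -1]
```

In one dimension, splitting the sorted values wherever the gap exceeds eps yields exactly the single-linkage clusters at that threshold, which keeps the sketch short.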

Symbols:

The symbols used in the schema to specify e.g. dimensions of arrays.

c: Cardinality of the set.

n_lbl_num: Number of numerical labels per object.

n_lbl_cat: Number of categorical labels per object.

n_features: Total number of similarity groups, also known as features/clusters.

Groups cited:

NXprocess

Structure:

cardinality: (optional) NX_UINT {units=NX_UNITLESS}

Number of members in the set that is partitioned into features.

number_of_numeric_labels: (optional) NX_UINT {units=NX_UNITLESS}

Number of numerical labels per object (n_lbl_num).

number_of_categorical_labels: (optional) NX_UINT {units=NX_UNITLESS}

Number of categorical labels per object (n_lbl_cat).

identifier_offset: (optional) NX_INT {units=NX_UNITLESS}


Which numerical identifier is the first to be used to label a feature.

The value should be chosen such that the following special values can be resolved:

* identifier_offset - 1 indicates that an object belongs to no cluster.
* identifier_offset - 2 indicates that an object belongs to the noise category.

Setting identifier_offset to 1, for instance, recovers the common convention in which objects of the noise category get the value -1 and unassigned objects get the value 0. Numerical identifiers have to be strictly increasing.
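
The identifier_offset convention can be sketched as a pair of encode/decode functions. This is a minimal illustration, assuming raw algorithm output in which each object is either a zero-based cluster index, the marker "noise", or None for "no cluster"; these markers and the function names encode and decode are hypothetical. Cluster k is mapped to identifier_offset + k, so identifiers increase strictly with the cluster index.

```python
from typing import List, Union

# Hypothetical raw markers used only in this sketch.
UNASSIGNED = None   # object belongs to no cluster
NOISE = "noise"     # object belongs to the noise category

RawLabel = Union[int, str, None]


def encode(raw: List[RawLabel], identifier_offset: int) -> List[int]:
    """Map raw labels onto numerical identifiers relative to identifier_offset."""
    out: List[int] = []
    for r in raw:
        if r is UNASSIGNED:
            out.append(identifier_offset - 1)
        elif r == NOISE:
            out.append(identifier_offset - 2)
        else:
            # Zero-based cluster index k becomes identifier_offset + k.
            out.append(identifier_offset + r)
    return out


def decode(value: int, identifier_offset: int) -> RawLabel:
    """Inverse of encode for a single numerical identifier."""
    if value == identifier_offset - 1:
        return UNASSIGNED
    if value == identifier_offset - 2:
        return NOISE
    if value >= identifier_offset:
        return value - identifier_offset
    raise ValueError(f"{value} is not valid for identifier_offset={identifier_offset}")


print(encode([0, 1, "noise", None, 0], identifier_offset=1))
# → [1, 2, -1, 0, 1]
```

With identifier_offset = 1 this reproduces the convention described above: noise becomes -1, unassigned objects become 0, and clusters start at 1.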

numerical_label: (optional) NX_INT (Rank: 2, Dimensions: [c, n_lbl_num]) {units=NX_UNITLESS}


Matrix of numerical labels for each member in the set. For classical clustering algorithms this can, for instance, encode the cluster_identifier.

categorical_label: (optional) NX_CHAR (Rank: 2, Dimensions: [c, n_lbl_cat])

Matrix of categorical attribute data for each member in the set.
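
The shapes of numerical_label [c, n_lbl_num] and categorical_label [c, n_lbl_cat] tie the matrices to the symbols c, n_lbl_num, and n_lbl_cat. The sketch below is a hypothetical in-memory container (the class name LabelMatrices and the helper _shape are assumptions, not part of NeXus) that derives those symbols from row-per-member Python lists and rejects ragged or mismatched matrices.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def _shape(matrix: Sequence[Sequence[object]], name: str) -> Tuple[int, int]:
    """Return (rows, columns) of a list-of-rows matrix; reject ragged rows."""
    widths = {len(row) for row in matrix}
    if len(widths) > 1:
        raise ValueError(f"{name} is ragged: row lengths {sorted(widths)}")
    return len(matrix), (widths.pop() if widths else 0)


@dataclass
class LabelMatrices:
    """Hypothetical mirror of numerical_label and categorical_label (one row per member)."""
    numerical_label: List[List[int]] = field(default_factory=list)
    categorical_label: List[List[str]] = field(default_factory=list)
    # Derived symbols, filled in by __post_init__.
    c: int = field(init=False)
    n_lbl_num: int = field(init=False)
    n_lbl_cat: int = field(init=False)

    def __post_init__(self) -> None:
        rows_num, self.n_lbl_num = _shape(self.numerical_label, "numerical_label")
        rows_cat, self.n_lbl_cat = _shape(self.categorical_label, "categorical_label")
        # When both matrices are present they must describe the same c members.
        if rows_num and rows_cat and rows_num != rows_cat:
            raise ValueError("numerical_label and categorical_label must both have c rows")
        self.c = max(rows_num, rows_cat)


labels = LabelMatrices([[1, 10], [1, 11], [-1, 12]], [["grain"], ["grain"], ["noise"]])
print(labels.c, labels.n_lbl_num, labels.n_lbl_cat)
# → 3 2 1
```

The derived values correspond to the optional cardinality, number_of_numeric_labels, and number_of_categorical_labels fields.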

statistics: (optional) NXprocess


In addition to the detailed record of which objects were grouped into which feature, summary statistics are stored under this group.

unassigned: (optional) NX_UINT {units=NX_UNITLESS}

Total number of features categorized as unassigned.

noise: (optional) NX_UINT {units=NX_UNITLESS}

Total number of features categorized as noise.

total: (optional) NX_UINT {units=NX_UNITLESS}

Total number of features.

identifier: (optional) NX_UINT (Rank: 1, Dimensions: [n_features]) {units=NX_UNITLESS}

Numerical identifier of each feature.

member_count: (optional) NX_UINT (Rank: 2, Dimensions: [n_features, n_lbl_num]) {units=NX_UNITLESS}

Number of objects in each feature.
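
The statistics fields can be derived from a single column of numerical_label. The sketch below is a hypothetical helper (summarize is not a NeXus function) that assumes unassigned and noise count the members carrying the identifier_offset - 1 and identifier_offset - 2 values, that total counts only the actual clusters, and that member_count is computed for one label column, i.e. one column of the [n_features, n_lbl_num] array.

```python
from collections import Counter
from typing import Dict, List


def summarize(labels: List[int], identifier_offset: int) -> Dict[str, object]:
    """Summary statistics for one numerical label column (assumed semantics)."""
    counts = Counter(labels)
    # Values below identifier_offset - 2 have no meaning in this convention.
    invalid = sorted(v for v in counts if v < identifier_offset - 2)
    if invalid:
        raise ValueError(f"invalid identifiers: {invalid}")
    # Cluster identifiers are all values >= identifier_offset, in increasing order.
    identifiers = sorted(v for v in counts if v >= identifier_offset)
    return {
        "unassigned": counts.get(identifier_offset - 1, 0),
        "noise": counts.get(identifier_offset - 2, 0),
        "total": len(identifiers),
        "identifier": identifiers,
        "member_count": [counts[v] for v in identifiers],
    }


print(summarize([1, 1, 1, 2, 2, -1, 0], identifier_offset=1))
# → {'unassigned': 1, 'noise': 1, 'total': 2, 'identifier': [1, 2], 'member_count': [3, 2]}
```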


NXDL Source:

https://github.com/FAIRmat-NFDI/nexus_definitions/tree/fairmat/contributed_definitions/NXsimilarity_grouping.nxdl.xml