2.3.3.3.89. NXcs_profiling
Status:
base class, extends NXobject
Description:
Computer science description for performance and profiling data of an application.
Performance monitoring and benchmarking of software is a task where questions can be asked at various levels of detail. In general, there are three main contributions to performance:
Hardware capabilities and configuration
Software configuration and capabilities
Dynamic effects of the system in operation, including the interplay of possibly multiple computers, especially when these exchange information across a network and are shared by multiple users.
At the most basic level, users may wish to document how long, e.g., a data analysis with a scientific software (app) took. A frequent aim here is to answer practical questions such as how critical the effect on the scientists' workflow is, i.e. is the analysis possible in a few seconds, or would it take days if run on a comparable machine? For this more qualitative performance monitoring, mainly the order of magnitude is relevant, as well as how it was achieved using parallelization (i.e. reporting the number of CPU and GPU resources used, the number of processes and threads configured, and basic details about the computer).
At more advanced levels, benchmarks may go as deep as detailed temporal tracking of individual processor instructions, their relation to other instructions, and the state of call stacks; in short, possibly the entire app execution history and hardware state history. Such analyses are mainly used for performance optimization, i.e. by software and hardware developers, as well as for tracking bugs. Specialized software exists which documents such performance data in specifically formatted event log files or databases.
This base class cannot and should not replace these specific solutions for now. Instead, the base class is intended to serve scientists at the basic level: it enables simple monitoring of performance data and logging of profiling data for key algorithmic steps or parts of computational workflows, so that this information can indicate to users which order-of-magnitude differences should or should not be expected.
Developers of application definitions should add further fields and references, e.g. to more detailed performance data, to which they wish to link the metadata in this base class.
Symbols:
The symbols used in the schema to specify e.g. dimensions of arrays.
- Groups cited:
Structure:
current_working_directory: (optional) NX_CHAR
Path to the directory from which the tool was called.
command_line_call: (optional) NX_CHAR
Command line call with arguments if applicable.
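As an illustration of how an app might collect the two invocation-related fields above, here is a minimal Python sketch. The function name describe_invocation and the use of a plain dict are assumptions made for this example; they are not part of the NeXus definition, and writing the values to a file is left out.

```python
import os
import shlex
import sys


def describe_invocation(argv=None):
    """Collect current_working_directory and command_line_call as a dict.

    Hypothetical helper: the name and the dict layout are illustrative only.
    """
    argv = sys.argv if argv is None else argv
    return {
        # Directory from which the tool was called.
        "current_working_directory": os.getcwd(),
        # shlex.join quotes arguments so the string can be reused in a POSIX shell.
        "command_line_call": shlex.join(argv),
    }
```

Calling describe_invocation() at the very start of the app, before any change of directory, keeps the recorded path consistent with the description of current_working_directory.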
start_time: (optional) NX_DATE_TIME
ISO 8601 time code, including the local time zone offset to UTC, for when the app was started.
end_time: (optional) NX_DATE_TIME
ISO 8601 time code, including the local time zone offset to UTC, for when the app terminated or crashed.
total_elapsed_time: (optional) NX_NUMBER {units=NX_TIME}
Wall-clock time that the app execution took. In principle this may be end_time minus start_time; however, more precise timers may be available that offer a finer temporal resolution, which warrants a separate, more precise record of the wall-clock time.
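The relation between start_time, end_time, and total_elapsed_time can be sketched in Python as follows. This is only one possible approach, assuming that time.perf_counter is an acceptable higher-resolution timer and that seconds are the chosen unit; the helper name timed_run is hypothetical.

```python
import time
from datetime import datetime


def timed_run(workload):
    """Run workload() and return the three timing fields as a dict.

    Hypothetical helper: the name and the dict layout are illustrative only.
    """
    # astimezone() without arguments attaches the local time zone, so that
    # isoformat() emits an ISO 8601 string with the offset to UTC.
    start_time = datetime.now().astimezone()
    tic = time.perf_counter()  # monotonic, high-resolution timer
    workload()
    elapsed_seconds = time.perf_counter() - tic
    end_time = datetime.now().astimezone()
    return {
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "total_elapsed_time": elapsed_seconds,  # assumed unit: seconds
    }
```

The elapsed time is taken from the monotonic timer rather than from the difference of the two timestamps, so it is not affected by adjustments of the system clock during the run.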
number_of_processes: (optional) NX_UINT {units=NX_UNITLESS}
Qualifier which specifies with how many nominal processes the app was invoked. The main idea behind this field, e.g. for apps which use MPI (Message Passing Interface) parallelization, is to communicate how many processes were used.
For sequentially running apps, number_of_processes and number_of_threads are both 1. If the app uses exclusively GPU parallelization, number_of_gpus can be larger than 1. If no GPU is used, number_of_gpus is 0, even if the hardware has GPUs installed, thus indicating that these were not used.
number_of_threads: (optional) NX_UINT {units=NX_UNITLESS}
Qualifier specifying how many nominal threads were used at runtime. Specifically, this is the maximum number of threads used by the high-level threading library (e.g. OMP_NUM_THREADS for OpenMP, or POSIX threads).
number_of_gpus: (optional) NX_UINT {units=NX_UNITLESS}
Qualifier specifying with how many nominal GPUs the app was invoked at runtime.
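The three resource qualifiers could be assembled as in the following Python sketch. It assumes that the process and GPU counts are known to the caller (e.g. from the job launcher) and that the thread count is read from the OMP_NUM_THREADS environment variable. The helper name nominal_resources is hypothetical, and falling back to 1 when the variable is unset is a simplification for this example; an OpenMP runtime may choose a different default.

```python
import os


def nominal_resources(n_processes=1, n_gpus=0, env=None):
    """Return number_of_processes, number_of_threads, number_of_gpus.

    Hypothetical helper: the name and the dict layout are illustrative only.
    """
    env = os.environ if env is None else env
    raw = env.get("OMP_NUM_THREADS", "")
    # OMP_NUM_THREADS may hold a comma-separated list (one value per nesting
    # level); only the first, outermost value is used here.
    first = raw.split(",")[0].strip()
    # Assumption: fall back to 1 thread if the variable is unset or invalid.
    n_threads = int(first) if first.isdigit() and int(first) > 0 else 1
    return {
        "number_of_processes": n_processes,
        "number_of_threads": n_threads,
        "number_of_gpus": n_gpus,  # 0 means no GPU was used
    }
```

For a sequentially running app without GPU usage, nominal_resources(env={}) yields 1 process, 1 thread, and 0 GPUs, matching the convention described for number_of_processes.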
CS_COMPUTER: (optional) NXcs_computer
A collection of one or more computing nodes, each with its own resources. This can be as simple as a laptop or as extensive as the nodes of a cluster computer.
CS_PROFILING_EVENT: (optional) NXcs_profiling_event
A collection of individual profiling event data which detail e.g. how much time the app took for certain computational steps and/or how much memory was consumed during these operations.
Hypertext Anchors
List of hypertext anchors for all groups, fields, attributes, and links defined in this class.