Releases: hpc-gridware/clusterscheduler
OCS/GCS v9.0.5
Major Enhancements
v9.0.5
qtelemetry (Developer Preview in GCS)
This release introduces qtelemetry, a new metrics exporter for Gridware Cluster Scheduler (GCS). It allows administrators to easily collect and expose cluster metrics for monitoring and observability purposes.
Features:
- Simple integration with Prometheus and Grafana
- Export of cluster metrics, including:
  - Host metrics (CPU load, GPU availability, memory usage, and many more)
  - Job metrics (queued, running, errored, waiting time, and many more)
  - qmaster statistics (CPU/memory usage of `sge_qmaster`, spooling filesystem information)
- Optional per-job metric export for detailed insights (recommended only for very small workloads)
- Built-in support for a pre-configured Grafana dashboard:
  - Grafana dashboard example.
Quick Start:
By default, `qtelemetry` exports metrics on port 9464 from the `/metrics` endpoint:

```
./qtelemetry start
```
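To check that the exporter is running, the endpoint can be queried directly, e.g. with curl (a minimal check, assuming qtelemetry runs on the local host with the default port):

```
# Fetch the exported metrics once from the default endpoint
curl http://localhost:9464/metrics
```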
Enable additional metrics sources using command-line flags:

```
# Export exec host and qmaster metrics
./qtelemetry start --enableExecd --enableMaster

# Export individual job-level metrics (for smaller systems)
./qtelemetry start --singleJobs
```
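On the Prometheus side, a minimal scrape configuration for this exporter might look like the following sketch; the job name and target host are placeholders, not part of the distribution:

```
scrape_configs:
  - job_name: "qtelemetry"              # placeholder job name
    static_configs:
      - targets: ["qmaster-host:9464"]  # replace with the host running qtelemetry
```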
(Available in Gridware Cluster Scheduler only)
Out-of-the-Box Support for Various MPI Distributions
The `$SGE_ROOT/mpi` directory contains PE configuration templates for the following MPI distributions:
- Intel MPI
- mpich
- mvapich
- openmpi
They can be added by simply calling `qconf -Ap <path to template>`, which adds the PE configuration for running jobs with the given MPI in tight integration.
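For example, registering the Open MPI template could look like this; the template file name below is an assumption, check the contents of `$SGE_ROOT/mpi` for the actual paths:

```
# Add the PE configuration from the shipped Open MPI template (file name is illustrative)
qconf -Ap $SGE_ROOT/mpi/openmpi/openmpi.pe
```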
In addition, build scripts for mpich, mvapich, and openmpi show by example how the MPI distributions can be built and installed. The build scripts are located at `$SGE_ROOT/mpi/<mpi name>/build.sh`.
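Instantiating that pattern for Open MPI, building the distribution is then a single call:

```
# Build and install Open MPI using the bundled example script
$SGE_ROOT/mpi/openmpi/build.sh
```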
`$SGE_ROOT/mpi/examples` contains an MPI example written in C. It can be run as a tightly integrated parallel job with any of the MPI distributions mentioned above and supports checkpointing and restart. It comes with documentation, a build script, a job script, and a template of a checkpointing environment.
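For illustration only, a job script for such a tightly integrated run might look like this sketch; the PE name `openmpi` and the binary name are assumptions, the shipped job script is the authoritative reference:

```
#!/bin/sh
#$ -pe openmpi 8   # request 8 slots in the MPI parallel environment (PE name is illustrative)
#$ -cwd

# With tight integration the PE starts remote tasks under scheduler control,
# so per-task resource usage shows up in the accounting.
mpirun -np $NSLOTS ./mpi_example
```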
(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
Easier Creation of Configuration Templates
Configuration objects can now contain the additional special variables `$sge_root` and `$sge_cell` for paths to scripts, e.g. for

- `prolog` and `epilog` in the global config and queue configurations
- `starter_method`, `suspend_method`, `resume_method`, and `terminate_method` in the queue configuration
- `start_proc_args` and `stop_proc_args` in the parallel environment configuration
- `ckpt_command`, `migr_command`, `restart_command`, and `clean_command` in the checkpointing environment
This makes it possible to create configuration templates that can be used in different environments without the need to modify the paths before applying the configuration. A list of all special variables is given in the prolog section of the sge_conf.5 man page.
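As an illustration, a portable queue configuration template could reference its scripts like this; the script paths below are hypothetical:

```
prolog    $sge_root/$sge_cell/common/prolog.sh
epilog    $sge_root/$sge_cell/common/epilog.sh
```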
(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
Full List of Fixes
Improvement
CS-342 provide an openmpi integration
CS-343 provide an example and test program using MPI
CS-791 sge_root should be available as special variable in the configuration of prolog, epilog, queue, pe, ckpt
CS-914 Make ARCH script more robust
CS-1090 qstat -r shall report resource requests by scope
CS-1094 Update sge_pe.md to better explain PE_HOSTFILE
CS-1114 Add GPU monitoring examples to qtelemetry Grafana dashboard
CS-1115 Build qtelemetry in containers for lx-amd64 and lx-arm64
CS-1126 in the environment of tasks of tightly integrated parallel jobs set the pe_task_id
CS-1128 Add enroot to worker GPU VM image for GCP
CS-1143 provide a MPICH integration
CS-1144 provide a MVAPICH integration
CS-1145 provide an Intel MPI integration
CS-1146 cleanup and document the ssh wrapper MPI template and scripts
CS-1152 add a checktree_mpi to testsuite with configuration and tests making use of the various MPI integrations
CS-1158 Add qtelemetry Grafana dashboard to public Grafana Cloud Dashboards
New Feature
CS-1091 Clearly document the slots syntax in man5 sge_queue_conf.md
Sub-task
CS-697 Jenkins: enable issue_3013
CS-698 Jenkins: enable issue_3179
Task
CS-662 verify delayed job reporting of sge_execd after reconnecting to sge_qmaster
CS-1117 Add qtelemetry as developer preview to GCS distribution
CS-1118 Create a packer file which builds a GPU enabled VM with and without GCS for fast deployment on GCP
CS-1125 Provide a basic examples of how enroot can be used with the GPU integration
CS-1134 message cutoff after 8 characters
CS-1136 add checktree_qtelemetry to all build environments + Jenkins setup
Bug
CS-430 booking of resources into advance reservations needs to distinguish between host and queue resources
CS-722 env_list in qstat should show NONE if not set
CS-1028 qtelemetry should support NVIDIA loadsensor values for hosts
CS-1085 BDB build error on lx-riscv64 after OS update.
CS-1096 USE_QSUB_GID functionality fails on FreeBSD 14
CS-1111 minimum and maximum thread counts in the bootstrap.5 man page are incorrect
CS-1131 wallclock time reported for tasks of a tightly integrated parallel job is incorrect
CS-1139 job deletion via JAPI/DRMAA fails if job ID exceeds INT_MAX
CS-1140 termination of event client via JAPI fails if event client ID exceeds INT_MAX
CS-1141 MacOS build broken due to unavailability of getgrouplist()
CS-1163 when a queue is signalled then additional invalid entries are created in the berkeleydb spooling database
OCS/GCS v9.0.4
v9.0.4
IT IS STRONGLY RECOMMENDED TO UPGRADE TO PATCH v9.0.4
We fixed several critical bugs that caused

- the `sge_qmaster` to crash
- issues in the internal bookkeeping of the scheduler
- jobs to be stuck in the system without being able to delete them
- ...

Find the full list of fixes in the release notes: https://www.hpc-gridware.com/download/10333/?tmstv=1741200897
OCS/GCS v9.0.3
Patch release. Prebuilt packages are available here: https://www.hpc-gridware.com/download-main/
GCS v9.0.2
Enhanced NVIDIA GPU Support with qgpu
- With the release of patch 9.0.2, the `qgpu` command has been added to simplify workload management for GPU resources. The `qgpu` command allows administrators to manage GPU resources more efficiently and is available for Linux amd64 and Linux arm64. `qgpu` is a multi-purpose command: it can act as a load sensor reporting the characteristics and metrics of NVIDIA GPU devices (for this it requires NVIDIA DCGM to be installed on the GPU nodes); it also works as a `prolog` and `epilog` for jobs to set up the NVIDIA runtime and environment variables; and it sets up per-job GPU accounting so that GPU usage and power consumption are automatically reported in the accounting and visible in the standard `qacct -j` output. It supports all NVIDIA GPUs that are supported by NVIDIA's DCGM, including NVIDIA's latest Grace Hopper superchips. For more information about `qgpu`, please refer to the Admin Guide.
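For example, once a GPU job has finished, the recorded GPU usage and power consumption appear in the standard accounting output (the job ID is illustrative):

```
# Show accounting, including per-job GPU usage, for a finished job (ID is illustrative)
qacct -j 4711
```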
(Available in Gridware Cluster Scheduler only)
Automatic Session Management
Patch 9.0.2 introduces the new concept of automatic sessions. Sessions allow the Gridware Cluster Scheduler system to synchronize internal data stores, so that client commands can be enforced to get the most recent data. Session management is enabled by default, but it can be disabled by setting the `DISABLE_AUTOMATIC_SESSIONS` parameter to `true` in the `qmaster_params` of the cluster configuration.
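To turn sessions off, the parameter is added to the `qmaster_params`, following the same pattern as the other parameters shown in these notes:

```
> qconf -mconf
...
qmaster_params ...,DISABLE_AUTOMATIC_SESSIONS=true
...
```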
The default for the `qmaster_param` `DISABLE_SECONDARY_DS_READER` is now also `false`. This means that the reader thread pool is enabled by default and does not need to be enabled manually as in patch 9.0.1.

The reader thread pool in combination with sessions ensures that commands that trigger changes within the cluster (write-requests), such as submitting a job, modifying a queue, or changing a complex value, are executed and the outcome of those commands is guaranteed to be visible to the user who initiated the change. Commands that only read data (read-requests), such as `qstat`, `qhost`, or `qconf -s...`, triggered by the same user always return the most recent data, although all read-requests in the system are executed completely in parallel to the other Gridware Cluster Scheduler core components. This additional synchronization ensures that the data is consistent for the user with each read-request, but on the other hand it might slow down individual read-requests.

Assume the following script:
```
#!/bin/sh
job_id=`qsub -terse ...`
qstat -j $job_id
```
Without sessions it is not guaranteed that the `qstat -j` command will see the job that was submitted before. With sessions enabled, the `qstat -j` command will always see the job, but it will be slightly slower compared to the same scenario without sessions.

Sessions eliminate the need to poll for information about an action until it is visible in the system. Unlike other workload management systems, session management in Gridware Cluster Scheduler is automatic. There is no need to manually create or destroy sessions after they have been enabled globally.
The `sge_qmaster` monitoring has been improved. Beginning with this patch, the output section for reader and worker threads shows the following numbers:

```
... OTHER (ql:0,rql:0,wrql:0) ...
```

All three values show internal request queue lengths. Usually they are all 0, but in high-load situations or when sessions are enabled they can increase:
- `ql` shows the queue length of the worker threads. This request queue contains requests that require a write lock on the main data store.
- `rql` shows the queue length of the reader threads. This queue contains requests that require a read lock on the secondary reader data store.
- `wrql` shows the queue length of the waiting reader threads. All requests that cannot be handled by reader threads immediately are stored in this list until the secondary reader data store is ready to handle them. If sessions are disabled, this number will always be 0.

Increasing values are uncritical as long as the numbers also decrease again. If the numbers increase continuously, the system is under high load and performance might be impacted.
(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
Departments, Users and Jobs - Department View
With the release of patch 9.0.2, we have removed the restriction that users can only be assigned to one department. Users can now be assigned to multiple departments. This is particularly useful in environments where users are members of multiple departments in a company and access to resources is based on department affiliation.
Jobs must still be assigned to a single department. This means that a user who is a member of multiple departments can submit jobs to any of the departments of which he/she is a member by specifying the department in the job submission command using the `-dept` switch. If a user does not specify a particular department, `sge_qmaster` assigns the job to the first department found.
Using `qstat` and `qhost`, the output can be filtered based on access lists and departments using the `-sdv` switch. When this switch is used, the following applies:
- Only the hosts/queues to which the user has access are displayed.
- Jobs are only displayed if they belong to the executing user or to a user who is a member of one of the departments the executing user also belongs to.
- Child objects are only displayed if the user also has access to the corresponding parent object. This means that jobs are not displayed if the queue or host on which they are running no longer grants access, and queues are not displayed if their host is not accessible (anymore).
Please note that this may result in situations where users are no longer able to see their own jobs if access permissions are changed for a user who has jobs running in the system.
Users with the manager role always see all hosts/queues and jobs, independent of the use of the `-sdv` switch.
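A short sketch of both switches in action; the department name and job script are illustrative:

```
# Submit a job on behalf of a specific department (name is illustrative)
qsub -dept dept_a job.sh

# Show only the hosts, queues, and jobs visible to the calling user's departments
qstat -sdv
```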
Please note that this specific functionality is still in beta. It is only available in Gridware Cluster Scheduler, and the implementation will change with upcoming patch releases.
GCS v9.0.1
The first patch release of Gridware Cluster Scheduler v9.0.1 is available. Packages can be found here: https://www.hpc-gridware.com/download-main/
Starting with patch 9.0.1, the new internal architecture of `sge_qmaster` is enabled, allowing the component to use additional data stores that can be utilized by pools of threads.
- Listener threads: The listener thread pool was already available in earlier versions of Grid Engine. Starting with version 9.0.0 of Cluster Scheduler, this pool received a dedicated data store to forward incoming requests faster to the component that ultimately has to process them. New in version 9.0.1 is that this data store includes more information, so that the listener threads themselves can directly answer certain requests without having to forward them. This reduces internal friction and makes the cluster more responsive even in high-load situations.
- Reader thread pool: The reader thread pool is activated and can now utilize a corresponding data store. This will boost the performance of clusters in large environments where users tend to request the status of the system very often, using client commands like `qstat`, `qhost`, or other commands that send read-only requests to `sge_qmaster`. The additional data store needs to be enabled manually by setting the following parameter in the `qmaster_params` of the cluster configuration:

```
> qconf -mconf
...
qmaster_params ...,DISABLE_SECONDARY_DS_READER=false
...
```
Please note that requests answered by the reader thread pool might deliver slightly outdated data compared to requests answered with data from the main data store, because both data stores can be slightly out of sync. The maximum deviation can be configured by setting `MAX_DS_DEVIATION` in milliseconds within the `qmaster_params`:

```
> qconf -mconf
...
qmaster_params ...,MAX_DS_DEVIATION=1000
...
```
The default value is 1000 milliseconds. The value should be chosen carefully to balance the performance gain with the accuracy of the data.
With one of the upcoming patches we will introduce an additional concept of automatic sessions that will allow synchronizing the data stores more efficiently, so that client commands can be enforced to get the most recent data.
- Enhanced monitoring: The monitoring of `sge_qmaster` has been enhanced to provide more detailed information about the utilization of the different thread pools. As in the past, monitoring is enabled by setting the monitor time:

```
> qconf -mconf
...
qmaster_params ...,MONITOR_TIME=10
...
```

`qping` will then show statistics about the handled requests per thread:

```
qping -i 1 -f <master_host> $SGE_QMASTER_PORT qmaster 1
...
10/11/2024 12:54:53 | reader:   runs: 261.04r/s ( GDI (a:0.00,g:2871.45,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 261.04m/s APT: 0.0007s/m idle: 80.88% wait: 0.01% time: 9.99s
10/11/2024 12:54:53 | reader:   runs: 279.50r/s ( GDI (a:0.00,g:3074.50,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 279.50m/s APT: 0.0007s/m idle: 79.08% wait: 0.01% time: 10.00s
10/11/2024 12:54:53 | listener: runs: 268.65r/s ( in (g:268.34 a:0.00 e:0.00 r:0.30)/s GDI (g:0.00,t:0.00,p:0.00)/s) out: 0.00m/s APT: 0.0001s/m idle: 98.42% wait: 0.00% time: 9.99s
10/11/2024 12:54:53 | listener: runs: 255.37r/s ( in (g:255.37 a:0.00 e:0.00 r:0.00)/s GDI (g:0.00,t:0.00,p:0.00)/s) out: 0.00m/s APT: 0.0001s/m idle: 98.54% wait: 0.00% time: 10.00s
```
Here is the download link to the full Release Notes of Gridware Cluster Scheduler v9.0.1
OCS v9.0.0
Open Cluster Scheduler v9.0.0 is available. Pre-built packages can be found here: https://www.hpc-gridware.com/download-main/