Add support for ParallelCluster 3.10.0.
Add alinux2023 support.
Add support for external slurmdbd instance.
Update documentation.
Change the UID of the slurm user to 401 to match what ParallelCluster uses.
Otherwise munge flags security errors because the UID of the submitter doesn't match the head node.
Change the UpdateHeadNode lambda to only do the update via SSM if the cluster isn't already being updated.
Resolves #242
Change the installer so that it checks to make sure that the cluster stack
isn't already being changed or in a bad state.
Resolves #221
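The check described above can be sketched roughly as follows. This is not the installer's actual code: `stack_is_safe_to_update` is a hypothetical helper, and it assumes the caller has already fetched the stack's CloudFormation `StackStatus` string.

```python
# Hypothetical helper; the installer's real check may be structured differently.
def stack_is_safe_to_update(stack_status: str) -> bool:
    """Return True only if the CloudFormation stack is idle and in a healthy state."""
    # Any *_IN_PROGRESS status means the stack is already being changed.
    if stack_status.endswith("_IN_PROGRESS"):
        return False
    # *_FAILED statuses need manual repair, and ROLLBACK_COMPLETE (a failed
    # create) can only be deleted, so treat both as a bad state.
    if stack_status.endswith("_FAILED") or stack_status == "ROLLBACK_COMPLETE":
        return False
    return True


if __name__ == "__main__":
    for status in ("CREATE_COMPLETE", "UPDATE_IN_PROGRESS", "UPDATE_ROLLBACK_FAILED"):
        print(status, stack_is_safe_to_update(status))
```

Matching on the status suffix keeps the sketch short; an explicit allow-list of statuses would be stricter.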
Add support for ParallelCluster 3.10.1.
Resolves #243
<a href="https://docs.aws.amazon.com/parallelcluster/latest/ug/HeadNode-v3.html#yaml-HeadNode-Dcv-Port">Port</a>: int
@@ -304,13 +309,18 @@ See [https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#p
 
 Optional
 
+**Note**: Starting with ParallelCluster 3.10.0, you should use slurm/ParallelClusterConfig/[Slurmdbd](#slurmdbd) instead of slurm/ParallelClusterConfig/Database.
+You cannot have both parameters.
+
 Configure the Slurm database to use with the cluster.
 
 This is created independently of the cluster so that the same database can be used with multiple clusters.
 
-The easiest way to do this is to use the [CloudFormation template provided by ParallelCluster](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_07_slurm-accounting-v3.html#slurm-accounting-db-stack-v3) and then to just pass
-the name of the stack in [DatabaseStackName](#databasestackname).
-All of the other parameters will be pulled from the stack.
+See [Create ParallelCluster Slurm Database](../deployment-prerequisites#create-parallelcluster-slurm-database) on the deployment prerequisites page.
+
+If you used the [CloudFormation template provided by ParallelCluster](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_07_slurm-accounting-v3.html#slurm-accounting-db-stack-v3), then the easiest way to configure it is to pass
+the name of the stack in slurm/ParallelClusterConfig/Database/[DatabaseStackName](#databasestackname).
+All of the other parameters will be pulled from the outputs of the stack.
 
 See the [ParallelCluster documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#Scheduling-v3-SlurmSettings-Database).
 
@@ -330,7 +340,7 @@ The following parameters will be set using the outputs of the stack:
 
 Used with the Port to set the [Uri](https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmSettings-Database-Uri) of the database.
 
-##### Port
+##### Database: Port
 
 type: int
 
@@ -353,11 +363,56 @@ This password is used together with AdminUserName and Slurm accounting to authen
 
 Sets the [PasswordSecretArn](https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmSettings-Database-PasswordSecretArn) parameter in ParallelCluster.
 
-##### ClientSecurityGroup
+##### Database: ClientSecurityGroup
 
 Security group that has permissions to connect to the database.
 
-Required to be attached to the head node that is running slurmdbd so that the port connection to the database is allows.
+Required to be attached to the head node that is running slurmdbd so that the port connection to the database is allowed.
+
+#### Slurmdbd
+
+**Note**: This is not supported before ParallelCluster 3.10.0. If you specify this parameter then you cannot specify slurm/ParallelClusterConfig/[Database](#database).
+
+Optional
+
+Configure an external Slurmdbd instance to use with the cluster.
+The Slurmdbd instance provides access to the shared Slurm database.
+This is created independently of the cluster so that the same database can be used with multiple clusters.
+
+This is created independently of the cluster so that the same slurmdbd instance can be used with multiple clusters.
+
+See [Create Slurmdbd instance](../deployment-prerequisites#create-slurmdbd-instance) on the deployment prerequisites page.
+
+If you used the [CloudFormation template provided by ParallelCluster](https://docs.aws.amazon.com/parallelcluster/latest/ug/external-slurmdb-accounting.html#external-slurmdb-accounting-step1), then the easiest way to configure it is to pass
+the name of the stack in slurm/ParallelClusterConfig/Slurmdbd/[SlurmdbdStackName](#slurmdbdstackname).
+All of the other parameters will be pulled from the parameters and outputs of the stack.
+
+See the [ParallelCluster documentation for ExternalSlurmdbd](https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#Scheduling-v3-SlurmSettings-ExternalSlurmdbd).
+
+##### SlurmdbdStackName
+
+Name of the ParallelCluster CloudFormation stack that created the Slurmdbd instance.
+
+The following parameters will be set using the outputs of the stack:
+
+* Host
+* Port
+* ClientSecurityGroup
+
+##### Slurmdbd: Host
+
+IP address or DNS name of the Slurmdbd instance.
+
+##### Slurmdbd: Port
+
+Default: 6819
+
+Port used by the slurmdbd daemon on the Slurmdbd instance.
+
+##### Slurmdbd: ClientSecurityGroup
+
+Security group that has access to use the Slurmdbd instance.
+This will be added as an extra security group to the head node.
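The two constraints stated in the Slurmdbd note (not supported before 3.10.0, and mutually exclusive with Database) could be checked along these lines. This is a minimal sketch, not the project's actual validation code: `check_accounting_config` is a hypothetical function, the dict stands in for the parsed slurm/ParallelClusterConfig section, and the stack name is a placeholder.

```python
# Hypothetical validation sketch; names and messages are illustrative only.
def check_accounting_config(pc_config: dict, pc_version: tuple) -> list:
    """Return a list of error strings for the Database/Slurmdbd parameters."""
    errors = []
    has_database = "Database" in pc_config
    has_slurmdbd = "Slurmdbd" in pc_config
    # The two parameters are mutually exclusive.
    if has_database and has_slurmdbd:
        errors.append("Specify either Database or Slurmdbd, not both.")
    # An external slurmdbd instance needs ParallelCluster 3.10.0 or later.
    if has_slurmdbd and pc_version < (3, 10, 0):
        errors.append("Slurmdbd requires ParallelCluster 3.10.0 or later.")
    return errors


if __name__ == "__main__":
    config = {"Slurmdbd": {"SlurmdbdStackName": "my-slurmdbd-stack"}}  # placeholder name
    print(check_accounting_config(config, (3, 10, 1)))
    print(check_accounting_config(config, (3, 9, 1)))
```

Comparing versions as integer tuples avoids the string-comparison trap where "3.9.1" sorts after "3.10.0".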
 
 ### ClusterName
 
@@ -373,6 +428,8 @@ For an existing secret can be the secret name or the ARN.
 If the secret doesn't exist one will be created, but won't be part of the cloudformation stack so that it won't be deleted when the stack is deleted.
 Required if your submitters need to use more than 1 cluster.
 
+See [Create Munge Key](../deployment-prerequisites#create-munge-key) for more details.
docs/deploy-parallel-cluster.md (18 deletions)
@@ -10,24 +10,6 @@ The current latest version is 3.9.1.
 
 See [Deployment Prerequisites](deployment-prerequisites.md) page.
 
-### Create ParallelCluster UI (optional but recommended)
-
-It is highly recommended to create a ParallelCluster UI to manage your ParallelCluster clusters.
-A different UI is required for each version of ParallelCluster that you are using.
-The versions are list in the [ParallelCluster Release Notes](https://docs.aws.amazon.com/parallelcluster/latest/ug/document_history.html).
-The minimum required version is 3.6.0 which adds support for RHEL 8 and increases the number of allows queues and compute resources.
-The suggested version is at least 3.7.0 because it adds configurable compute node weights which we use to prioritize the selection of
-compute nodes by their cost.
-
-The instructions are in the [ParallelCluster User Guide](https://docs.aws.amazon.com/parallelcluster/latest/ug/install-pcui-v3.html).
-
-### Create ParallelCluster Slurm Database
-
-The Slurm Database is required for configuring Slurm accounts, users, groups, and fair share scheduling.
-It you need these and other features then you will need to create a ParallelCluster Slurm Database.
-You do not need to create a new database for each cluster; multiple clusters can share the same database.
-Follow the directions in this [ParallelCluster tutorial to configure slurm accounting](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_07_slurm-accounting-v3.html#slurm-accounting-db-stack-v3).
-
 ## Create the Cluster
 
 To install the cluster run the install script. You can override some parameters in the config file
docs/deployment-prerequisites.md (100 additions, 11 deletions)
@@ -99,6 +99,78 @@ The version that has been tested is in the CDK_VERSION variable in the install s
 
 The install script will try to install the prerequisites if they aren't already installed.
 
+## Create ParallelCluster UI (optional but recommended)
+
+It is highly recommended to create a ParallelCluster UI to manage your ParallelCluster clusters.
+A different UI is required for each version of ParallelCluster that you are using.
+The versions are listed in the [ParallelCluster Release Notes](https://docs.aws.amazon.com/parallelcluster/latest/ug/document_history.html).
+The minimum required version is 3.6.0 which adds support for RHEL 8 and increases the number of allowed queues and compute resources.
+The suggested version is at least 3.7.0 because it adds configurable compute node weights which we use to prioritize the selection of
+compute nodes by their cost.
+
+The instructions are in the [ParallelCluster User Guide](https://docs.aws.amazon.com/parallelcluster/latest/ug/install-pcui-v3.html).
+
+## Create Munge Key
+
+Munge is a package that Slurm uses to secure communication between servers.
+The munge service uses a preshared key that must be the same on all of the servers in the Slurm cluster.
+If you want to be able to use multiple clusters from your submission hosts, such as virtual desktops, then all of the clusters must be using the same munge key.
+This is done by creating a munge key and storing it in AWS Secrets Manager.
+The secret is then passed as a parameter to ParallelCluster so that it can use it when configuring munge on all of the cluster instances.
+
+To create the munge key and store it in AWS Secrets Manager, run the following commands.
+Save the ARN of the secret for when you create the Slurmdbd instance and for when you create the configuration file.
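The commands themselves are not shown in this view. The following is not the author's script, only a minimal Python sketch of the same idea: generate a random key, base64-encode it, and store it as a secret. It assumes that ParallelCluster expects the secret value to be a base64-encoded key (check the MungeKeySecretArn documentation linked in this section), the 128-byte key size is an arbitrary choice, `SlurmMungeKey` is a placeholder secret name, and storing the secret requires `boto3` and AWS credentials.

```python
import base64
import secrets


def generate_munge_key(num_bytes: int = 128) -> str:
    """Return a random key as base64 text (128 bytes = 1024 bits by default)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def store_munge_key(secret_name: str, key_b64: str) -> str:
    """Store the key in AWS Secrets Manager and return the ARN of the new secret."""
    import boto3  # imported here so key generation works without boto3 installed

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=secret_name, SecretString=key_b64)
    return response["ARN"]


if __name__ == "__main__":
    key = generate_munge_key()
    print(f"Generated a {len(base64.b64decode(key)) * 8}-bit munge key")
    # Uncomment to create the secret; "SlurmMungeKey" is a placeholder name.
    # print(store_munge_key("SlurmMungeKey", key))
```

The ARN returned by `store_munge_key` is the value to save for the later steps.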
+
+See the [Slurm documentation for authentication](https://slurm.schedmd.com/authentication.html) for more information.
+
+See the [ParallelCluster documentation for MungeKeySecretArn](https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmSettings-MungeKeySecretArn).
+
+See the [MungeKeySecret configuration parameter](../config#mungekeysecret).
+
+## Create ParallelCluster Slurm Database
+
+The Slurm Database is required for configuring Slurm accounts, users, groups, and fair share scheduling.
+If you need these and other features then you will need to create a ParallelCluster Slurm Database.
+You do not need to create a new database for each cluster; multiple clusters can share the same database.
+Follow the directions in this [ParallelCluster tutorial to configure slurm accounting](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_07_slurm-accounting-v3.html#slurm-accounting-db-stack-v3).
+
+## Create Slurmdbd Instance
+
+**Note**: Before ParallelCluster 3.10.0, the slurmdbd daemon that connects to the database was created on each cluster's head node.
+The recommended Slurm architecture is to have a shared slurmdbd daemon that is used by all of the clusters.
+Starting in version 3.10.0, ParallelCluster supports specifying an external slurmdbd instance when you create a cluster and provides a CloudFormation template to create it.
+
+Follow the directions in this [ParallelCluster tutorial to configure slurmdbd](https://docs.aws.amazon.com/parallelcluster/latest/ug/external-slurmdb-accounting.html#external-slurmdb-accounting-step1).
+This requires that you have already created the slurm database.
+
+Here are some notes on the required parameters and how to fill them out.
+
+| Parameter | Description
+|--------------|------------
+| AmiId | You can get this using the ParallelCluster UI. Click on Images and sort on Operating system. Confirm that the version is at least 3.10.0. Select the AMI for alinux2023 and the arm64 architecture.
+| CustomCookbookUrl | Leave blank
+| DBMSClientSG | Get this from the DatabaseClientSecurityGroup output of the database stack.
+| DBMSDatabaseName | This is an arbitrary name. It must be alphanumeric. I use slurmaccounting
+| DBMSPasswordSecretArn | Get this from the DatabaseSecretArn output of the database stack
+| DBMSUri | Get this from the DatabaseHost output of the database stack. Note that if you copy and paste the link you should delete the https:// prefix and the trailing '/'.
+| DBMSUsername | Get this from the DatabaseAdminUser output of the database stack.
+| EnableSlurmdbdSystemService | Set to true. Note the warning. If the database already exists and was created with an older version of slurm then the database will be upgraded. This may break clusters using an older slurm version that are still using the database. Set to false if you don't want this to happen.
+| InstanceType | Choose an instance type that is compatible with the AMI. For example, m7g.large.
+| KeyName | Use an existing EC2 key pair.
+| MungeKeySecretArn | ARN of an existing munge key secret. See [Create Munge Key](#create-munge-key).
+| PrivateIp | Choose an available IP in the subnet.
+| PrivatePrefix | CIDR of the instance's subnet.
+| SlurmdbdPort | 6819
+| SubnetId | Preferably the same subnet where the clusters will be deployed.
+| VPCId | The VPC of the subnet.
+
+The stack name will be used in the slurm/ParallelClusterConfig/Slurmdbd/[SlurmdbdStackName](../config#slurmdbdstackname) configuration parameter.
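The DBMSUri row in the table asks you to strip the https:// prefix and the trailing '/' from a pasted link. A small sketch of that cleanup, where `normalize_dbms_uri` is a hypothetical helper and the host name is a made-up example:

```python
# Hypothetical helper for cleaning up a DatabaseHost value pasted as a link.
def normalize_dbms_uri(value: str) -> str:
    """Strip a leading http(s):// scheme and any trailing '/' from a host value."""
    value = value.strip()
    for scheme in ("https://", "http://"):
        if value.startswith(scheme):
            value = value[len(scheme):]
            break
    return value.rstrip("/")


if __name__ == "__main__":
    # Made-up host name, for illustration only.
    print(normalize_dbms_uri("https://slurm-db.cluster-example.us-east-1.rds.amazonaws.com/"))
```

A value that is already a bare host name passes through unchanged, so the helper can be applied unconditionally.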
+
 ## Security Groups for Login Nodes
 
 If you want to allow instances like remote desktops to use the cluster directly, you must define
@@ -111,25 +183,30 @@ I'll call the three security groups the following names, but they can be whateve
 * SlurmHeadNodeSG
 * SlurmComputeNodeSG
 
+First create these security groups without any security group rules.
+The reason for this is that the security group rules reference the other security groups so the groups must all exist before any of the rules can be created.
+After you have created the security groups then create the rules as described below.
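The ordering constraint can be sketched as a two-phase plan: first every group, then every rule. This is only an illustration. `plan_security_groups` is a hypothetical function, and the rules and ports in the example are placeholders, not the rules this page goes on to define.

```python
# Hypothetical planner; it only orders the steps and makes no AWS calls.
def plan_security_groups(rules):
    """Given (target_group, source_group, port) rules, return ordered steps.

    All groups are created first, because a rule can only reference a
    group that already exists.
    """
    groups = []
    for target, source, _port in rules:
        for name in (target, source):
            if name not in groups:
                groups.append(name)
    steps = [("create_group", name) for name in groups]
    steps += [("add_ingress", target, source, port) for target, source, port in rules]
    return steps


if __name__ == "__main__":
    # Placeholder rules that reference each other, which is why a
    # single pass that creates each group with its rules would fail.
    example_rules = [
        ("SlurmHeadNodeSG", "SlurmSubmitterSG", 6817),
        ("SlurmSubmitterSG", "SlurmHeadNodeSG", 6817),
        ("SlurmComputeNodeSG", "SlurmSubmitterSG", 6818),
    ]
    for step in plan_security_groups(example_rules):
        print(step)
```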
+
 ### Slurm Submitter Security Group
 
 The SlurmSubmitterSG will be attached to your login nodes, such as your virtual desktops.