Commit 2d84608

Clean up security groups and permissions for extra mounts (#246)
Create a CDK script to automate the creation of security groups for external login nodes and for external FSx file systems.
Add a parameter, AdditionalSecurityGroupsStackName, to get the security group IDs from the created stack and configure the head and compute node additional security groups.

Update docs. Update deployment-prerequisites.md. Add security-groups.md.

Replace RESEnvironmentName parameter with RESStackName. Get the RESEnvironment from the parameters of the RES stack.

Delete SubmitterInstanceTags parameter because it is not used anywhere. Will add a new parameter to configure/deconfigure external login nodes.

Don't add extra mount security groups to ParallelCluster.
Don't add extra mount security groups to the create cluster lambda.

Update permissions to the lambda that creates ParallelCluster. Add ec2:DeleteTags permission. Add missing fsx permissions.

Use cluster-manager instead of vdc-controller to create users/groups json.

Add errors to SNS notification in CreateBuildFiles lambda.

Handle special case where the same cluster name exists in multiple VPCs. This causes Route53 hosted zones with the same names, and the A record for the head node gets created in the wrong hosted zone.

Make sure to send SNS notification if ParallelCluster create or update fails.
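The duplicate hosted zone case described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: the function name `select_hosted_zone_id`, the record layout (`Id`, `Name`, `VpcIds`), and the example zone names and IDs are all hypothetical, chosen only to show why matching on zone name alone is ambiguous and why the VPC must be part of the lookup.

```python
from typing import Dict, List, Optional


def select_hosted_zone_id(hosted_zones: List[Dict], zone_name: str, vpc_id: str) -> Optional[str]:
    """Return the ID of the zone with the given name that is associated with vpc_id.

    hosted_zones is a hypothetical, simplified list of records; it is not
    the response shape of any AWS API.
    """
    wanted = zone_name.rstrip(".").lower()
    for zone in hosted_zones:
        name = zone["Name"].rstrip(".").lower()
        # Two clusters with the same name in different VPCs produce zones
        # with identical names, so the name check alone is not enough.
        if name == wanted and vpc_id in zone.get("VpcIds", []):
            return zone["Id"]
    return None


# Hypothetical data: the same zone name exists in two VPCs.
zones = [
    {"Id": "Z-EXAMPLE-1", "Name": "cluster1.example.", "VpcIds": ["vpc-aaaa"]},
    {"Id": "Z-EXAMPLE-2", "Name": "cluster1.example.", "VpcIds": ["vpc-bbbb"]},
]

print(select_hosted_zone_id(zones, "cluster1.example", "vpc-bbbb"))
```

Returning `None` when no zone matches lets the caller report the error (for example through the SNS notification mentioned above) instead of writing the A record into an unrelated zone.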
1 parent 2ae1b13 commit 2d84608

25 files changed: +1045 -585 lines changed

README.md

+14-9
@@ -14,30 +14,31 @@ Key features are:
 * Automatic scaling of AWS EC2 instances based on demand
 * Use any AWS EC2 instance type including Graviton2
 * Use of spot instances
+* Memory-aware scheduling
+* License-aware scheduling (Manages tool licenses as a consumable resource)
+* User and group fair share scheduling
 * Handling of spot terminations
 * Handling of insufficient capacity exceptions
 * Batch and interactive partitions (queues)
-* Manages tool licenses as a consumable resource
-* User and group fair share scheduling
 * Slurm accounting database
 * CloudWatch dashboard
 * Job preemption
 * Manage on-premises compute nodes
 * Configure partitions (queues) and nodes that are always on to support reserved instances (RIs) and savings plans (SPs).
+* Integration with [Research and Engineering Studio on AWS (RES)](https://aws.amazon.com/hpc/res/)

 Features in the legacy version and not in the ParallelCluster version:

 * Heterogenous clusters with mixed OSes and CPU architectures on compute nodes.
 * Multi-AZ support. Supported by ParallelCluster, but not currently implemented.
 * Multi-region support
 * AWS Fault Injection Simulator (FIS) templates to test spot terminations
-* Support for MungeKeySsmParameter
 * Multi-cluster federation

 ParallelCluster Limitations

 * Number of "Compute Resources" (CRs) is limited to 50 which limits the number of instance types allowed in a cluster.
-ParallelCluster can have multiple instance types in a CR, but with memory based scheduling enabled, they must all have the same number of cores and amount of memory.
+ParallelCluster can have multiple instance types in a compute resource (CR), but with memory based scheduling enabled, they must all have the same number of cores and amount of memory.
 * All Slurm instances must have the same OS and CPU architecture.
 * Stand-alone Slurm database daemon instance. Prevents federation.
 * Multi-region support. This is unlikely to change because multi-region services run against our archiectural philosophy.
@@ -57,11 +58,12 @@ ParallelCluster:

 * Amazon Linux 2
 * CentOS 7
-* RedHat 7 and 8
+* RedHat 7, 8 and 9
+* Rocky Linux 8 and 9

-This Slurm cluster supports both Intel/AMD (x86_64) based instances and ARM Graviton2 (arm64/aarch64) based instances.
+This Slurm cluster supports both Intel/AMD (x86_64) based instances and Graviton (arm64/aarch64) based instances.

-[Graviton instances require](https://github.com/aws/aws-graviton-getting-started/blob/main/os.md) Amazon Linux 2 or RedHat 8 operating systems.
+[Graviton instances require](https://github.com/aws/aws-graviton-getting-started/blob/main/os.md) Amazon Linux 2 or RedHat/Rocky >=8 operating systems.
 RedHat 7 and CentOS 7 do not support Graviton 2.

 This provides the following different combinations of OS and processor architecture.
@@ -72,10 +74,13 @@ ParallelCluster:
 * Amazon Linux 2 and x86_64
 * CentOS 7 and x86_64
 * RedHat 7 and x86_64
-* RedHat 8 and arm64
-* RedHat 8 and x86_64
+* RedHat 8/9 and arm64
+* RedHat 8/9 and x86_64
+* Rocky 8/9 and arm64
+* Rocky 8/9 and x86_64

 Note that in ParallelCluster, all compute nodes must have the same OS and architecture.
+However, you can create as many clusters as you require.

 ## Documentation
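
The OS and architecture combinations listed in the README hunk above can be expressed as a small lookup table. The following is a sketch for illustration only, not code from the repository: the names `SUPPORTED_ARCHITECTURES` and `is_supported` are hypothetical, and the arm64 entry for Amazon Linux 2 is inferred from the README statement that Graviton instances require Amazon Linux 2 or RedHat/Rocky >=8 (the corresponding list item falls outside the visible hunk).

```python
from typing import Dict, Set, Tuple

# (distribution, major version) -> supported CPU architectures.
# Taken from the README list in the diff; the Amazon Linux 2 arm64 entry
# is an inference from the Graviton requirement sentence.
SUPPORTED_ARCHITECTURES: Dict[Tuple[str, str], Set[str]] = {
    ("Amazon Linux", "2"): {"x86_64", "arm64"},
    ("CentOS", "7"): {"x86_64"},
    ("RedHat", "7"): {"x86_64"},
    ("RedHat", "8"): {"x86_64", "arm64"},
    ("RedHat", "9"): {"x86_64", "arm64"},
    ("Rocky", "8"): {"x86_64", "arm64"},
    ("Rocky", "9"): {"x86_64", "arm64"},
}


def is_supported(distribution: str, version: str, architecture: str) -> bool:
    """Return True if the OS/architecture pair appears in the table above."""
    return architecture in SUPPORTED_ARCHITECTURES.get((distribution, version), set())


print(is_supported("Rocky", "9", "arm64"))   # Rocky 9 on Graviton
print(is_supported("CentOS", "7", "arm64"))  # CentOS 7 does not support Graviton
```

Because all compute nodes in one ParallelCluster cluster must share the same OS and architecture, a check like this would apply once per cluster rather than per node.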

create-slurm-security-groups.sh

+9
@@ -0,0 +1,9 @@
+#!/bin/bash -xe
+
+cd create-slurm-security-groups
+
+python3 -m venv .venv
+source .venv/bin/activate
+python3 -m pip install -r requirements.txt
+pwd
+./create-slurm-security-groups.py "$@"
+24
@@ -0,0 +1,24 @@
+*.swp
+package-lock.json
+.pytest_cache
+*.egg-info
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# CDK Context & Staging files
+.cdk.staging/
+cdk.out/
+
+cdk.context.json
+65
@@ -0,0 +1,65 @@
+
+# Welcome to your CDK Python project!
+
+You should explore the contents of this project. It demonstrates a CDK app with an instance of a stack (`create_security_groups_stack`)
+which contains an Amazon SQS queue that is subscribed to an Amazon SNS topic.
+
+The `cdk.json` file tells the CDK Toolkit how to execute your app.
+
+This project is set up like a standard Python project. The initialization process also creates
+a virtualenv within this project, stored under the .venv directory. To create the virtualenv
+it assumes that there is a `python3` executable in your path with access to the `venv` package.
+If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv
+manually once the init process completes.
+
+To manually create a virtualenv on MacOS and Linux:
+
+```
+$ python3 -m venv .venv
+```
+
+After the init process completes and the virtualenv is created, you can use the following
+step to activate your virtualenv.
+
+```
+$ source .venv/bin/activate
+```
+
+If you are a Windows platform, you would activate the virtualenv like this:
+
+```
+% .venv\Scripts\activate.bat
+```
+
+Once the virtualenv is activated, you can install the required dependencies.
+
+```
+$ pip install -r requirements.txt
+```
+
+At this point you can now synthesize the CloudFormation template for this code.
+
+```
+$ cdk synth
+```
+
+You can now begin exploring the source code, contained in the hello directory.
+There is also a very trivial test included that can be run like this:
+
+```
+$ pytest
+```
+
+To add additional dependencies, for example other CDK libraries, just add to
+your requirements.txt file and rerun the `pip install -r requirements.txt`
+command.
+
+## Useful commands
+
+ * `cdk ls` list all stacks in the app
+ * `cdk synth` emits the synthesized CloudFormation template
+ * `cdk deploy` deploy this stack to your default AWS account/region
+ * `cdk diff` compare deployed stack with current state
+ * `cdk docs` open CDK documentation
+
+Enjoy!

create-slurm-security-groups/app.py

+17
@@ -0,0 +1,17 @@
+#!/usr/bin/env python3
+
+import aws_cdk as cdk
+from aws_cdk import App, Environment
+from create_slurm_security_groups.create_slurm_security_groups_stack import CreateSlurmSecurityGroupsStack
+
+app = cdk.App()
+
+cdk_env = Environment(
+    account = app.node.try_get_context('account_id'),
+    region = app.node.try_get_context('region')
+)
+stack_name = app.node.try_get_context('stack_name')
+
+CreateSlurmSecurityGroupsStack(app, stack_name, env=cdk_env, termination_protection = True,)
+
+app.synth()
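
`app.py` above reads three context keys (`account_id`, `region`, `stack_name`) through `try_get_context`. One way such values can be supplied is the CDK CLI's `-c key=value` context option. The sketch below only assembles that command line; it is an assumption about how a wrapper might invoke CDK, not the contents of `create-slurm-security-groups.py`, and the function name `build_cdk_deploy_command` and the example values are hypothetical.

```python
import shlex
from typing import List


def build_cdk_deploy_command(account_id: str, region: str, stack_name: str) -> List[str]:
    """Build a `cdk deploy` argument list that sets the context keys app.py reads."""
    context = {
        "account_id": account_id,
        "region": region,
        "stack_name": stack_name,
    }
    command = ["cdk", "deploy"]
    for key, value in context.items():
        # Each -c key=value pair becomes visible to app.node.try_get_context(key).
        command += ["-c", f"{key}={value}"]
    return command


# Hypothetical example values.
command = build_cdk_deploy_command("123456789012", "us-east-1", "slurm-security-groups")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string means it could be passed directly to `subprocess.run` without shell quoting concerns.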

create-slurm-security-groups/cdk.json

+62
@@ -0,0 +1,62 @@
+{
+  "app": "python3 app.py",
+  "watch": {
+    "include": [
+      "**"
+    ],
+    "exclude": [
+      "README.md",
+      "cdk*.json",
+      "requirements*.txt",
+      "source.bat",
+      "**/__init__.py",
+      "python/__pycache__",
+      "tests"
+    ]
+  },
+  "context": {
+    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
+    "@aws-cdk/core:checkSecretUsage": true,
+    "@aws-cdk/core:target-partitions": [
+      "aws",
+      "aws-cn"
+    ],
+    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
+    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
+    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
+    "@aws-cdk/aws-iam:minimizePolicies": true,
+    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
+    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
+    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
+    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
+    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
+    "@aws-cdk/core:enablePartitionLiterals": true,
+    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
+    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
+    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
+    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
+    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
+    "@aws-cdk/aws-route53-patters:useCertificate": true,
+    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
+    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
+    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
+    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
+    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
+    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
+    "@aws-cdk/aws-redshift:columnId": true,
+    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
+    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
+    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
+    "@aws-cdk/aws-kms:aliasNameRef": true,
+    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
+    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
+    "@aws-cdk/aws-efs:denyAnonymousAccess": true,
+    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
+    "@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
+    "@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
+    "@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
+    "@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
+    "@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
+    "@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true
+  }
+}
