Automate Amazon Redshift Cluster management operations using Terraform

Huzefa Khan
Mar 31, 2023


AWS Redshift cluster management operations

Amazon Redshift is a cloud-based data warehouse that provides fast processing and storage for large-scale data, up to the petabyte level. It offers a good balance between performance and price, which makes it a popular choice among tens of thousands of customers for running essential business operations. Amazon Redshift comes with features that help users build scalable, performant, cost-efficient, and easily manageable workloads. For instance, users can resize an Amazon Redshift cluster up or down based on workload needs, pause a cluster to suspend compute billing when it is not in use, and relocate a cluster to another Availability Zone. These management tasks can be automated using the Amazon Redshift API, the AWS Command Line Interface (CLI), AWS CloudFormation, and Terraform.

Terraform simplifies the management of your AWS resources, so you can devote more time to running your applications on AWS. With Terraform, you write a configuration that declares the AWS resources you require, and Terraform provisions and configures those resources for you. You no longer have to manage the resources by hand, which lets you concentrate on application development and operations.

In this post, we walk through how to use Terraform to automate some of the most common Amazon Redshift cluster management operations:

Create an Amazon Redshift cluster via the following methods:

  • Restore a cluster from a snapshot
  • Create an encrypted Amazon Redshift cluster

Perform cluster management operations:

  • Pause or resume a cluster
  • Perform elastic resize or classic resize
  • Add or remove Identity and Access Management (IAM) roles to cluster permissions
  • Rotate encryption keys
  • Modify snapshot retention period for automated and manual snapshots
  • Enable or disable snapshot copy to another AWS Region
  • Create a parameter group with required workload management (WLM) configuration and associate it to the Amazon Redshift cluster
  • Enable concurrency scaling by modifying WLM configuration
  • Enable or disable audit logging

This article focuses on using Terraform for Redshift operations rather than on introducing Terraform and its advantages in production environments; those topics will be covered in a separate article. All of the Terraform scripts in this post have been tested.

Create an encrypted Amazon Redshift cluster

You can enable database encryption for your clusters to protect data at rest. To create an Amazon Redshift cluster with encryption, you can utilize the provided sample Terraform template. The template includes only essential properties to simplify the walkthrough process. However, for your actual production workload, it is recommended to follow the best practices outlined in the post “Automate Amazon Redshift cluster creation using Terraform.”

# Create an encrypted Amazon Redshift cluster
resource "aws_redshift_cluster" "main" {
  cluster_identifier  = "cfn-blog-redshift-cluster"
  database_name       = "dev"
  master_username     = "awsuser"
  master_password     = "Admi1n23$" # sample value only; do not hardcode passwords in real configurations
  node_type           = "ra3.4xlarge"
  cluster_type        = "multi-node"
  publicly_accessible = true
  number_of_nodes     = 2
  encrypted           = true # encrypt data at rest
}
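As a variation on the template above, the following minimal sketch encrypts the cluster with a customer-managed AWS KMS key and reads the password from a sensitive input variable instead of hardcoding it. The resource names, the variable name, and the cluster identifier are hypothetical and are not part of the tested scripts in this post.

```hcl
# Hypothetical input variable; supply the value via TF_VAR_master_password or a tfvars file.
variable "master_password" {
  type      = string
  sensitive = true
}

# Customer-managed KMS key used to encrypt the cluster (assumed name and description).
resource "aws_kms_key" "redshift" {
  description = "KMS key for Amazon Redshift cluster encryption"
}

resource "aws_redshift_cluster" "encrypted_cmk" {
  cluster_identifier = "tf-redshift-cluster-cmk" # hypothetical identifier
  database_name      = "dev"
  master_username    = "awsuser"
  master_password    = var.master_password
  node_type          = "ra3.4xlarge"
  cluster_type       = "multi-node"
  number_of_nodes    = 2
  encrypted          = true
  kms_key_id         = aws_kms_key.redshift.arn # expects the key ARN
}
```

If kms_key_id is left out, as in the original template, the cluster is encrypted with the default AWS-managed key.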

Restore an Amazon Redshift cluster from a snapshot

Amazon Redshift periodically creates automated snapshots of a cluster, which serve as point-in-time backups. You can also take a manual snapshot at any time. These snapshots contain data from all running databases on the cluster; they protect your data and make it easy to create new environments for tasks such as application testing and data mining. If you are starting a new development or enhancement project, it can be useful to create a new Amazon Redshift cluster with the same code and data as your production cluster, by restoring the new cluster from a snapshot of the production cluster. This lets you develop and test code before deploying it. The following script creates a new Amazon Redshift cluster by restoring from an existing snapshot:

# Restore an Amazon Redshift cluster from a snapshot
resource "aws_redshift_cluster" "main2" {
  cluster_identifier  = "tf-redshift-cluster-restore"
  database_name       = "dev"
  master_username     = "awsuser"
  master_password     = "Admi1n23$"
  node_type           = "ra3.4xlarge"
  cluster_type        = "multi-node"
  number_of_nodes     = 2
  snapshot_identifier = "cfn-blog-redshift-cluster-final-snapshot" # name of the existing snapshot to restore from
  encrypted           = true
}
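The restore script expects a snapshot with that identifier to exist already. One way such a snapshot could come about is a final snapshot taken when the source cluster is destroyed. The sketch below assumes this is how it was produced, which the post does not state; the resource name "source" is hypothetical.

```hcl
# Sketch: source cluster that leaves a final snapshot behind on "terraform destroy".
# Assumption: the snapshot used in the restore example was created this way.
resource "aws_redshift_cluster" "source" {
  cluster_identifier = "cfn-blog-redshift-cluster"
  database_name      = "dev"
  master_username    = "awsuser"
  master_password    = "Admi1n23$" # sample value only
  node_type          = "ra3.4xlarge"
  cluster_type       = "multi-node"
  number_of_nodes    = 2
  encrypted          = true

  # Take a final snapshot with a known name when the cluster is deleted.
  skip_final_snapshot       = false
  final_snapshot_identifier = "cfn-blog-redshift-cluster-final-snapshot"
}
```

A manual snapshot taken from the console or CLI with the same identifier would work equally well as the restore source.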

Pause or resume cluster

If your Amazon Redshift cluster remains inactive for a certain period of time, you have the option to pause it and stop incurring on-demand billing charges. This is particularly useful for clusters used in development, where you can suspend billing when they are not in use. During the pause period, you are only charged for the storage used by the cluster. This feature offers great flexibility in managing operational costs for your Amazon Redshift clusters, allowing you to resume the cluster whenever you need it.

To pause and resume a cluster on a schedule with Terraform, you first create an IAM role that the Amazon Redshift scheduler can assume, attach a policy with the specific permissions it needs, and then define a scheduled action. The following script does this:

Pause Cluster


# Pause cluster
# IAM role that the Amazon Redshift scheduler assumes to run scheduled actions
resource "aws_iam_role" "example" {
  name               = "redshift_scheduled_action"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "scheduler.redshift.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

# Permissions needed by the scheduled pause, resume, and resize actions
resource "aws_iam_policy" "example" {
  name   = "redshift_scheduled_action"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift:PauseCluster",
        "redshift:ResumeCluster",
        "redshift:ResizeCluster"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "example" {
  policy_arn = aws_iam_policy.example.arn
  role       = aws_iam_role.example.name
}

# Pause the cluster every day at 23:00
resource "aws_redshift_scheduled_action" "main" {
  name     = "redshift-cluster-1-scheduled-action"
  schedule = "cron(00 23 * * ? *)"
  iam_role = aws_iam_role.example.arn

  target_action {
    pause_cluster {
      cluster_identifier = "redshift-cluster-1"
    }
  }
}

Resume Cluster

When you're ready to use the cluster again, you can resume it. To resume the cluster on a schedule, add a second scheduled action that uses the same IAM role and a resume_cluster target action.


# Resume cluster every day at 12:00
resource "aws_redshift_scheduled_action" "example" {
  name     = "tf-redshift-scheduled-action"
  schedule = "cron(00 12 * * ? *)"
  iam_role = aws_iam_role.example.arn

  target_action {
    resume_cluster {
      cluster_identifier = "redshift-cluster-1"
    }
  }
}
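For a development cluster, the same two actions can be restricted to working days. The sketch below is an assumption-laden variation rather than part of the tested scripts: it reuses the aws_iam_role.example role from above, takes the cluster identifier from the aws_redshift_cluster.main resource instead of a hardcoded string, and uses hypothetical action names. Schedule times are in UTC.

```hcl
# Sketch: pause on weekday evenings (hypothetical names and times).
resource "aws_redshift_scheduled_action" "pause_weeknights" {
  name        = "tf-redshift-pause-weeknights"
  description = "Pause the cluster at 23:00 UTC, Monday to Friday"
  schedule    = "cron(0 23 ? * MON-FRI *)"
  iam_role    = aws_iam_role.example.arn
  enable      = true

  target_action {
    pause_cluster {
      cluster_identifier = aws_redshift_cluster.main.cluster_identifier
    }
  }
}

# Sketch: resume on weekday mornings.
resource "aws_redshift_scheduled_action" "resume_weekdays" {
  name        = "tf-redshift-resume-weekdays"
  description = "Resume the cluster at 07:00 UTC, Monday to Friday"
  schedule    = "cron(0 7 ? * MON-FRI *)"
  iam_role    = aws_iam_role.example.arn
  enable      = true

  target_action {
    resume_cluster {
      cluster_identifier = aws_redshift_cluster.main.cluster_identifier
    }
  }
}
```

Referencing the cluster resource rather than a literal identifier also makes Terraform create the cluster before the scheduled actions that target it.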

Perform elastic resize or classic resize

Data warehouse workloads are characterized by their dynamic needs, which can change frequently. For instance, you might need to accommodate more data into your data warehouse if you add a new line of business, or if you introduce a new analytics application for your business users, you might need to add new ETL processes to support it. In such cases, when your computational requirements change, you have the option of resizing your Amazon Redshift cluster using one of the following methods:

  • Elastic resize — This changes the node type, number of nodes, or both. Typically, it completes within 10–15 minutes when adding or removing nodes of the same type. Cross-instance elastic resize can take up to 45 minutes. We recommend using elastic resize whenever possible, because it completes much more quickly than classic resize. Elastic resize has some growth and reduction limits on the number of nodes.
  • Classic resize — You can also use classic resize to change the node type, number of nodes, or both. We recommend this option only when you’re resizing to a configuration that isn’t available through elastic resize, because it takes considerably more time depending on your data size.

You can automate both elastic resize and classic resize operations on Amazon Redshift clusters using Terraform. In the resize_cluster block of a scheduled action, the resize is elastic by default. If elastic resize isn't possible for your configuration, the resize fails with an error. You can force a classic resize by setting the classic argument to true.

# Resize cluster action: elastic resize (the default)
resource "aws_redshift_scheduled_action" "example1" {
  name     = "tf-redshift-scheduled-action-Resize-Elastic"
  schedule = "cron(00 23 * * ? *)"
  iam_role = aws_iam_role.example.arn

  target_action {
    resize_cluster {
      cluster_identifier = "redshift-cluster-1"
      cluster_type       = "multi-node"
      node_type          = "dc1.large"
      number_of_nodes    = 2
    }
  }
}

# Resize cluster action: classic resize
# Both examples target the same cluster at the same time; use one or the other.
resource "aws_redshift_scheduled_action" "example2" {
  name     = "tf-redshift-scheduled-action-Resize-classic"
  schedule = "cron(00 23 * * ? *)"
  iam_role = aws_iam_role.example.arn

  target_action {
    resize_cluster {
      cluster_identifier = "redshift-cluster-1"
      cluster_type       = "multi-node"
      node_type          = "dc1.large"
      number_of_nodes    = 2
      classic            = true # force classic resize
    }
  }
}
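A common use of scheduled resizing is to add nodes for a heavy daytime workload and remove them again at night. The sketch below illustrates the idea with two elastic resize actions. The node counts, times, and action names are hypothetical, and whether a given change is allowed depends on the elastic resize limits for your node type.

```hcl
# Sketch: scale out before the working day (hypothetical values).
resource "aws_redshift_scheduled_action" "scale_out" {
  name     = "tf-redshift-scale-out"
  schedule = "cron(0 6 ? * MON-FRI *)" # 06:00 UTC on weekdays
  iam_role = aws_iam_role.example.arn

  target_action {
    resize_cluster {
      cluster_identifier = "redshift-cluster-1"
      number_of_nodes    = 4
    }
  }
}

# Sketch: scale back in after the working day.
resource "aws_redshift_scheduled_action" "scale_in" {
  name     = "tf-redshift-scale-in"
  schedule = "cron(0 20 ? * MON-FRI *)" # 20:00 UTC on weekdays
  iam_role = aws_iam_role.example.arn

  target_action {
    resize_cluster {
      cluster_identifier = "redshift-cluster-1"
      number_of_nodes    = 2
    }
  }
}
```

Because classic is not set, both actions request an elastic resize and keep the cluster's current node type.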

Add or remove IAM roles to cluster permissions

Your Amazon Redshift cluster needs permissions to access other AWS services on your behalf. For the required permissions, add IAM roles to cluster permissions. You can add up to 10 IAM roles. For instructions on creating roles, see Create an IAM role.

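# Add or remove IAM roles to cluster permissions
# Roles listed here must be assumable by the Redshift service (redshift.amazonaws.com);
# to remove a role, delete its ARN from the list and apply again.
resource "aws_redshift_cluster_iam_roles" "example3" {
  cluster_identifier = "redshift-cluster-1"
  iam_role_arns      = [aws_iam_role.example.arn]
}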
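The role attached to a cluster normally trusts the Redshift service itself rather than the scheduler. As a minimal sketch, assuming the cluster only needs read access to Amazon S3 (for example, for COPY), a dedicated role could look like the following. The role name and resource names are hypothetical; AmazonS3ReadOnlyAccess is an AWS managed policy.

```hcl
# Sketch: role that the Redshift service can assume (hypothetical name).
resource "aws_iam_role" "redshift_s3_read" {
  name = "redshift-s3-read"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "redshift.amazonaws.com" }
    }]
  })
}

# Grant read-only access to S3 through the AWS managed policy.
resource "aws_iam_role_policy_attachment" "redshift_s3_read" {
  role       = aws_iam_role.redshift_s3_read.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

# Attach the role to the cluster defined earlier in this post.
resource "aws_redshift_cluster_iam_roles" "s3_read" {
  cluster_identifier = aws_redshift_cluster.main.cluster_identifier
  iam_role_arns      = [aws_iam_role.redshift_s3_read.arn]
}
```

Manage a cluster's roles in one place only: either through this resource or through the iam_roles argument of aws_redshift_cluster, not both.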

Modify the snapshot retention period for automated and manual snapshots

Amazon Redshift captures automated snapshots of the cluster at regular intervals and retains them for one day by default. You can adjust the retention period of automated snapshots to any value from 0 to 35 days. Once the retention period is over, Amazon Redshift automatically removes the snapshots. Automated snapshots are also deleted when you disable automated snapshots for the cluster or delete the cluster itself.

In the event that you set the retention period to 0 days, the automated snapshots feature is disabled, and any existing snapshots are permanently deleted. Therefore, it is recommended that you exercise caution when setting the retention period to 0.

# Modify the snapshot retention period for automated and manual snapshots
# automated_snapshot_retention_period - (Optional) Number of days to retain automated snapshots; 0 disables them. Defaults to 1.
# manual_snapshot_retention_period - (Optional) Number of days to retain manual snapshots; -1 (the default) retains them indefinitely.

resource "aws_redshift_cluster" "main1" {
  cluster_identifier                  = "cfn-blog-redshift-cluster1"
  database_name                       = "dev"
  master_username                     = "awsuser"
  master_password                     = "Admi1n23$"
  node_type                           = "ra3.4xlarge"
  cluster_type                        = "multi-node"
  publicly_accessible                 = true
  number_of_nodes                     = 2
  encrypted                           = true
  manual_snapshot_retention_period    = 90
  automated_snapshot_retention_period = 10
}

Enable or disable snapshot copies to another Region

You can configure your Amazon Redshift cluster to copy all new manual and automated snapshots to another Region, and you can choose how long to keep the copied snapshots in the destination Region. If the cluster is encrypted, you must configure a snapshot copy grant for a KMS key in the destination Region, because AWS Key Management Service (AWS KMS) keys are specific to a Region. For information on how to create a snapshot copy grant, see Copying AWS KMS–encrypted snapshots to another AWS Region. Make sure that the snapshot copy grant is created before you enable snapshot copy to another Region with the Terraform script.


# Enable or disable snapshot copies to another Region
# To disable snapshot copy, remove the snapshot_copy block and apply again.

# Note: the grant must exist in the destination Region.
resource "aws_redshift_snapshot_copy_grant" "test" {
  snapshot_copy_grant_name = "my-grant"
}

resource "aws_redshift_cluster" "test" {
  cluster_identifier  = "cfn-blog-redshift-cluster1"
  database_name       = "dev"
  master_username     = "awsuser"
  master_password     = "Admi1n23$"
  node_type           = "ra3.4xlarge"
  cluster_type        = "multi-node"
  publicly_accessible = true
  number_of_nodes     = 2
  encrypted           = true

  snapshot_copy {
    destination_region = "us-east-2"
    grant_name         = aws_redshift_snapshot_copy_grant.test.snapshot_copy_grant_name
  }
}
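Because the snapshot copy grant has to live in the destination Region, one way to express this in a single configuration is a second, aliased AWS provider. The following is a minimal sketch under that assumption. The alias, grant name, cluster identifier, and retention value are hypothetical, and the default provider is assumed to be configured for the cluster's own Region elsewhere.

```hcl
# Sketch: aliased provider pointing at the destination Region.
provider "aws" {
  alias  = "destination"
  region = "us-east-2"
}

# Create the grant in the destination Region through the aliased provider.
resource "aws_redshift_snapshot_copy_grant" "destination" {
  provider                 = aws.destination
  snapshot_copy_grant_name = "my-grant-us-east-2"
}

resource "aws_redshift_cluster" "with_copy" {
  cluster_identifier = "tf-redshift-cluster-copy" # hypothetical identifier
  database_name      = "dev"
  master_username    = "awsuser"
  master_password    = "Admi1n23$" # sample value only
  node_type          = "ra3.4xlarge"
  cluster_type       = "multi-node"
  number_of_nodes    = 2
  encrypted          = true

  snapshot_copy {
    destination_region = "us-east-2"
    grant_name         = aws_redshift_snapshot_copy_grant.destination.snapshot_copy_grant_name
    retention_period   = 7 # days to keep copied automated snapshots in the destination Region
  }
}
```

The reference to the grant resource makes Terraform create the grant before it enables snapshot copy on the cluster.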

Enable or disable audit logging

When you enable audit logging, Amazon Redshift creates the connection log, user log, and user activity log and uploads them to Amazon Simple Storage Service (Amazon S3). You can automate enabling and disabling audit logging using Terraform. To enable audit logging, add a logging block to the base Terraform cluster resource with the following arguments:

  • enable: Set to true to turn audit logging on, or false to turn it off.
  • log_destination_type: The destination for the logs; s3 in this example.
  • bucket_name: The name of an existing S3 bucket where the log files are to be stored.

# Enable or disable audit logging

resource "aws_redshift_cluster" "main4" {
  cluster_identifier  = "cfn-blog-redshift-cluster"
  database_name       = "dev"
  master_username     = "awsuser"
  master_password     = "Admi1n23$"
  node_type           = "ra3.4xlarge"
  cluster_type        = "multi-node"
  publicly_accessible = true
  number_of_nodes     = 2
  encrypted           = true

  logging {
    enable               = true
    log_destination_type = "s3"
    bucket_name          = "my-bucket-000921"
  }
}
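The log bucket also has to allow Amazon Redshift to write to it. The sketch below shows one way to create the bucket and a bucket policy in the same configuration. It assumes the Redshift service principal is the right grantee for your Region (some Regions have historically required a Region-specific account principal instead), and the resource names are hypothetical.

```hcl
# Sketch: bucket that receives the audit logs (name taken from the example above).
resource "aws_s3_bucket" "audit_logs" {
  bucket = "my-bucket-000921"
}

# Allow the Redshift service to check the bucket ACL and upload log objects.
data "aws_iam_policy_document" "redshift_audit" {
  statement {
    sid     = "AllowRedshiftAuditLogging"
    effect  = "Allow"
    actions = ["s3:PutObject", "s3:GetBucketAcl"]

    resources = [
      aws_s3_bucket.audit_logs.arn,
      "${aws_s3_bucket.audit_logs.arn}/*",
    ]

    principals {
      type        = "Service"
      identifiers = ["redshift.amazonaws.com"] # assumption: service-principal form
    }
  }
}

resource "aws_s3_bucket_policy" "audit_logs" {
  bucket = aws_s3_bucket.audit_logs.id
  policy = data.aws_iam_policy_document.redshift_audit.json
}
```

With this in place, the cluster's logging block could reference aws_s3_bucket.audit_logs.id instead of the literal bucket name.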

You have now learned how to automate management operations on Amazon Redshift clusters using Terraform.

Note: If you notice any errors in my articles, feel free to reach out and let me know, and I will update the blog post accordingly.

Written by Huzefa Khan

Passionate Sr. Data Engineer with years of experience in developing and architecting high-class data solutions https://www.linkedin.com/in/huzzefakhan/
