
What is Amazon RDS?
Amazon RDS is a fully managed database service that allows you to provision and run relational databases in the cloud.
It supports several relational engines including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server.
It automates administration tasks such as hardware provisioning, database setup, patching, and backups.
In other words, Amazon RDS handles many of the tasks that you would otherwise be doing yourself on-premises.
Availability best practices
The idea of this post is to introduce you to some of the best practices for MS SQL Server availability on Amazon RDS.
Enable Multi-AZ for production workloads
Multi-AZ deployments set up synchronous replication and automatic failover.
That makes them a lot more resilient than a Single-AZ deployment.
Downtime during a failover is typically around a minute, because SQL Server first needs to detect, through the quorum, that the primary node is unavailable.
However, if the entire Availability Zone is offline, the process takes longer, around 10 to 15 minutes, and nothing can be done about that until the zone comes back online.
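As a rough illustration, here is how an existing instance could be switched to Multi-AZ from Python. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the instance identifier "prod-sql-01" is hypothetical. The request is built in a separate function so you can inspect it before sending anything.

```python
def multi_az_request(instance_id: str, apply_immediately: bool = False) -> dict:
    """Build keyword arguments for the RDS ModifyDBInstance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "MultiAZ": True,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def enable_multi_az(instance_id: str, apply_immediately: bool = False) -> dict:
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    rds = boto3.client("rds")
    return rds.modify_db_instance(**multi_az_request(instance_id, apply_immediately))


if __name__ == "__main__":
    # "prod-sql-01" is a hypothetical instance identifier.
    print(multi_az_request("prod-sql-01"))
```

Leaving apply_immediately at False is the more cautious choice for a production instance, since the conversion then waits for the maintenance window.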
Tweak checkpoints to reduce crash recovery times
In both Single-AZ and Multi-AZ deployments, SQL Server has to go through crash recovery after it crashes.
The server-level recovery interval option specifies the maximum amount of time that the SQL Server Database Engine should need to recover a database after SQL Server restarts. You can set this option in a custom parameter group and apply that parameter group to your instance.
You can also set it at the database level using ALTER DATABASE ... SET TARGET_RECOVERY_TIME.
Remember, if you reduce the crash recovery time, SQL Server will perform more aggressive checkpoints, so you need to know how that impacts your performance.
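To make the database-level option concrete, the sketch below builds the T-SQL statement as a string. The helper name and the database name "SalesDb" are hypothetical, and the 60-second target is only an example value, not a recommendation. You would run the resulting statement with whatever SQL Server client you already use.

```python
def target_recovery_time_sql(database: str, seconds: int) -> str:
    """Return an ALTER DATABASE statement that sets TARGET_RECOVERY_TIME."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Bracket-quote the identifier, escaping any closing bracket in the name.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET TARGET_RECOVERY_TIME = {seconds} SECONDS;"


if __name__ == "__main__":
    # "SalesDb" is a hypothetical database name.
    print(target_recovery_time_sql("SalesDb", 60))
```

Try a new target on a test copy of the database first, and watch write latency while it is in effect.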
Use Amazon RDS DB events to monitor failovers
Amazon RDS uses the Amazon Simple Notification Service (Amazon SNS) to provide notifications when an event occurs.
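These notifications can be delivered in any form that Amazon SNS supports, such as an email, a text message, or a call to an HTTP endpoint.
Subscribe to the failover event category so that you are notified whenever a failover starts or completes.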
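A subscription limited to failover events could look something like this. It is a sketch that assumes boto3, AWS credentials, and an SNS topic that already exists; the subscription name, topic ARN, and instance identifier are hypothetical placeholders.

```python
def failover_subscription_request(name, sns_topic_arn, instance_ids):
    """Build keyword arguments for the RDS CreateEventSubscription call."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": "db-instance",
        # Only deliver events in the failover category.
        "EventCategories": ["failover"],
        "SourceIds": list(instance_ids),
        "Enabled": True,
    }


def create_failover_subscription(name, sns_topic_arn, instance_ids):
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    rds = boto3.client("rds")
    request = failover_subscription_request(name, sns_topic_arn, instance_ids)
    return rds.create_event_subscription(**request)


if __name__ == "__main__":
    # All three values below are hypothetical.
    print(failover_subscription_request(
        "sql-failover-alerts",
        "arn:aws:sns:us-east-1:123456789012:dba-alerts",
        ["prod-sql-01"],
    ))
```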
Set client DNS TTL to less than 30 seconds
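If your client application caches the DNS data of your DB instance, it is a good practice to set the DNS TTL to less than 30 seconds. The underlying IP address of the instance can change after a failover, so a value that is cached for longer may point to an address that is no longer in service.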
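How you cap the TTL depends on your client stack. As an illustration only, here is a small Python resolver that never reuses a cached address for longer than a chosen maximum age. The class name and the 20-second default are assumptions made for this sketch, and the lookup and clock can be swapped out for testing.

```python
import socket
import time


class ShortTtlResolver:
    """Cache resolved addresses for at most max_age seconds."""

    def __init__(self, max_age=20.0, lookup=None, clock=time.monotonic):
        self.max_age = max_age
        self._lookup = lookup or self._system_lookup
        self._clock = clock
        self._cache = {}  # host -> (expires_at, address)

    @staticmethod
    def _system_lookup(host):
        # First address returned by the system resolver.
        return socket.getaddrinfo(host, None)[0][4][0]

    def resolve(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]  # cached value is still fresh
        address = self._lookup(host)
        self._cache[host] = (now + self.max_age, address)
        return address
```

After a failover, the next call to resolve() that falls outside the 20-second window looks the endpoint up again and picks up the new address.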
Do not turn off transaction logging
Do not enable simple recovery mode, offline mode, or read-only mode. Each of these turns off transaction logging, which Multi-AZ requires.
Know how long it takes for your DB instance to fail over
You should test your application’s ability to continue working if a failover occurs.
You can shorten the failover time by making sure that you have sufficient provisioned IOPS (input/output operations per second) allocated for your workload.
Expect increased latencies during a failover, because Amazon RDS automatically replicates your data to a new standby instance as part of the process.
New data is committed to two different DB instances, so there might be some latency until the standby DB instance has caught up with the new primary DB instance.
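One simple way to put a number on this is to poll the endpoint during a planned test failover and time the gap. The sketch below does that with a plain TCP connect to port 1433. The function names are hypothetical, and a successful TCP connect only shows that the listener is up, so treat the result as a lower bound and swap in a real query if you need more certainty.

```python
import socket
import time


def tcp_probe(host, port=1433, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(probe, interval=1.0, timeout=900.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Return seconds between the first failed probe and the next success.

    Returns 0.0 if no failure is seen before the timeout, and None if the
    probe is still failing when the timeout expires.
    """
    start = clock()
    down_since = None
    while clock() - start < timeout:
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            down_since = now  # outage begins
        elif ok and down_since is not None:
            return now - down_since  # outage ends
        sleep(interval)
    return 0.0 if down_since is None else None
```

Start measure_outage(lambda: tcp_probe(endpoint)) with your own instance endpoint, trigger the failover, and compare the result with what your application can tolerate.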