What is Amazon RDS?
Amazon RDS is a fully-managed database service that allows you to provision and run relational databases in the cloud.
It supports several relational engines including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server .
It automates administration tasks such as hardware provisioning, database setup, patching, and backups.
In other words, amazon takes care of a bunch of things that normally you would be doing yourselves on-premises.
Availability Best practices
The idea of this post is to introduce you to some of the best practices for MS SQL Server availability on Amazon RDS.
Enable Multi-AZ for production workloads
Multi-AZ deployments set up the synchronous application and automatic failover.
Downtime typically is around a minute – SQL Server needs to detect that the primary node is unavailable in a quorum. It’s a lot more resilient than the single AZ deployment.
However, this process takes slightly longer – it can take 10 to 15 minutes if the entire availability zone was offline and nothing can be done about that until it comes online.
Tweak Checkpoints to reduce crash recovery times
For single-AZ and Multi-AZ deployments when SQL Server crashes it has to go through crash recovery. The server-level recovery interval option specifies the maximum amount of time required by the SQL Server Database Engine to recover the database after restarting the SQL Server. You can send this in a custom parameter group and apply that parameter group to your instance.
You can also set it at the database level using the “Alter database ..set target_recovery_time”.
Remember, if you reduce crash recovery time, your SQL Server will be doing more aggressive checkpoints, so you will need to know how it impacts in your performance.
Use Amazon RDS DB events to monitor failovers
Amazon RDS uses the Amazon Simple Notification Service (Amazon SNS) to provide notification when an event occurs.
These notifications can be in any notification form such as an email, a text message, or a call to an HTTP endpoint.
Set client DNS TTL to less than 30 seconds
If your application IP and the cached value might no longer in the service (in failover case), is a good practice to set DNS TTL to less than 30 seconds.
Do not turn off transaction logging
Do not enable simple recover mode, offline mode, or read-only mode. Muti-AZ required transaction logging to enable the service.
Know how long it takes for your DB instance to failover
You should test your application’s ability to continue working if a failover occurs.
The failover time can be shortened if you ensure that you have sufficient provisioned IOPS allocated for your workload.
During a failover will be elevated latencies as part of the process, Amazon RDS automatically replicates your data to a new standby instance. The data will be committed to two different DB instances, so there might be some latency until the standby DB instance has caught up to the new primary DB instance.