As Cloud computing is getting more popular, many companies choose to deploy a hybrid environment using a mix of on-premises data centers and public clouds.
Moving data around efficiently in a hybrid cloud environment is critical and challenging. Nowadays, two main options can synchronize tables or databases to Azure Cloud: The Data Sync and Transactional Replication.
Azure SQL Data Sync
Microsoft introduced Azure SQL Data Sync in 2012. This feature implements data distribution between on-premises SQL Server, Azure SQL VM, and Azure SQL databases in uni-direction or bi-direction.
- Member (Spoke) Databases: Databases that we want to be in sync with some other database. It could be either Azure SQL Database or a database in an instance of SQL Server
- Metadata Database: Database that holds every metadata information of SQL Data Sync group and members.
- Hub database: Database that acts as a repository for all the changes happening in all the spoke databases and is responsible for syncing each member.
- Sync Group: You need to create a Sync Group to define all the settings and configure the Hub, Metadata, and Member database. It is the starting point where everything exists.
- Conflict Resolution: Conflict resolution helps in which change needs to be persisted. Conflicts may occur when the data is modified on the Azure SQL hub and member database within the same sync cycle. In this case, you can choose which data will be discarded and which will be stored (from Hub or Members).
- Sync Agent: When you add an on-premises database as a member database, you have to install and configure a local sync agent. The sync agent connects with your Sync Group using the agent key.
- Automatic Sync: This is the option to define the frequency when sync automatically triggers. It could be anything from 5 minutes to 30 days. Also, you can set it for only manual sync.
Azure data sync is an intuitive service that can schedule sync between SQL Server or managed SQL instances in a few clicks. It tracks changes using insert, update, and delete triggers.
The Data Sync feature stores additional metadata with each database. It creates side tables in the user database for change tracking. These change tracking activities have an impact on your database workload.
Data Sync is useful to keep data updated across several Azure SQL Database or SQL Server databases. This capability may appeal to customers considering moving to the cloud and putting some of their applications in Azure (hybrid applications).
In many cases, it’s beneficial to separate different workloads across different databases to minimizes the performance impact on your production workload. When you need to run a reporting or analytics workload on this data, it’s helpful to have a second database, for example.
This method did not guarantee Transactional consistency. So there may be intervals of time where your destination is not up to date or reflect a diverged state concerning your source database.
- Cloud service: Azure SQL database
- Location East US
- On-premise: SQL Server 2019 (Virtual Machine)
Table: Customers (with 6 columns).
- Snapshot isolation must be enabled.
- A table can’t have an identity column that isn’t the primary key.
- Each table must have a primary key, a clustered index to use data sync.
- Don’t change the value of the primary key in any row. If you have to change a primary key value, delete the row and recreate it with the new primary key value. Changing the value of an existing primary key should result in data loss, even though sync does not report any issue.
- A primary key can’t have the following data types: sql_variant, binary, varbinary, image, XML.
- Data Sync can’t sync read-only or system-generated columns (Computed columns and System-generated columns for temporal tables).
- Unsupported data types: FileStream, SQL/CLR UDT,XMLSchemaCollection (XML supported),Cursor, RowVersion, Timestamp, Hierarchyid
- The names of objects (databases, tables, and columns) can’t contain the printable characters period (.), left square bracket ([), or right square bracket (]).
- Azure Active Directory authentication isn’t supported.
- Tables with the same name but different schema (for example, dbo.customers and sales.customers) aren’t supported.
- Columns with User-Defined Data Types aren’t supported
- For more details, see this Data Sync documentation.
Azure SQL Managed Instance using transactional replication
This feature is currently in public preview for SQL Managed Instance.
Transactional replication enables you to replicate data from a table in Azure SQL Managed Instance or a SQL Server instance to tables placed on remote databases.
SQL databases (Azure SQL databases, an instance database in Azure SQL managed instance and On-premises or on Azure VM) can be configured as push subscribers and replicate data from the Publisher, the central database. Transaction replication can only replicate data in one-way.
- Publisher: Publishes changes made on some tables (articles) by sending the updates to the Distributor. The Publisher can be an Azure SQL Managed Instance or a SQL Server instance.
- Distributor: Collects changes in the articles from a Publisher and distributes them to the Subscribers. The Distributor can be either an Azure SQL Managed Instance or a SQL Server instance (any version as long it is equal to or higher than the Publisher version).
- Subscriber: Receives changes made on the Publisher. A SQL Server instance and Azure SQL Managed Instance can be only a push when the Distributor is an Azure SQL Managed Instance, and the Subscriber is not. A database in Azure SQL Database can only be a push subscriber.
Transactional replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the Publisher are usually delivered to the Subscriber as they occur (in near real-time).
It only uses the transaction log to track changes, minimizing the impact on the Instance/database workload. Transactional replication is useful for publishing changes made in one or more tables, distributing and keeping them in a synchronized state (with minimal delay).
In general, the Publisher and the Distributor must be either in the cloud or on-premises. See the transactional replication configurations for available configurations.
It’s not a cheap method, even using Publisher and Distributor configured within a single SQL Managed Instance. The Replication feature requires you to meddle with many configurations and commands, needing database administration skills to set up correctly.
- Cloud service: Azure SQL Managed Instance
- Location East US
- General Purpose Gen5 (32 GB, 4 vCores)*
- On-premise: SQL Server 2019 (Virtual Machine)
*The Publisher and Distributor are configured within a single SQL Managed Instance and distributing changes to another on-premise SQL Server instance.
Table: Customer (with 6 columns).
- Azure SQL Managed Instance can support being a Subscriber from the following versions of SQL Server 2012 SP2 CU8 (11.0.5634.1), SQL Server 2014 RTM CU10 (12.0.4427.24)or SP1 CU3 (12.0.2556.4), SQL Server 2016 and later.
- Use SQL Authentication for connectivity between replication participants.
- Use an Azure Storage Account share for the working directory used by replication.
- Open TCP outbound port 445 in the subnet security rules to access the Azure file share.
- Open TCP outbound port 1433 when the SQL Managed Instance is the Publisher/Distributor, and the Subscriber is not. You may also need to change the SQL Managed Instance NSG outbound security rules.
- Configure VPN peering between the virtual networks of replication participants if the virtual networks are different.
- For more details, see this documentation.
This post aims to help you choose the right technologies and tools to implement different scenarios. In all methods, the synchronization process’s performance is directly related to the network and width of the tables (narrow tables require more overhead and traffic).
SQL Data Sync stores additional metadata with each database. The client will need to include this when calculating the space required.
Data Sync uses insert, update, and delete triggers to track changes. It creates side tables in the user database for change tracking. Thus, in some cases, it is necessary to increase the database tier.
Azure transactional replication feature eliminates the 5 minutes Latency mentioned in the case of Data Sync. Also, it is the only method with transactional consistency. Such capability comes with a few drawbacks too. The implementation of Transaction replication on Azure management instances requires a lot of configurations (Powershell or SQL CMD commands).
|Uni-directional (From Azure Managed Instance to on-premises)
|At least 5 Minutes of Latency
|Near-Real-Time (lower Latency)
|Azure sql data sync starts at lower prices
|Azure Management Instance starts at higher prices
|Need DBA Skills to implement
|Transaction consistency not maintained
|High impact on user Workload (Performance related to the tier)
|Lower impact on user workload (higher performance)
|Many Limitations (Number of databases, tables, etc..)
|Need to install and configure the Data Sync Agent on the local machine for SQL on-premises.
|Need to configure the network and setup VPN peering (in some cases). Ports 445 and 1433 allow SQL traffic on both the Azure firewall and the Windows firewall.