Sync SQL Server Data Between Azure and On-Premises: Best Methods

TL;DR

Two main ways to sync data between Azure and on-prem SQL Server: Azure SQL Data Sync (bi-directional, easy to set up, no transactional consistency) and transactional replication (uni-directional, near-real-time, DBA-grade setup).

Data Sync uses insert, update, and delete triggers plus side tables; minimum latency is 5 minutes and it adds workload overhead.
Transactional replication reads only the transaction log, so impact on the source is lower and latency is near real-time.
Replication needs SQL auth, an Azure Storage share, TCP 445 and 1433 open, and often VPN peering between virtual networks.

As cloud computing is getting more popular, many companies choose to deploy a hybrid environment using a mix of on-premises data centers and public clouds.

Microsoft Azure allows users to synchronize tables or databases between Azure SQL Databases (Azure Cloud) and on-premises SQL Server databases. Moving data around efficiently in a hybrid cloud environment is critical and challenging.

Nowadays, two main options can synchronize tables or databases to Azure Cloud:

Data Sync

Transactional Replication

Azure SQL Data Sync

Microsoft introduced Azure SQL Data Sync in 2012. This feature implements data distribution between on-premises SQL Server, Azure SQL VM, and Azure SQL databases in uni-direction or bi-direction.

Feature components

Member (Spoke) Databases: Databases that we want to be in sync with some other database. It could be either Azure SQL Database or a database in an instance of SQL Server
Metadata Database
Database that holds every metadata information of SQL Data Sync group and members.
Hub database
Database that acts as a repository for all the changes happening in all the spoke databases and is responsible for syncing each member.
Sync Group
You need to create a Sync Group to define all the settings and configure the Hub, Metadata, and Member database. It is the starting point where everything exists.
Conflict Resolution
Conflict resolution helps in which change needs to be persisted. Conflicts may occur when the data is modified on the Azure SQL hub and member database within the same sync cycle. In this case, you can choose which data will be discarded and which will be stored (from Hub or Members).
Sync Agent
When you add an on-premises database as a member database, you have to install and configure a local sync agent. The sync agent connects with your Sync Group using the agent key.
Automatic Sync
This is the option to define the frequency when sync automatically triggers. It could be anything from 5 minutes to 30 days. Also, you can set it for only manual sync.

Common configurations

Azure Data Sync is an intuitive service that can schedule sync between SQL Server or managed SQL instances in a few clicks. It tracks changes using insert, update, and delete triggers.

The Data Sync feature stores additional metadata with each database. It creates side tables in the user database for change tracking. These change tracking activities have an impact on your database workload.

Data Sync is useful to keep data updated across several Azure SQL Database or SQL Server databases. This capability may appeal to customers considering moving to the cloud and putting some of their applications in Azure (hybrid applications).

In many cases, it’s beneficial to separate different workloads across different databases to minimizes the performance impact on your production workload. When you need to run a reporting or analytics workload on this data, it’s helpful to have a second database, for example.

This method did not guarantee transactional consistency. So there may be intervals of time where your destination is not up to date or reflect a diverged state concerning your source database.

Performance experiment

Sources:

Cloud service: Azure SQL database
- Location: East US
On-premise: SQL Server 2019 (Virtual Machine)

Database: AW
Table: Customers (with 6 columns)

Limitations

Snapshot isolation must be enabled.
A table can’t have an identity column that isn’t the primary key.
Each table must have a primary key, a clustered index to use Data Sync.
- Don’t change the value of the primary key in any row. If you have to change a primary key value, delete the row and recreate it with the new primary key value. Changing the value of an existing primary key should result in data loss, even though sync does not report any issue.
A primary key can’t have the following data types: sql_variant, binary, varbinary, image, XML.
Data Sync can’t sync read-only or system-generated columns (Computed columns and System-generated columns for temporal tables).
Unsupported data types: FileStream, SQL/CLR UDT, XMLSchemaCollection (XML supported), Cursor, RowVersion, Timestamp, Hierarchyid.
The names of objects (databases, tables, and columns) can’t contain the printable characters period (.), left square bracket ([), or right square bracket (]).
Azure Active Directory authentication isn’t supported.
Tables with the same name but different schema (for example, dbo.customers and sales.customers) aren’t supported.
Columns with User-Defined Data Types aren’t supported.

For more details, see this Data Sync documentation.

Azure SQL Managed Instance using transactional replication

This feature is currently in public preview for SQL Managed Instance.

Transactional replication enables you to replicate data from a table in Azure SQL Managed Instance or a SQL Server instance to tables placed on remote databases.

SQL Databases (Azure SQL Databases, an instance database in Azure SQL Managed Instance and On-premises or on Azure VM) can be configured as push subscribers and replicate data from the Publisher, the central database. Transaction replication can only replicate data in one way.

Feature components

Publisher
Publishes changes made on some tables (articles) by sending the updates to the Distributor. The Publisher can be an Azure SQL Managed Instance or a SQL Server instance.
Distributor
Collects changes in the articles from a Publisher and distributes them to the Subscribers. The Distributor can be either an Azure SQL Managed Instance or a SQL Server instance (any version as long it is equal to or higher than the Publisher version).
Subscriber
Receives changes made on the Publisher. An SQL Server instance and Azure SQL Managed Instance can be only a push when the Distributor is an Azure SQL Managed Instance, and the Subscriber is not. A database in Azure SQL Database can only be a push subscriber.

Common configurations

Transactional replication typically starts with a snapshot of the publication database objects and data. As soon as the initial snapshot is taken, subsequent data changes and schema modifications made at the Publisher are usually delivered to the Subscriber as they occur (in near real-time).

It only uses the transaction log to track changes, minimizing the impact on the Instance/database workload.

Transactional replication is useful for publishing changes made in one or more tables, distributing and keeping them in a synchronized state (with minimal delay).

In general, the Publisher and the Distributor must be either in the cloud or on-premises. See the transactional replication configurations for available configurations.

It’s not a cheap method, even using Publisher and Distributor configured within a single SQL Managed Instance. The Replication feature requires you to meddle with many configurations and commands, needing database administration skills to set up correctly.

Performance experiments

Sources:

Cloud service: Azure SQL Managed Instance
- Location East US
- General Purpose Gen5 (32 GB, 4 vCores)*
On-premise: SQL Server 2019 (Virtual Machine)

*The Publisher and Distributor are configured within a single SQL Managed Instance and distributing changes to another on-premise SQL Server instance.

Database: AW
Table: Customer (with 6 columns)

Limitations

Azure SQL Managed Instance can support being a Subscriber from the following versions of SQL Server 2012 SP2 CU8 (11.0.5634.1), SQL Server 2014 RTM CU10 (12.0.4427.24)or SP1 CU3 (12.0.2556.4), SQL Server 2016 and later.
Use SQL Authentication for connectivity between replication participants.
Use an Azure Storage Account share for the working directory used by replication.
Open TCP outbound port 445 in the subnet security rules to access the Azure file share.
Open TCP outbound port 1433 when the SQL Managed Instance is the Publisher/Distributor, and the Subscriber is not. You may also need to change the SQL Managed Instance NSG outbound security rules.
Configure VPN peering between the virtual networks of replication participants if the virtual networks are different.

For more details, see this documentation.

Conclusion

This post aims to help you choose the right technologies and tools to implement different scenarios. In all methods, the synchronization process’s performance is directly related to the network and width of the tables (narrow tables require more overhead and traffic).

SQL Data Sync stores additional metadata with each database. The client will need to include this when calculating the space required.

Data Sync uses insert, update, and delete triggers to track changes. It creates side tables in the user database for change tracking. Thus, in some cases, it is necessary to increase the database tier.

Azure transactional replication feature eliminates the 5-minute latency mentioned in the case of Data Sync. Also, it is the only method with transactional consistency.

Such capability comes with a few drawbacks too. The implementation of transaction replication on Azure Managed Instances requires a lot of configurations. (PowerShell or SQL CMD commands).

Data Sync	Transaction Replication
Bi-Directional	Uni-directional (from Azure Managed Instance to on-premises)
At least 5 minutes of Latency	Near-Real-Time (lower Latency)
Azure SQL Data Sync starts at lower prices	Azure Managed Instance starts at higher prices
User-friendly implementation	Need DBA skills to implement
Transaction consistency not maintained	Transaction consistency
High impact on user workload (performance related to the tier)	Lower impact on user workload (higher performance)
Many limitations (number of databases, tables, etc.)	Few limitations
Need to install and configure the Data Sync Agent on the local machine for SQL on-premises	Need to configure the network and setup VPN peering (in some cases). Ports 445 and 1433 allow SQL traffic on both the Azure firewall and the Windows firewall

Speak with a SQL Expert

In just 30 minutes, we will show you how we can eliminate your SQL Server headaches and provide  operational peace of mind

Schedule My Call Now ➜

Article by

Mark Varnas

Founder | CEO | SQL Veteran

Hey, I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable.

Different Ways To Sync SQL Server Data From Azure To On-Premise SQL Server

Azure SQL Data Sync

Feature components

Common configurations

Performance experiment

Limitations

Azure SQL Managed Instance using transactional replication

Feature components

Common configurations

Performance experiments

Limitations

Conclusion

Speak with a SQL Expert

Discover More

Discover what clients are saying about Red9