It may be hard to find a sample SQL database for practice that suits all your needs at once, but there are many reasons why you need one, among them:
- Grow your solution portfolio before applying to new jobs.
- Learn about new or specific technologies in a test environment.
- Provide a training environment for trying different SQL queries and features.
- Test your backup and restore solutions before applying them to your production databases.
- Prototype new solutions.
- Demonstrate a product.
That’s why we’ve created a list of databases for you, comparing their features and providing additional details on their uses and how to build them from scratch, including options to enable or disable specific components as needed.
If you want to work with any of these databases, you can either run the script provided by the creator or download a backup from our website and restore it to your instance.
Below you can find general information comparing them:
Database | Release | Deprecation | Minimum SQL Server version | Creator | Size |
---|---|---|---|---|---|
Northwinds | 2000 | 2000 | 2000+ | Microsoft | 80 MB |
Pubs | 2000 | 2000 | 2000+ | Microsoft | 16 MB |
AdventureWorks | 2017-10-26 | 2017 | 2008+ | Microsoft | 272 MB |
AdventureWorksDW | 2017-10-26 | 2017 | 2008+ | Microsoft | 208 MB |
Contoso | 2022-06-04 | — | 2019+ | Microsoft | 22 MB minimum |
WideWorldImporters | 2016 | — | 2016+ | Microsoft | 710 MB |
WideWorldImportersDW | 2016 | — | 2016+ | Microsoft | 487 MB |
Data.gov | — | — | 2019+ | U.S. Government | 208 MB |
Dba.stackexchange.com | — | — | 2019+ | Stack Exchange | 1287 MB |
Below you can find each database's number of tables, its biggest tables with their row counts, and a link to download a backup that is ready to restore:
Database | Tables | Biggest tables (rows) | Download |
---|---|---|---|
Northwinds | 13 | Order Details (2155), Orders (830), Customers (91) | Northwinds.bak |
Pubs | 11 | Roysched (86), Employee (43), Titleauthor (25) | Pubs.bak |
AdventureWorks | 71 | SalesOrderDetail (121317), TransactionHistory (113443), TransactionHistoryArchive (89253) | AdventureWorks.bak |
AdventureWorksDW | 31 | FactProductInventory (776286), FactInternetSalesReason (64515), FactResellerSales (60855) | AdventureWorksDW.bak |
Contoso | 8 | OrderRows (208969), CurrencyExchange (202100), Orders (87053) | Contoso.100K.bak |
WideWorldImporters | 48 | ColdRoomTemperatures_Archive (3654736), StockItemTransactions (236667), OrderLines (231412) | WideWorldImporters-Full.bak |
WideWorldImportersDW | 29 | Movement (236667), Order (231412), Sale (228265) | WideWorldImportersDW-Full.bak |
Data.gov | 1 | Electric_Vehicle_Population_Data (150483) | data.gov.bak |
Dba.stackexchange.com | 8 | Votes (892171), PostHistory (814930), Badge (416662) | dba.stackexchange.com.bak |
Northwinds and Pubs
These are small, general-purpose databases designed to help you quickly set up a demo or prototype in any technology. Running the scripts is the fastest way to create them, and you can do so in any version you choose.
However, they won’t meet your needs for testing with large volumes of data unless you have a method to populate them; one quick approach is sketched below.
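If you do need more volume, a simple way to inflate one of these databases is to cross join an existing table with a generated number sequence and land the result in a new table. A minimal sketch against the Northwinds Orders table; DemoOrders is a hypothetical name invented for this example:
-- Sketch: multiply the 830 Northwinds orders into 830,000 rows for volume tests.
-- DemoOrders is a hypothetical staging table created by this statement.
SELECT o.*, n.CopyNumber
INTO DemoOrders
FROM Orders AS o
CROSS JOIN (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS CopyNumber
            FROM sys.all_objects) AS n;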
AdventureWorks
If it's your first time creating a database and you want to learn about standard features like 3NF normalization, user-defined data types, XML, schema segregation, bulk inserting CSV files, triggers, and documenting the database, then AdventureWorks perfectly suits your needs.
This database is also used when learning about Reporting Services (SSRS). You can find tutorials and solutions at this link.
The database creation script is located here.
When I ran it manually, I had to specify the codepage ‘RAW’ when loading the Address table (see this link) and install the Full-Text Search feature before the script completed successfully; the fix is sketched below. The run took only 8 seconds, and the script works on any SQL Server version from 2008 onward.
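For reference, the fix amounts to adding the codepage option to the Address load. This is a sketch, not the script's exact text: the file path is an assumption, and the remaining options follow the pattern the creation script uses for its CSV loads.
-- Sketch: loading Person.Address with CODEPAGE = 'RAW' (path and options illustrative).
BULK INSERT Person.Address
FROM 'C:\setup\AdventureWorks\Address.csv'
WITH (
    CODEPAGE = 'RAW'        -- the option I had to add to avoid the codepage error
   ,FIELDTERMINATOR = '\t'  -- the script's data files are tab-delimited
   ,ROWTERMINATOR = '\n'
   ,KEEPIDENTITY            -- keep the AddressID values from the file
   ,TABLOCK
);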
The same location also provides the script to create the AdventureWorksDW database, which you can use to learn about Integration Services (SSIS) and Analysis Services (SSAS); you can get the tutorials and solutions here. Running the script took only 4 seconds, and it also runs on any SQL Server version starting from 2008.
Contoso
This is another data warehouse database that you can use for learning and running Business Intelligence projects, such as Analysis Services processes, Power BI reports, and more.
Microsoft provides various backups (sizes: 10K, 100K, 1M, 10M), as well as compressed data files (sizes: 100M and 1G) that can be attached to an instance. Additionally, they offer a data generator program, which you can customize to create your own data distribution, along with its source code, allowing you to modify and extend it as needed.
If you download the backups or data files, you can restore or attach them on SQL Server 2019 or later. However, if you generate your own data using the executable, you can attach it to any version of SQL Server. An attach would look like the sketch below.
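For the compressed data files, attaching looks something like this sketch; the extracted file names are assumptions based on the download naming:
-- Sketch: attaching the extracted Contoso data files (file names are assumptions).
CREATE DATABASE Contoso
ON (FILENAME = N'C:\setup\Contoso.mdf'),
   (FILENAME = N'C:\setup\Contoso_log.ldf')
FOR ATTACH;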
In my case, I restored the backup file using the code below:
USE [master]
-- Restore the 100K-row Contoso backup, relocating the data and log files
RESTORE DATABASE [Contoso]
FROM DISK = N'C:\setup\Contoso.100K.bak'
WITH FILE = 1
,MOVE N'Contoso 100K' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\Contoso.mdf'
,MOVE N'Contoso 100K_log' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\Contoso_log.ldf'
,NOUNLOAD
,REPLACE -- overwrite the database if it already exists
,STATS = 5 -- report progress every 5 percent
WideWorldImporters
This is the most recent and biggest dataset provided by Microsoft.
It not only includes the OLTP (online transaction processing) and DW (data warehouse) databases; on the download page you’ll also find the Integration Services projects, scripts to learn and use recent SQL Server features (Always Encrypted, In-Memory OLTP, analytics, PolyBase, and row-level security), and sample workloads in Windows Forms (multithreaded order insert and multithreaded in-memory vehicle location insert).
When you download the source code, you’ll also gain access to several additional resources: an IoT Smart Grid application, the ContosoDW database, and guides on using Master Data Services, R Services, and Azure SQL DB elastic pools. On the samples page, you’ll find even more projects, including Power BI dashboards, Dynamic Data Masking, the OLTP project, the OLAP project, the Analysis Services Multidimensional project, and the Integration Services project for performing ETL between the OLTP and DW databases.
In my case, I restored the OLTP backup file using the code below:
USE [master]
RESTORE DATABASE [WideWorldImporters]
FROM DISK = N'C:\setup\WideWorldImporters-Full.bak'
WITH FILE = 1
,MOVE N'WWI_Primary' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImporters.mdf'
,MOVE N'WWI_UserData' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImporters_UserData.ndf'
,MOVE N'WWI_Log' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImporters.ldf'
,MOVE N'WWI_InMemory_Data_1' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImporters_InMemory_Data_1' -- memory-optimized filegroup container (a folder, not a file)
,NOUNLOAD
,STATS = 5
And restored the DW backup file using the code below:
USE [master]
RESTORE DATABASE [WideWorldImportersDW]
FROM DISK = N'C:\setup\WideWorldImportersDW-Full.bak'
WITH FILE = 1
,MOVE N'WWI_Primary' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImportersDW.mdf'
,MOVE N'WWI_UserData' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImportersDW_UserData.ndf'
,MOVE N'WWI_Log' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImportersDW.ldf'
,MOVE N'WWIDW_InMemory_Data_1' TO N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\WideWorldImportersDW_InMemory_Data_1'
,NOUNLOAD
,STATS = 5
These aren’t the only free datasets available, but they are the easiest to start with since they come as backup files ready to be restored into your SQL Server instance.
While there are other repositories online with additional datasets and guides to build your own projects, loading them into SQL Server requires knowledge of Integration Services, bulk loading, or PolyBase. Additionally, you’ll need to invest time in creating the project to load the data in the format you need.
Several of these repositories can be found at:
- 43 Free Datasets for Building an Irresistible Portfolio (2024) – Dataquest
- Publicly Available Big Data Sets – Hadoop Illuminated
- Top 100 Open Source Datasets for Data Science – Medium
- Navigating Free Datasets: Sourcing for Accuracy – Atlassian
- Big Data: 33 Brilliant And Free Data Sources Anyone Can Use – Forbes
For example, I downloaded the EV Population Data from data.gov and loaded it into a database using the SQL Server Import and Export Wizard as described here; some columns required manually specifying bigger sizes, and the resulting database contains only one table. The same can be done for the other datasets provided there; a scripted alternative is sketched below.
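If you prefer a script over the wizard, you can create the table with generously sized columns up front and bulk insert the CSV. A minimal sketch; the column list is abbreviated, and the names and sizes are my guesses rather than the dataset's documented schema:
-- Sketch: scripted alternative to the Import and Export Wizard (column list abbreviated).
CREATE TABLE Electric_Vehicle_Population_Data (
    VIN        VARCHAR(20),
    County     VARCHAR(100),  -- oversize the text columns to avoid truncation errors
    City       VARCHAR(100),
    ModelYear  INT,
    Make       VARCHAR(100),
    Model      VARCHAR(100)
    -- ...remaining columns omitted for brevity
);
BULK INSERT Electric_Vehicle_Population_Data
FROM 'C:\setup\Electric_Vehicle_Population_Data.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);  -- FORMAT = 'CSV' requires SQL Server 2017+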
On census.gov, you can find more correlated data, with information collected yearly on the same topics. This makes it easier to build larger schemas by loading multiple related tables or expanding datasets by combining data from multiple years for the same table.
The Bureau of Transportation Statistics and National Highway Traffic Safety Administration sites let you build your own dataset by choosing filters before downloading the data, reducing the time and effort needed to get only the information you want.
If you need multiple tables, then downloading the Stack Exchange Data Dump is a better option.
I downloaded the DBA database, which contains 8 tables:
- Badges
- Comments
- PostHistory
- PostLinks
- Posts
- Tags
- Users
- Votes
Loading the Badges table is accomplished by running this script (modified from the code provided at this link):
CREATE TABLE Badge (
Id INT,
UserId INT,
Name VARCHAR(50),
Date DATETIME,
Class INT,
TagBased BIT);
-- Read the XML dump into a variable; SINGLE_BLOB returns varbinary, so cast it to xml
DECLARE @records xml
SELECT @records = CONVERT(xml, C) FROM OPENROWSET (BULK 'C:\setup\dba.stackexchange.com\Badges.xml', SINGLE_BLOB) AS records(C)
DECLARE @hdoc int
EXEC sp_xml_preparedocument @hdoc OUTPUT, @records
-- Shred the attribute-centric <row> elements (flag 1) into the Badge table
INSERT INTO Badge
SELECT * FROM OPENXML (@hdoc, '/badges/row', 1)
WITH(
Id INT,
UserId INT,
Name VARCHAR(50),
Date DATETIME,
Class INT,
TagBased BIT)
EXEC sp_xml_removedocument @hdoc
The total elapsed time was 31 seconds to load the 416662 rows into the Badge table; the other tables are loaded in a similar way.
Note that the provided backup lacks primary and foreign keys; you can create them if needed, as in the sketch below.
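A minimal sketch of adding keys afterward, using the Badge and Users tables as an example; the column choices are assumptions based on the Stack Exchange schema:
-- Sketch: adding keys after the load; PK columns must first be made NOT NULL.
ALTER TABLE Users ALTER COLUMN Id INT NOT NULL;
ALTER TABLE Badge ALTER COLUMN Id INT NOT NULL;
ALTER TABLE Users ADD CONSTRAINT PK_Users PRIMARY KEY (Id);
ALTER TABLE Badge ADD CONSTRAINT PK_Badge PRIMARY KEY (Id);
ALTER TABLE Badge WITH NOCHECK  -- NOCHECK in case the dump has orphaned UserId values
    ADD CONSTRAINT FK_Badge_Users FOREIGN KEY (UserId) REFERENCES Users (Id);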
So these are 9 databases you can download directly and start working with, plus code samples you can adapt to load data from almost any dataset you find online.
We hope you find them valuable, but if you need a database in a specific version with only the data you’re interested in, you can hire us to provide it to you.