Selecting the right database for your application

There is a trend we’re seeing across the industry, a single application using multiples databases.

The companies are no longer standardizing on a particular database.

Developers are now working backward from the use case and they’re billing and picking the right tool for the job (the right database for the use case).

There are many choices, we will go through eight different data models, talk about the purposes, technologies, and uses cases of each.

So, you can understand how these different models relate to each other and what makes them unique.

Relational

You are probably familiar with this type of model where data is stored in rows and tables.

Relational databases have a predefined structure known as a schema. The schema is established at the time the database is created. Any data that enters the database must conform to it. The model uses primary keys and foreign keys to maintain referential integrity. The foreign key value must match a primary key value in the parent table. These values are often (but not always) automatically generated numbers.

The result of this is that the foreign key values are usually difficult to decipher without running a query that joins the tables on the primary and foreign keys to extract the human-readable value. Some queries need to perform many joins, and when there’s a lot of data, this can impact performance. It still extremely useful when you have data that’s normalized, ensuring consistency and integrity of the distributed data.

The companies are not only using relational technology but increasing their usage, complementing relational debases with other data models to be able to add additional use cases and functionality that perhaps weren`t built into application in yesterday’s gates.

There is a lot of lift and shift, EMR, ERP and so on with relational databases, especially among enterprise applications trying to get to the cloud, to get out of self-managing or perhaps long-term licenses.

Amazon Aurora is a cloud-native relational database service from AWS that still the fastest growing service in AWS history and accelerating. It also provides you with other familiar database engines to choose like PostgreSQLMySQLMariaDBOracle Database, and SQL Server.

Figure 1 – Relational Model.

Key Value

A key-value database (also known as a key-value store database) is a type of NoSQL database that stores data as a collection of key/value pairs. This is the simplest method of storing data.

The key-value pair is a well-established concept in many programming languages. Programming languages typically refer to a key-value as an associative array or data structure. A key-value is also commonly referred to as a dictionary or hash. Their simplicity permits scale quickly and cheaply with access very fast to the data which makes them a great fit for non-relational data. An URL shortener like bit.ly, for example, just has to look up the original URL for the shortcode without any need for elaborate querying.

Amazon DynamoDB is a key-value and document database provide by Amazon web services (AWS). It delivers single-digit millisecond performance at any scale. DynamoDB is a fully managed, multi-region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications. This service can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second.

Many of the world’s fastest-growing businesses such as Lyft, Airbnb, and Redfin as well as enterprises such as Samsung, Toyota, and Capital One depend on the scale and performance of DynamoDB to support their mission-critical workload.

Figure 2 – Key Value data.

Document

A document store is a non-relational database that stores data in XML, JSON or BSON documents.

Document stores feature a flexible schema that doesn’t enforce document structure. Users can create their desired structure in a document without affecting all other documents. Schema can be modified without causing downtime, which leads to high availability. They have metadata attribute embed in key-value pairs to make querying easier.

Developers don’t need to know during set-up what types of data will be stored, so this is a good choice when it isn’t clear in advance what sort of data will be incoming.

Document stores scale-out fast and efficiently horizontally. They sacrifice ACID compliance for flexibility.

Document databases vary widely in how to create, read, and search for data, ranging from RESTful APIs to functional programming concepts such as Map-Reduce.

This model database is used for unstructured or semi-structured data, content management, In-depth data analysis, Rapid prototyping, and so on.

Amazon DocumentDB from AWS gives you the performance, scalability, and availability you need when operating mission-critical workloads. In Amazon DocumentDB, the storage and compute are decoupled, allowing each to scale independently, and developers can increase the read capacity to millions of requests per second by adding up to 15 low latency read replicas in minutes, regardless of the size of your data. Amazon DocumentDB is designed for 99.99% availability and replicates six copies of your data across three AWS Availability Zones (AZs).

Figure 3 – Document databases.

In-memory

An in-memory database system (also known as IMDS) is a database management system that stores data entirely in main memory.

The main reason to keep a database in memory is performance. Reading and write data is faster than the data store to traditional (on-disk) database systems, which are designed for data storage on persistent media.

The design is typically simpler than that of on-disk databases, IMDSs can also impose significantly lower memory and CPU requirements.

Traditionally, relational databases can be characterized by the acronym ACID: Atomicity, Consistency, Isolation, and Durability. You lose the durability, as soon as you stop persisting a database.

Aside from methods of copying database contents to another durable location, an in-memory database can be subject to data loss if something stops working.

Another disadvantage is that all data must fit in limited memory. This is increasingly affordable, but memory sizes are still multi-gigabyte, while disk/flash disks can be multi-terabyte.

Real-time application, read-heavy application workloads, and compute-intensive workloads that must be built for ultra-low latency and high throughput can be benefited with this type of database.

Amazon ElastiCache from AWS works as an in-memory data store and cache to support the most demanding applications requiring sub-millisecond response times. By utilizing an end-to-end optimized stack running on customer dedicated nodes, it provides secure, blazing-fast performance. Also, ElastiCache can scale-out, scale-in, and scale-up to meet fluctuating application demands.

Figure 4 – In-memory data store.

Graph

A graph database uses vertices and edges (also referred to as nodes and relationships) to store data.

Most graph databases use a different architecture to relational databases, tending to be more flexible than the relational.

Typically, any schema of a graph is usually driven by the data. They are inherently “schema-less”, while some support “schema-full” (such as OrientDB) or “schema-mixed” modes. No schema was required to get this data into the database, itself determines the structure of the nodes and their relationships that can also have their properties.

Graph databases are suited to connected data, such as social media, recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security. It could be argued that graph databases are more suited to relationships than relational databases.

Graph databases excel when faced with large sets of associative data. A graph query can concern itself only with the data associated within the specified relationship/s, rather than querying a whole table of potentially millions of records.

AWS offers the Amazon Neptune service that is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with milliseconds latency. It supports popular graph models Property Graph and W3C’s RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune is fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

Figure 5 – Graph data.

Search

The search model database is a nonrelational, document-based data storage and retrieval solution specifically arranged and optimized for Indexing and searching semi-structured logs and data. It features flexible schema and fast retrieval of records, with advanced search options including full-text search, suggestions, and complex search expressions.

One of the most interesting search features is stemming that analyzes the root form of a word to find relevant records even when another form is used. For example, a user searching an employment database for “paying jobs” would also find positions tagged as “paid” and “pay.”

This model is used more as an intermediary or supplementary store than a primary database. It has low durability and poor security. There’s no innate authentication or access control.

Amazon Elasticsearch provider from AWS offers open-source APIs, managed Kibana, and integrations with Logstash and other AWS Services, enabling you to securely ingest data from any source and search, analyze, and visualize it in real time. Amazon Elasticsearch Service lets you pay only for what you use – there are no upfront costs or usage requirements. With it, you get the ELK stack you need, without the operational overhead.

Figure 6 – Search model database.

Time series

Time Series Databases (also known as TSDB) are optimized for time series or time-stamped data.The data are simply events or measurements that are tracked, monitored, downsampled, and aggregated over time.

A Time Series Database is built specifically for handling changes over time. This could be application performance monitoring, server metrics, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data.  Everything has or will have, a sensor. So now, everything inside and outside the company is emitting a relentless stream of metrics and events or time series data. TSDB is designed to support these new workloads—more data points, more data sources, more monitoring, more controls.  Amazon

Amazon Timestream from AWS gives you the scale and speed to process trillions of events per day, with up to 1,000X faster query performance at 1/10th the cost of relational databases. As your data grows over time, Timestream’s adaptive query processing engine understands its location and format, making your data simpler and faster to analyze. It also automates rollups, retention, tiering, and compression of data, so you can manage your data at the lowest possible cost. Timestream is serverless, so there are no servers to manage, freeing you to focus on building your applications.

Figure 7 – Time series databases.

Ledger

Quantum Ledger Database (also known as QLDB) is a NoSQL database that provides a complete, immutable, and verifiable history of all changes to application data.

QLDB uses an immutable transactional log, known as a journal, that tracks each application data change and maintains a complete and verifiable history of changes over time.

The model has taken away features such as consensus algorithms, blockchain networks, and sophistication of deploying networks on multiple nodes. Enterprises that are attracted to a blockchain can deploy and manage a database easily.

QLDB implements the ACID properties so it keeps the transaction valid and secure. Since it is a non-relational database, it can store large swaths of semi-unstructured data using a document-oriented data model. QLDB’s value proposition for enterprises is to take away the complexities of managing blockchain networks and to minimize infrastructure costs, finality times, and data propagation.

Many organizations such as importers, exporters, banks, shipping companies, and customs departments, to work with one another. Using QLDB, financial and trading consortiums can create a blockchain network where all parties can transact and process trade-related paperwork electronically instantly, without the need for a central trusted authority.

Amazon QLDB from AWS is a new class of database that eliminates the need to engage in the complex development effort of building your own ledger-like applications. It is easy to use because it provides developers with a familiar SQL-like API, a flexible document data model, and full support for transactions. QLDB is also serverless, so it automatically scales to support the demands of your application. There are no servers to manage and no read or write limits to configure. With QLDB, you only pay for what you use.

Figure 8 – Ledger database concepts.

Let our expert SQL Server Consulting team handle the technical complexities, so you can focus on driving your business forward with confidence.

Mark Varnas

Mark Varnas

Hey I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable. I channel my SQL into our SQL Managed Services, SQL Consulting and our internal database products.

Leave a Reply

Your email address will not be published. Required fields are marked *