7 min read • Oct 13, 2020
James is a Big Data Engineer and certified AWS Solutions Architect with a passion for data-driven applications. He's spent the last 7 years helping his clients design and implement huge-scale streaming Big Data platforms, cloud-based analytics stacks, and serverless architectures.
How do you choose the database that’s right for your business needs? Our resident expert, James Cross, presents a pragmatic guide.
Relational databases have been around for a very long time - the core idea is based on the relational model first proposed by E. F. Codd back in 1969. The core technologies that underpin most major relational database systems were developed in the 1980s - a time well before the internet was anything more than an academic fantasy. Yet relational database fundamentals have stood the test of time: properties like ACID transactions (Atomicity, Consistency, Isolation, Durability), data relationships, and the SQL query language are still standard today. These fundamentals helped make relational databases immensely popular, and they remain a cornerstone of IT infrastructure.
Relational databases have been continually enhanced and built into incredibly complex and capable systems for more than forty years. But as modern internet workloads have in turn become more complex and demanding, the notion that all data can be represented as rows and columns starts to break down. For decades, a relational database was the only real choice - no matter the shape or purpose of the data, we would model it relationally. That is to say: the database was driving the data model. Do all application data models and use cases require schema normalization and referential integrity? The answer is a resounding no, particularly in the complementary domains of Big Data and Advanced Analytics, where trying to fit vast multi-structured datasets into a relational model becomes a needless challenge.
As relational databases have grown, they have been stretched to support a huge range of use cases for which they weren't originally designed - text indexing, graph structures, JSON documents, key-value storage, and more. This effort to provide a one-size-fits-all solution results in inevitable trade-offs and compromises: complexity, scalability, failure blast radius, segregation, and operability all suffer - not to mention the cost.
Consider a simple microservices web application that relies on rapidly scaling individual components in and out to manage load. Achieving this with a one-size-fits-all relational database would be very difficult. Seldom can a single database design fit the needs of multiple, distinct use cases - nor should it. The days of the one-size-fits-all monolithic database are long gone - there are other, better options now worthy of consideration. As applications continue to move towards microservices architectures and developers break problems into smaller pieces, it makes sense to pick the database tool that best solves each problem.
Werner Vogels, the CTO of Amazon, observed that approximately 70% of database queries were selecting a single record based on a single filter key. That is, they made no use of the relational model, and could be served more efficiently and cheaply by a key-value store. This use case has no need for referential integrity or transactions, yet running it on a relational database still incurs their overhead. The result of this observation was DynamoDB - a nonrelational NoSQL database built to achieve levels of scalability and resilience impossible with a relational database, while shrinking the cost footprint down to almost zero.
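To make that observation concrete, here is a minimal sketch in Python (toy code, with made-up table and key names) contrasting the two ways of serving a "fetch one record by its key" request: a full SQL engine versus a plain hash lookup, which is the access pattern a key-value store like DynamoDB is built around.

```python
# Toy illustration (not production code): the same "fetch one record by key"
# access pattern expressed relationally and as a key-value lookup.
import sqlite3

# Relational version: a full SQL engine - parser, planner, transaction
# machinery - engaged just to fetch a single row by its primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("u123", "Ada"))
row = conn.execute(
    "SELECT name FROM users WHERE user_id = ?", ("u123",)
).fetchone()

# Key-value version: a plain dict models what a key-value store does for
# this pattern - no joins, no referential integrity, O(1) access by key.
kv_store = {"u123": {"name": "Ada"}}
item = kv_store["u123"]

assert row[0] == item["name"] == "Ada"
```

Both paths return the same record; the difference is how much machinery each engages to do it - which is exactly the overhead the 70% figure calls into question.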
None of this means that relational databases don't have their place in modern application development and design - quite the opposite. Tools like Amazon Aurora offer tremendous scale, fault tolerance, and all the best of cloud-native operations. The key is to avoid using a database (not just a relational database) beyond its intended purpose. Databases are designed and built to serve a particular purpose and solve specific problems. Matching the use case to the database will ultimately result in a simpler, more functional architecture that is cost-effective, resilient, easier to run, and easier to scale.
Always start with the use case, and use it to understand the type of data you will store, how you will access that data, and any constraints you have around cost, consistency, and operability. Amazon commonly refers to this as database freedom. This approach almost always works best in the context of microservices architectures - by definition, we're dealing with a tightly scoped service component that (should) be designed to do one thing, and do it extremely well.
When picking a database for your application, consider asking yourself some of the following questions:
Perhaps most importantly, you must consider the context in which you’re building an application, along with the skills of your engineering team. If you’re building a small in-house ticket-booking application with a team of experienced MySQL engineers, it hardly makes sense to use something like MongoDB just because it's cool and has some exciting scalability features.
The world of databases has changed beyond recognition over the past decade and continues to do so, with the number of database categories continuing to grow. As applications are increasingly required to handle users and traffic at internet scale, developers are being asked to choose carefully between relational, key-value, document, graph, in-memory, and search databases. Some use cases that need to store data may not require a database at all. For example, analytical workloads often rely on distributed data stores like HDFS and Amazon S3.
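As a toy illustration of that last point - hypothetical file names, with the local filesystem standing in for an object store like Amazon S3 - an analytical workload can write append-only event files and scan them in bulk, with no database involved at all:

```python
# Toy stand-in for a file-based analytical store: writers append immutable
# event files; readers scan them in bulk. This is the pattern object stores
# like Amazon S3 serve at scale - no indexes, no transactions, just objects.
import json
import os
import tempfile

data_dir = tempfile.mkdtemp()

# A writer lands a batch of events as one newline-delimited JSON file.
events = [{"user": "u1", "amount": 30}, {"user": "u2", "amount": 70}]
with open(os.path.join(data_dir, "events-0001.jsonl"), "w") as f:
    for e in events:
        f.write(json.dumps(e) + "\n")

# An analytical reader scans whole files rather than fetching single rows.
total = 0
for name in os.listdir(data_dir):
    with open(os.path.join(data_dir, name)) as f:
        for line in f:
            total += json.loads(line)["amount"]

assert total == 100
```

The trade-off is the mirror image of the key-value case: scans over entire datasets are cheap and parallelizable, while single-record lookups are not what the store is for.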
Let’s take a closer look at some of the current categories of "database", and the contexts in which each makes sense:
Cloud-native applications have significant architectural differences from legacy applications. They have different goals, need to work at different scales, and often change more regularly. Concepts like auto-scaling, continuous integration, and continuous deployment are now fundamental characteristics of what we build, rather than nice-to-haves. This means we place different and varying demands on our datastores, and the old one-size-fits-all model has broken down. Take time to consider your options, and ask yourself questions about your aspirations for the application's database - in particular, the skills in your team, how you will use the data, what you will store, and how you will access it. Don't be afraid to operate multiple applications with different databases - this isn't fragmentation, it's picking the right tool for the job. Particularly now, as we build more and more applications on the cloud, the operational overhead of managing databases is significantly reduced - even more so with fully managed options like DynamoDB and Aurora.
Strongly resist the temptation to use the same database for every application development challenge, as invariably the limitations of the database will end up driving your application design. Picking the best-fit store per workload - an approach known as polyglot persistence - can help you optimize for functionality, performance, and scale, and, more importantly, the experience of your end customer.