  • My dad used to tell me, use the right tool for the job, and not vice versa.

  • When it comes to app development, choosing the right database is one of the single most important decisions that you'll ever make.

  • In today's video, we'll look at seven different database paradigms, along with some databases you've probably heard of, and some others that you haven't.

  • We'll look at how they work from a technical perspective, but most importantly, we'll try to understand what they're best used for, because my dad also used to tell me, don't bring a knife to a gunfight.

  • If you're new here, like and subscribe, and also check out my new top 7 playlist for more videos like this.

  • Our list will start from the most simple type of database, and gradually become more complex as we get to number 7.

  • And that brings us to our first paradigm, the Key-Value Database.

  • Popular databases in this space include Redis, Memcached, and etcd.

  • The database itself is structured almost like a JavaScript object or Python dictionary.

  • You have a set of keys, where every key is unique, and points to some value.

  • In Redis, for example, we can read and write data using commands.

  • We use the set command, followed by a key and a value, to write data, then the get command to retrieve that data in the future.
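  • To make the set and get commands concrete, here is a minimal sketch of the idea as a toy in-memory store. It is not the Redis client API; the class name, method names, and the sample key are all hypothetical.

```python
# A toy in-memory key-value store, illustrating the idea only.
# This is NOT the Redis client API; the class and method names are hypothetical.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # every key is unique and points to one value

    def set(self, key, value):
        """Write a value, overwriting any previous value for the key."""
        self._data[key] = value

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)


store = KeyValueStore()
store.set("user:1:name", "Ada")
print(store.get("user:1:name"))  # prints: Ada
print(store.get("missing"))      # prints: None
```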

  • In the case of Redis and Memcached, all of the data is held in the machine's memory, as opposed to most other databases that keep all their data on the disk.

  • This limits the amount of data you can store, however it makes the database extremely fast, because it doesn't require a round trip to the disk for every operation.

  • In addition, it doesn't perform queries, joins, or anything like that, so your data modeling options are very limited, but again it's extremely fast, like sub-millisecond fast.

  • You wouldn't want to use a key-value store for your main app data.

  • Most often, they're used as a cache to reduce data latency.

  • Apps like Twitter, GitHub, and Snapchat all use Redis for real-time delivery of their data.

  • There are other use cases beyond caching, like message queues, PubSub, and gaming leaderboards, but more often than not, key-value databases are used as a cache on top of some other persistent data layer.
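  • The caching use case can be sketched as a cache-aside read: check the fast in-memory layer first and only fall back to the persistent layer on a miss. Everything here is an assumption for illustration: the dictionaries stand in for a real key-value store and a real database, and the function names and the 60-second expiry are made up.

```python
import time

# Hypothetical stand-in for a slow, persistent database.
PERSISTENT_DB = {"user:1": {"name": "Ada"}}
db_reads = 0  # counts how often we fall through to the slow layer


def slow_db_read(key):
    global db_reads
    db_reads += 1
    return PERSISTENT_DB.get(key)


cache = {}  # key -> (value, expiry time); plays the role of the key-value store


def cached_read(key, ttl_seconds=60.0):
    """Cache-aside read: try the cache first, fall back to the database."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]  # cache hit
    value = slow_db_read(key)  # cache miss: go to the persistent layer
    cache[key] = (value, time.monotonic() + ttl_seconds)
    return value


first = cached_read("user:1")   # miss, reads the database
second = cached_read("user:1")  # hit, served from memory
print(first == second, db_reads)  # prints: True 1
```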

  • Now a database that only supports key-value pairs is obviously pretty limited, and that brings us to the wide-column database.

  • Popular options in this family include Cassandra and HBase.

  • A wide-column database is like you took a key-value database and added a second dimension to it.

  • At the outer layer, you have a keyspace, which holds one or more column families, and each column family holds a set of ordered rows.

  • This makes it possible to group related data together, but unlike a relational database, it doesn't have a schema, so it can handle unstructured data.
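  • As a rough mental model of that layout, the sketch below uses nested dictionaries: a keyspace holds column families, and each column family holds rows that are read back in key order. This is only a toy, not how Cassandra or HBase store data, and the row keys, column names, and helper function are invented for the example.

```python
# Toy model of the layout described above; names are illustrative,
# not the Cassandra or HBase API.
keyspace = {
    "sensor_readings": {  # a column family
        # row key -> columns; rows do not have to share the same columns
        "device-1|2024-01-01T00:00": {"temp_c": 21.5},
        "device-1|2024-01-01T00:05": {"temp_c": 21.7, "humidity": 40},
        "device-2|2024-01-01T00:00": {"temp_c": 19.0},
    }
}


def rows_with_prefix(column_family, prefix):
    """Return rows whose key starts with the prefix, in sorted key order."""
    return [(k, column_family[k]) for k in sorted(column_family) if k.startswith(prefix)]


device_1 = rows_with_prefix(keyspace["sensor_readings"], "device-1|")
print(len(device_1))    # prints: 2
print(device_1[-1][1])  # prints: {'temp_c': 21.7, 'humidity': 40}
```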

  • This is nice for developers, because you get a query language called CQL that's very similar to SQL, although much more limited and you can't do joins.

  • However, it's much easier to scale up and replicate data across multiple nodes.

  • Unlike an SQL database, it's decentralized and can scale horizontally.

  • A popular use case is for scaling a large amount of time series data, like records from an IoT device, weather sensors, or in the case of Netflix, a history of the different shows you've watched.

  • It's often used in situations where you have frequent writes, but infrequent updates and reads.

  • It's not going to be your primary app database.

  • For that, you'll need something more general purpose, like a document-oriented database.

  • Popular options include Firestore, DynamoDB, CouchDB, and a few others.

  • In this paradigm, you have documents, where each document is a container for key-value pairs.

  • They're unstructured and don't require a schema.

  • Then the documents are grouped together in collections.

  • Fields within a collection can be indexed, and collections can be organized into a logical hierarchy, allowing you to model and retrieve relational data to a pretty significant degree.

  • They don't support joins, so instead of normalizing your data into a bunch of small parts, you're encouraged to embed the document.

  • This creates a tradeoff where reads from a front-end application are much faster, however writing or updating data tends to be more complex.
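  • Here is a small sketch of that tradeoff, using plain dictionaries in place of a real document database. The collections, field names, and helper functions are hypothetical. The embedded read needs one lookup, while a rename has to touch every document that carries a copy of the author.

```python
# Normalized shape: the post and its author live in separate collections.
users = {"u1": {"name": "Ada"}}
posts_normalized = {"p1": {"title": "Hello", "author_id": "u1"}}

# Embedded shape: the author data is copied into the post document.
posts_embedded = {"p1": {"title": "Hello", "author": {"id": "u1", "name": "Ada"}}}


def read_post_normalized(post_id):
    """Two lookups: the post, then its author."""
    post = posts_normalized[post_id]
    return {"title": post["title"], "author": users[post["author_id"]]["name"]}


def read_post_embedded(post_id):
    """One lookup: everything needed is already in the document."""
    post = posts_embedded[post_id]
    return {"title": post["title"], "author": post["author"]["name"]}


def rename_user(user_id, new_name):
    """Writes are the costly side: every embedded copy must be updated too."""
    users[user_id]["name"] = new_name
    for post in posts_embedded.values():
        if post["author"]["id"] == user_id:
            post["author"]["name"] = new_name


rename_user("u1", "Ada L.")
print(read_post_normalized("p1") == read_post_embedded("p1"))  # prints: True
```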

  • Document databases are far more general purpose than the other options we've looked at so far.

  • From a developer perspective, they're very easy to use.

  • They're often suitable for mobile games, IoT, content management, and many other use cases.

  • If you're not exactly sure how your data is structured at this point, a document database is probably the best place to start.

  • Where they generally fall short is when you have a lot of disconnected but related data that is updated often, like a social app that has many users who have many friends who have many comments who have many likes, and you want to see all the comments that your friends like.

  • Data like this needs to be joined, and it's not easily done in a document database at scale.

  • Luckily, we have this thing that's been around forever called the relational database.

  • You're likely familiar with this type of database with flavors like MySQL, Postgres, SQL Server, and many others.

  • They've been around for nearly 50 years and continue to be one of the most popular types of databases in today's world.

  • They were originally conceived by a computer scientist named Ted Codd.

  • He worked for IBM and spent years working out his theories on relational data modeling.

  • You can read his original paper online, and most of it goes way over my head, but you can appreciate the amount of math and science that went into the development of relational databases, and that's very likely why they remain so popular today.

  • A few years later, this would inspire the development of SQL, or Structured Query Language, or "sequel" if you prefer.

  • It's a special type of programming language called a query language that allows you to access and write data in the database.

  • Okay, but what do we actually mean when we say relational database?

  • Well, imagine you have a facility that builds airplanes.

  • The facility is your database, and on that database you might have different warehouses that hold different parts, like engines, wheels, and so on.

  • Each warehouse is like a database table for holding a certain type of part.

  • Each individual part has a serial number to uniquely identify it, and you can think of an individual part as a row in a table.

  • So now that we have all these parts separated into different warehouses, how do we build an airplane?

  • That's where relationships come in.

  • We can build an airplane by referencing the unique ID of the different parts that go into it.

  • Notice how each part has its own unique ID.

  • This is known as its primary key. The airplane then defines its various parts by referencing their IDs.

  • These are known as foreign keys because they reference data in a different table.

  • Now if we want to join all this data together, we can run a query to do that.
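  • The airplane analogy can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows are assumptions made up for this example; the point is only to show primary keys, foreign keys, and a join query working together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each "warehouse" is a table; each part gets a primary key.
    CREATE TABLE engines (id INTEGER PRIMARY KEY, model TEXT);
    CREATE TABLE wheels  (id INTEGER PRIMARY KEY, size_inches INTEGER);
    -- The airplane references its parts through foreign keys.
    CREATE TABLE airplanes (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        engine_id INTEGER REFERENCES engines(id),
        wheel_id  INTEGER REFERENCES wheels(id)
    );
    INSERT INTO engines VALUES (1, 'turbofan');
    INSERT INTO wheels VALUES (1, 40);
    INSERT INTO airplanes VALUES (1, 'demo-plane', 1, 1);
""")

# Join the three tables back together into one result row.
row = conn.execute("""
    SELECT a.name, e.model, w.size_inches
    FROM airplanes AS a
    JOIN engines AS e ON e.id = a.engine_id
    JOIN wheels  AS w ON w.id = a.wheel_id
""").fetchone()
print(row)  # prints: ('demo-plane', 'turbofan', 40)
```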

  • So the main takeaway here is that an SQL database organizes data in its smallest normal form.

  • However, a potential drawback here is that it requires a schema.

  • If you don't know the right data shape up front, they can be a little harder to work with.

  • SQL databases are also ACID compliant, which means whenever there's a transaction in the database, data validity is guaranteed even if there are network or hardware failures.
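  • To illustrate one part of that guarantee, the sketch below shows a transaction that is rolled back as a whole when one of its statements fails. It uses Python's built-in sqlite3 module; the accounts table, the names, and the transfer function are hypothetical, and it only demonstrates atomicity, not recovery from hardware or network failures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount):
    """Move money from alice to bob; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(500)  # would overdraw alice, so the CHECK constraint fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # bob's credit is undone along with the failed debit
```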

  • That's essential for things like banks and financial institutions, but it makes this type of database inherently more difficult to scale.

  • However, it's worth noting that there are modern SQL databases like Cockroach that are specifically designed to operate at scale.

  • In any case, relational databases remain the most popular type of database in production today.

  • But what if instead of modeling a relationship in a schema, we just treated the relationship itself as data?

  • Enter the graph database, where your data is represented as nodes and the relationships between them as edges.

  • Popular options in this space include Neo4j and Dgraph.

  • Let's imagine we want to set up a many-to-many relationship in an SQL database.

  • We do that by setting up a join table with the foreign keys that define the relationship.

  • In a graph database, we don't need this middleman table.

  • We just define an edge and connect it to the other record.

  • We can now query this data with a statement that's much more concise and readable.

  • In addition, we can achieve much better performance, especially on larger datasets.
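  • As a rough sketch of the difference, the example below stores relationships as labelled edges and answers "which comments do my friends like?" by walking those edges instead of joining through a middleman table. It is a toy in plain Python, not Neo4j or Dgraph query syntax; the node names, edge labels, and helper functions are all made up.

```python
from collections import defaultdict

# Relational style: a join table of (user_id, comment_id) pairs.
likes_join_table = [("alice", "c1"), ("bob", "c1"), ("bob", "c2")]

# Graph style: each relationship is stored as an edge leaving a node.
edges = defaultdict(set)  # (node, edge label) -> set of neighbouring nodes


def add_edge(source, label, target):
    edges[(source, label)].add(target)


add_edge("carol", "FRIEND", "alice")
add_edge("carol", "FRIEND", "bob")
for user, comment in likes_join_table:
    add_edge(user, "LIKES", comment)


def comments_liked_by_friends(user):
    """Walk FRIEND edges, then LIKES edges, collecting the comments reached."""
    liked = set()
    for friend in edges[(user, "FRIEND")]:
        liked |= edges[(friend, "LIKES")]
    return sorted(liked)


print(comments_liked_by_friends("carol"))  # prints: ['c1', 'c2']
```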

  • Graph databases can be a great alternative to SQL, especially if you're running a lot of joins and performance is taking a hit because of that.

  • They're often used for fraud detection in finance, for building internal knowledge graphs within companies, and to power recommendation engines like the one used by Airbnb.

  • Now let's imagine you want to build something like Google.

  • A user provides a small amount of text, then your database needs to return the most relevant results ranked in the proper order from a huge amount of data.

  • For that, you're going to want a full-text search engine.

  • Most of the databases in this space, like Solr and Elasticsearch, are built on top of the Apache Lucene project, which has been around since 1999.

  • In addition, we have cloud-based options like Algolia, and my new personal favorite, MeiliSearch, a Rust-based full-text search engine.

  • If you want to check it out, I have a full tutorial on Fireship.io for pro members.

  • From a developer perspective, they work very similarly to a document-oriented database.

  • You start with an index, then you add a bunch of data objects to it.

  • The difference is that under the hood, the search database will analyze all of the text in the document and create an index of the searchable terms.

  • So essentially, it works just like the index that you would find in the textbook.

  • When a user performs a search, it only has to scan the index as opposed to every document in the database, and that makes it very fast even on large datasets.
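  • The index of searchable terms described here is usually called an inverted index. Below is a minimal sketch of one in plain Python, assuming a very simple tokenizer and no ranking or typo handling; the sample documents and function names are invented, and real engines like Lucene do far more than this.

```python
import re
from collections import defaultdict

documents = {
    1: "Redis is an in-memory key-value store",
    2: "Cassandra is a wide-column store",
    3: "Postgres is a relational database",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Build the inverted index once: term -> ids of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in tokenize(text):
        index[term].add(doc_id)


def search(query):
    """Return ids of documents containing every query term, using only the index."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


print(search("store"))         # prints: [1, 2]
print(search("column store"))  # prints: [2]
print(search("graph"))         # prints: []
```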

  • The database can also run a variety of different algorithms to rank those results, filter out irrelevant hits, handle typos, and so on.

  • This does add a lot of overhead, and they can be expensive to run at scale, but at the same time, they can add a ton of value to the user experience if you're building something like a type-ahead search box.

  • And with that, we've reached number seven, the multi-model database, which in my opinion is the most exciting paradigm on this list.

  • There are a few different options out there, but the database I want to focus on here is FaunaDB, which is very different than anything else we've looked at so far.

  • If you're a front-end developer, all you really care about is the data that you consume in the front-end application.

  • You just want some JSON.

  • You don't want to have to think about data modeling, schemas, replication, shards, or anything like that.

  • With FaunaDB, you describe how you want to access your data using GraphQL.

  • In this example, we have a user model and a post model, where a user can have many posts.

  • If we upload our GraphQL schema to Fauna, it automatically creates collections where we can store data, and an index to query the data.

  • Behind the scenes, it's figuring out how to take advantage of multiple database paradigms, like graph, relational, and document, and determining how to best use these paradigms based on the GraphQL code you provided.

  • You create data by adding documents to collections just like you would with a document database, but you're not stuck with the inherent limitations when it comes to data modeling.

  • On top of that, it's ACID compliant, extremely fast, and you never have to worry about provisioning the actual infrastructure.

  • You just decide how you want to consume your data, and you let the cloud figure everything else out for you.

  • I'm going to go ahead and wrap things up there.

  • We didn't cover every single database paradigm.

  • There's a few others that you might want to know about, like time series databases, and also data warehouses.

  • And if you want to learn advanced data modeling concepts, consider becoming a pro member at Fireship.io and taking the Firestore data modeling course.

  • Thanks for watching, and I will see you in the next one.
