Have you ever opened up YouTube and wondered how it’s able to store and efficiently serve thousands of petabytes of video? Or how you can upload your aesthetic brunch picture to Instagram and have it show up virtually instantly in all your followers’ feeds? Well, my friends, you’ve come to the right place! Together, through this wonderful Medium (pun intended) of online blogging, let us discover the marvels (and pitfalls) of modern-day distributed systems.
What is a Distributed System?
Unless you’ve been living under the proverbial rock, you’ve probably heard the term “distributed system”, but you might not know exactly what it means. That’s OK. Let’s start with a formal definition from Wikipedia:
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
A simpler way of thinking about it is as a group of computers working together to accomplish a common goal. This can be a database system, where the data is split and stored on multiple computers, or a microservice chat app, where one server handles authentication and another handles messaging.
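To make the database example a little more concrete, here is a minimal sketch of one common way to decide which computer stores which piece of data: hash the key and take the result modulo the number of servers. The server names and the `server_for` and `shard` helpers are made up for illustration, and real databases use more sophisticated schemes, so treat this as a toy model rather than how any particular system works.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["db-0", "db-1", "db-2"]


def server_for(key, servers=SERVERS):
    """Pick the server that stores `key` by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into an integer, then wrap it
    # around the number of servers to get an index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def shard(keys, servers=SERVERS):
    """Group keys by the server each one is assigned to."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement


if __name__ == "__main__":
    users = ["alice", "bob", "carol", "dave", "erin"]
    for server, keys in shard(users).items():
        print(server, keys)
```

Because the hash is deterministic, the same key always lands on the same server, so any computer in the group can work out where a piece of data lives without asking a central coordinator.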
The Importance of Distributed Systems
If you’re new to system design, you might be wondering: why use distributed systems when they seem complex to build and a hassle to maintain? Well, it turns out that spreading the work across many computers is usually the most practical way to handle an extremely large amount of load. When it comes to scaling, there are generally two approaches you can take. First, there is vertical scaling, where you upgrade the hardware of a single computer, e.g. get a faster CPU and add more memory. The second is horizontal scaling, where you get another computer and split the load between the two. Vertical scaling can be useful in some situations, but for the most part, it quickly becomes infeasible due to cost and, eventually, physical limitations. After all, a single CPU can only hold so many transistors.
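Here is a rough sketch of what “splitting the load” can look like in horizontal scaling. It hands out requests to servers in turn (round-robin), which is just one simple strategy among many. The server names, request labels, and the `assign_round_robin` function are all hypothetical and exist only for this example.

```python
from itertools import cycle


def assign_round_robin(requests, servers):
    """Return a dict mapping each server to the requests it handles."""
    assignment = {server: [] for server in servers}
    # cycle() repeats the server list forever, so each request goes to
    # the next server in turn.
    for request, server in zip(requests, cycle(servers)):
        assignment[server].append(request)
    return assignment


requests = [f"req-{i}" for i in range(6)]

one_machine = assign_round_robin(requests, ["server-a"])
two_machines = assign_round_robin(requests, ["server-a", "server-b"])

print(len(one_machine["server-a"]))   # 6
print(len(two_machines["server-a"]))  # 3
```

Adding the second server halves the number of requests each machine has to handle, without touching the hardware of the first one.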
A Helpful Analogy
Say you own a restaurant and, because you only have one employee, you’re struggling to serve all your customers during peak hours. What would be a better way to solve the problem: replace your existing employee with a faster employee, or just hire another employee? While it initially might be worth it to hire a faster employee, there will still be a limit to how many customers they can serve, because again, a human only has two arms. Hiring multiple employees lets you keep scaling as your customer count grows, well past what any single employee could handle. There are many other upsides too. During non-peak hours you can have fewer employees, saving cost. Also, different employees can now specialize in specific tasks. For example, you can hire a pro chef to make higher quality food, enhancing the overall customer experience.
The World Ain’t All Sunshine and Rainbows
We have now seen some of the many benefits of using distributed systems, but they do come at the cost of being much harder to set up and manage. For example, what happens when one computer in the system crashes? How do we split the work among the remaining computers? In the restaurant analogy, you can think of it as having one of your employees call in sick.
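As a small taste of that problem, here is a simplified sketch of handing a crashed computer’s work to the ones that are still running, always giving the next task to whoever currently has the least to do. The `reassign_on_failure` function and the employee names are invented for this example. It also assumes we already know which computer failed, which in a real system is a hard problem of its own.

```python
def reassign_on_failure(assignment, failed):
    """Move the failed server's tasks onto the surviving servers.

    `assignment` maps a server name to its list of tasks. The input is
    not modified; a new mapping without the failed server is returned.
    """
    survivors = {
        server: list(tasks)
        for server, tasks in assignment.items()
        if server != failed
    }
    if not survivors:
        raise RuntimeError("no servers left to take over the work")

    for task in assignment.get(failed, []):
        # Give each orphaned task to the least-loaded survivor.
        target = min(survivors, key=lambda server: len(survivors[server]))
        survivors[target].append(task)
    return survivors


# Restaurant version: Carol calls in sick, so her tables get covered.
tables = {
    "alice": ["table-1", "table-2"],
    "bob": ["table-3", "table-4"],
    "carol": ["table-5", "table-6"],
}
print(reassign_on_failure(tables, "carol"))
```

Even this toy version hints at the trade-off: the work still gets done, but everyone who is left ends up busier than before.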
There are quite a few tough problems in the world of distributed systems, and in future blog posts, we’ll cover some of the more famous ones. For now, keep the restaurant analogy tucked somewhere safe, as we’ll be revisiting it.