Introduction
A distributed system consists of multiple components located on different autonomous, networked computers (nodes) that are connected by a distribution middleware, which handles communication and coordinates actions. Often, this middleware is a message queue that manages the sending and receiving of messages between components. It can also take the form of Remote Procedure Calls (RPCs), which are analogous to local procedure calls but execute on other components.

The purpose of a distributed system is to let the components work together and act as a single, coherent system to the end-user, despite being located on different computers and in different networks. In other words, a distributed system provides an interface equivalent to that of a centralized system, with no noticeable difference in user experience. Services implemented as distributed systems are generally structured as collections of components called microservices. A microservice can be an ordinary program or another distributed system, each providing part of the functionality that, together, makes up the service. Microservices are also independently deployable, which makes them ideal for a distributed architecture.

Distributed systems are commonly used to build large-scale applications such as Content Delivery Networks (CDNs), databases, web servers, and blockchains because scalability, fault-tolerance, and performance are some of their key benefits. Most importantly, the rise of cloud services such as FaaS and DBaaS is made possible by distributed systems, as such services remain available despite occasional server failures.
A distributed system is designed to tolerate the failure of individual computers, so that the remaining computers keep working and providing services to users.
In this post, I will discuss three types of distributed architectures and some of the challenges associated with distributed systems. By the end of this post, you should be able to determine whether or not your next project should use a distributed architecture.
Types of Distributed Architectures
Software architecture is the blueprint for a system. It describes the system’s major components, their relationships, and how they interact with each other. Here are three types of architectures that are commonly used to build distributed systems.
Client-Server
In this model, clients request data from a server over the network. A server host runs one or more server programs, which share their resources with clients. For example, your browser sends a request to Amazon's web server whenever you visit amazon.com, and the server responds with the web page you see. In a service-oriented system, the server can also be a microservice within the same system. This model is 'distributed' in the sense that the client and the server do not have to be on the same computer or network. Additionally, the server can itself be another distributed system, for example one using a primary/replica architecture for fault-tolerance.
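The request/response flow can be sketched in a few lines with Python's standard library; the port, the greeting string, and running both ends in one process are illustrative choices only — in a real deployment the client and server would be on different machines.

```python
# Client-server sketch: the server shares a resource (a greeting) and the
# client requests it over HTTP. Both run in one process here for brevity.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class GreetingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from the server"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), GreetingHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Only this address would change if the server lived on another machine.
url = f"http://127.0.0.1:{server.server_address[1]}/"
reply = urlopen(url).read().decode()
print(reply)  # Hello from the server
server.shutdown()
```

The client knows nothing about how the server is implemented; it could be a single program or, as noted above, an entire distributed system behind one address.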
Peer to Peer
In this model, nodes are equally privileged, equipotent participants in the application, and no additional machines are used to provide services or manage resources. Responsibilities are uniformly distributed among the nodes, known as peers, each of which can act as either client or server. For example, Bitcoin uses this model for its payment system. There is no need for a centralized location (e.g., a database) for information such as transaction history: the knowledge is stored on the miners, and each miner assumes equal responsibility for maintaining the blockchain (a distributed ledger) by following a cryptographic protocol, which is why Bitcoin has no central authority. This model is 'distributed' in the sense that the miners do not have to be on the same computer or in the same network. A peer-to-peer model also requires a consensus mechanism (e.g., Raft, or proof-of-work in Bitcoin's case) to achieve any degree of consistency.
N-tier
This model is a variant of the client-server architecture in which presentation, application processing, and data management functions are physically separated. By segregating an application into tiers, developers gain the option of modifying or adding a specific layer instead of reworking the entire application. For example, a three-tier architecture is typically composed of a presentation tier, a domain logic tier, and a data storage tier. This model is generally used when an application or server needs to forward requests to additional enterprise services on the network.
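The three-tier split can be sketched as follows; each tier is a plain function here, standing in for a component that could run on its own host, and the user data is made up for illustration.

```python
# Three-tier sketch in one file: each tier could live on a separate host;
# the function calls here stand in for network hops between tiers.

# Data tier: storage access only.
_users = {1: {"name": "Ada"}}

def fetch_user(user_id: int) -> dict:
    return _users[user_id]

# Domain logic tier: business rules, no storage or rendering details.
def greeting_for(user_id: int) -> str:
    return f"Welcome back, {fetch_user(user_id)['name']}!"

# Presentation tier: rendering only.
def render(user_id: int) -> str:
    return f"<h1>{greeting_for(user_id)}</h1>"

print(render(1))  # <h1>Welcome back, Ada!</h1>
```

Because the tiers only talk through these narrow interfaces, swapping the data tier for a real database would not touch the logic or presentation code.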
Challenges
Distributed systems were created out of necessity: services and applications needed to scale, and new machines needed to be added and managed. In the design of distributed systems, the major trade-off to consider is complexity versus performance.
Consistency
A system is consistent if all nodes see and return the same data at the same time. However, because nodes are distributed and can fail independently, it is challenging to synchronize the order of changes to data and application state. There are three common consistency models, namely strong consistency, weak consistency, and eventual consistency. Generally, raising the consistency level (e.g., from weak to strong) decreases the availability of the system, which hurts performance. You may read this article for more information.
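One concrete way this trade-off shows up is in quorum-replicated systems: with N replicas, a write quorum W and a read quorum R, reads are guaranteed to overlap the latest write only when R + W > N. The sketch below illustrates the rule itself; it is not tied to any particular database.

```python
# Quorum rule sketch: with N replicas, a read quorum R and write quorum W
# overlap in at least one replica whenever R + W > N, so every read sees
# the latest acknowledged write (strong consistency). Smaller quorums are
# faster and more available, but reads may return stale data.
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    return r + w > n

# N=3 replicas: W=2, R=2 quorums must overlap; W=1, R=1 may miss a write.
print(is_strongly_consistent(3, 2, 2))  # True
print(is_strongly_consistent(3, 1, 1))  # False
```

Dialing W and R down is exactly the move from strong toward eventual consistency described above: each request touches fewer nodes, so availability improves at the cost of synchronized state.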
Idempotence
In a distributed environment, many things can go wrong: requests can time out and nodes can stop working. Typically, the client retries such requests. An idempotent system ensures that no matter how many times a specific request is retried, its effect is the same as if it had been executed only once. For example, suppose a client makes a payment request that succeeds, but the client disconnects due to network issues and retries the request after reconnecting. With an idempotent system, the person paying would not get charged twice. With a non-idempotent system, they could.
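A common way to achieve this is an idempotency key: the client attaches a unique key to each logical operation, and the server remembers keys it has already processed. The function and storage below are a minimal in-memory sketch, not the API of any real payment service.

```python
# Idempotency-key sketch: a retried request with the same key returns the
# stored result instead of charging again. In production the processed-key
# store would be durable and shared, not a local dict.
import uuid

processed: dict[str, str] = {}  # idempotency key -> result of first execution

def charge(idempotency_key: str, amount_cents: int) -> str:
    if idempotency_key in processed:          # retry: replay stored result
        return processed[idempotency_key]
    result = f"charged {amount_cents} cents"  # the real work happens once
    processed[idempotency_key] = result
    return result

key = str(uuid.uuid4())            # one key per logical payment
first = charge(key, 500)
retry = charge(key, 500)           # e.g., client retried after a timeout
print(first == retry, len(processed))  # True 1
```

The key is generated by the client, which is what ties a retry to the original attempt even when the first response was lost.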
Monitoring & Logging
Monitoring is necessary for visibility into the operation and failures of a distributed system. Without monitoring, developers have no way to understand how the system performs. However, monitoring requires metrics emitted by the system, and in a distributed system such metrics can be difficult to collect, organize, and analyze because of the system's complexity and the many events occurring concurrently.
Note on Scalability
There are essentially two ways to scale a distributed system. You can buy more powerful hardware to meet your service demands, which is called vertical scaling, or you can add more machines to the system, which is called horizontal scaling. The fundamental trade-off is quality versus quantity. With horizontal scaling, you add more machines (or nodes) to increase capacity, which is often more economical because powerful hardware can be quite expensive. Horizontal scaling is the most popular way to scale distributed systems, especially as adding (virtual) machines to a cluster is often as easy as the click of a button.
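Horizontal scaling only works if load can be spread across the nodes. A simple (and deliberately naive) way is to hash a request key onto the node list; the node names and hash choice below are illustrative.

```python
# Horizontal-scaling sketch: route each request to a node by hashing its
# key, so adding nodes adds capacity. Plain modulo is used here for
# clarity; note its weakness at the end.
import hashlib

def pick_node(key: str, nodes: list[str]) -> str:
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]  # deterministic: same key -> same node

nodes = ["node-1", "node-2", "node-3"]
for user in ["alice", "bob", "carol"]:
    print(user, "->", pick_node(user, nodes))

# Scaling out is just appending to the list -- but with plain modulo,
# changing len(nodes) remaps most keys to different nodes. Real systems
# use consistent hashing so that adding a node moves only ~1/N of keys.
nodes.append("node-4")
```

This is also where the complexity-versus-performance trade-off resurfaces: more nodes means more capacity, but also more routing, rebalancing, and failure handling to manage.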
Summary
With the ever-growing technological expansion of the world, distributed systems are becoming more and more widespread. A distributed system, in its simplest definition, is a group of computers working together to appear as a single computer to the end-user. Distributed systems are commonly used to build large-scale applications such as Content Delivery Networks (CDNs), databases, web servers, and blockchains because scalability, fault-tolerance, and performance are some of their key benefits. Three types of architectures are commonly used to build distributed systems, namely client-server, peer-to-peer, and n-tier. When building large distributed systems, the goal is usually to make them resilient, elastic, and scalable; however, the major trade-off to consider in their design is complexity versus performance. Some of the challenges of building a distributed system are maintaining consistency, enabling idempotence, and supporting monitoring and logging of system status. With a distributed system, it is relatively easy to grow through horizontal scaling, which is a cheaper and more popular way to increase capacity than vertical scaling. Moreover, horizontal scaling has no hard cap: whenever performance degrades, you can simply add another machine, although in practice coordination overhead eventually limits the gains.