Distributed System Study

What are distributed systems?

To my understanding, the concept of a distributed system stands in contrast to a single-machine computer system.

When the demand for complex computation, higher speed, and parallelism grew, people came up with the idea of the multi-core processor in a single machine. As that demand grows even further, distributed computing systems become more important for storing and processing large amounts of data.

All computers or computing systems need three basic functions: storage, computation, and communication. When we learn how a distributed system works, we are actually learning how the system stores data, what strategies or algorithms it uses for computation, and what mechanisms it uses for communication between the parts of the system.

Thus, to study a distributed system, we need to figure out how the system stores data in a distributed way, how it performs computation in a distributed way, and how the nodes (machines) communicate with one another.

The paragraph below is cited from Wikipedia: https://en.wikipedia.org/wiki/Distributed_computing

Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.[1] Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.

How does a distributed system work?

This is a big question that is not easy to answer in a few words, but we can use some examples to reveal how a distributed system works. Here I would like to use Hadoop as an example to elaborate on this point.

Before digging into a specific distributed system, answering the following questions will help a lot.

What should we care about most when designing a distributed system?

To answer this question, we should understand the CAP theorem.

(cited from: UCI, EECS219 lecture)

CAP represents three properties of a distributed system:

1. Consistency: Every node in the system contains the same data

(e.g. replicas are never out of date)

2. Availability: Every request to a non-failing node in the system returns a response

3. Partition Tolerance: System properties (consistency and/or availability) hold even when the system is partitioned (communication lost) and data is lost (node lost)

You can have at most two of these three properties for any shared-data system

• To scale out, you have to partition. That leaves either consistency or availability to choose from

– In many cases, you would choose availability over consistency.
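
To make the tradeoff concrete, below is a minimal sketch in Java of the read path of a replicated key-value store. The class and method names are hypothetical and only illustrate the idea: a consistency-first read refuses to answer when it cannot reach a majority of replicas, while an availability-first read answers from local, possibly stale, data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replicated key-value store, sketched only to illustrate the
// consistency-vs-availability choice during a network partition.
class ReplicatedStore {
    private final Map<String, String> localCopy = new HashMap<>();
    private final List<Replica> replicas;              // other nodes holding copies

    ReplicatedStore(List<Replica> replicas) { this.replicas = replicas; }

    // Consistency-first read: insist on a majority of replicas.
    // During a partition this may fail, i.e. we give up availability.
    String readConsistent(String key) {
        int votes = 1;                                  // our own copy counts as one vote
        String latest = localCopy.get(key);
        for (Replica r : replicas) {
            if (r.isReachable()) {
                latest = r.newerOf(key, latest);        // keep the newest version seen
                votes++;
            }
        }
        int totalNodes = replicas.size() + 1;
        if (votes <= totalNodes / 2) {
            throw new IllegalStateException("no quorum reached, refusing to answer");
        }
        return latest;
    }

    // Availability-first read: always answer from the local copy,
    // even if it may be stale, i.e. we give up consistency.
    String readAvailable(String key) {
        return localCopy.get(key);
    }
}

interface Replica {
    boolean isReachable();
    String newerOf(String key, String candidate);       // newer of the replica's copy vs candidate
}
```

Real systems differ in which side they pick: HDFS favors consistency of its file system namespace, while Dynamo-style key-value stores typically favor availability.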

 

How to store data in a distributed system?

As we all know, files are the basic form of data storage in a computer system. A distributed system stores data in a way that is very similar to the file system on a single computer. To tolerate node failures and keep the data consistent, a distributed system usually keeps replicas of each piece of data on several nodes.
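
As a concrete example, the sketch below uses Hadoop's HDFS client API to write a small file and read it back. The NameNode address, file path, and replication factor are placeholders chosen for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

// Write a small file to HDFS and read it back.
public class HdfsStoreExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // placeholder NameNode address
        conf.set("dfs.replication", "3");                      // keep 3 replicas of each block

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/demo/hello.txt");          // placeholder path

        // The client writes through a pipeline of DataNodes; HDFS replicates each block.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello distributed storage".getBytes(StandardCharsets.UTF_8));
        }

        // Reads are served by whichever replica is available (and usually closest).
        byte[] buf = new byte[64];
        try (FSDataInputStream in = fs.open(path)) {
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```

From the client's point of view this looks like ordinary file I/O; behind the scenes the NameNode tracks which DataNodes hold each block, and HDFS replicates every block (three copies by default) so the data survives the loss of a node.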

How to do computation in a distributed system?

One of the most popular frameworks for distributed computation is MapReduce.
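
The classic example is word count. The job below uses Hadoop's MapReduce API: map tasks run in parallel on the nodes that hold the input blocks and emit (word, 1) pairs, the framework shuffles the pairs by key, and reduce tasks sum the counts. The input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split assigned to this task.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word; the framework groups values by key.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this is packaged into a jar and submitted with something like `hadoop jar wordcount.jar WordCount /input /output`, where the paths refer to HDFS directories.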

How to deal with node failure? Master node? Slave node?

Take Hadoop as an example. It uses a master-slave architecture. To detect a slave node's failure, the system uses a heartbeat mechanism: each slave node periodically reports to the master to prove that it is still alive. The master node itself, however, is a single point of failure.
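
The sketch below shows the general heartbeat idea only, not Hadoop's actual DataNode/NameNode protocol; the timeout value and method names are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of a heartbeat-based failure detector on the master side.
class HeartbeatMonitor {
    private static final long TIMEOUT_MS = 10_000;       // assumed timeout, for illustration

    // Last time each slave was heard from, keyed by slave id.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called (e.g. by an RPC handler) each time a slave reports in.
    void onHeartbeat(String slaveId) {
        lastHeartbeat.put(slaveId, System.currentTimeMillis());
    }

    // Run periodically: mark slaves that have gone silent as dead.
    void checkForFailures() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > TIMEOUT_MS) {
                System.out.println("Slave " + e.getKey() + " missed its heartbeat; marking as dead");
                // In a real system the master would now re-replicate that node's data
                // and stop scheduling tasks on it.
            }
        }
    }
}
```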

Cited from:

http://stackoverflow.com/questions/33494697/secondary-namenode-usage-in-hadoop-2-x/33719153#33719153

http://stackoverflow.com/questions/33311585/how-does-hadoop-namenode-failover-process-works/33313804#33313804

The master nodes in distributed Hadoop clusters host the various storage and processing management services, described in this list, for the entire Hadoop cluster. Redundancy is critical in avoiding single points of failure, though single points of failure cannot be totally avoided.

  • NameNode: Manages HDFS storage. To ensure high availability, you have both an active NameNode and a standby NameNode. Each runs on its own, dedicated master node.

  • Checkpoint node (or backup node): Provides checkpointing services for the NameNode. This involves reading the NameNode’s edit log for changes to files in HDFS (new, deleted, and appended files) since the last checkpoint, and applying them to the NameNode’s master file that maps files to data blocks.

    In addition, the Backup Node keeps a copy of the file system namespace in memory and keeps it in sync with the state of the NameNode. For high availability deployments, do not use a checkpoint node or backup node — use a Standby NameNode instead. In addition to being an active standby for the NameNode, the Standby NameNode maintains the checkpointing services and keeps an up-to-date copy of the file system namespace in memory.

  • JournalNode: Receives edit log modifications indicating changes to files in HDFS from the NameNode. At least three JournalNode services (and it’s always an odd number) must be running in a cluster, and they’re lightweight enough that they can be colocated with other services on the master nodes.

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.

In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs).

When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node reads these edits from the JNs and applies them to its own namespace.

In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
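
The essential idea behind "durably logs a record of the modification to a majority of these JNs" is a quorum write. The sketch below illustrates it with hypothetical interfaces; it is not Hadoop's real QuorumJournalManager, but it shows why an edit only counts once more than half of the JournalNodes have acknowledged it.

```java
import java.util.List;

// Hypothetical sketch of quorum-writing an edit-log record to JournalNodes.
class QuorumEditLog {
    interface Journal {
        boolean append(long txId, byte[] record);    // true if durably written
    }

    private final List<Journal> journals;            // typically an odd number, e.g. 3 or 5

    QuorumEditLog(List<Journal> journals) { this.journals = journals; }

    void logEdit(long txId, byte[] record) {
        int acks = 0;
        for (Journal j : journals) {
            try {
                if (j.append(txId, record)) acks++;
            } catch (RuntimeException unreachable) {
                // a down JournalNode simply does not contribute an acknowledgement
            }
        }
        // The edit is only considered durable once a majority has acknowledged it.
        if (acks <= journals.size() / 2) {
            throw new IllegalStateException("edit " + txId + " not acknowledged by a quorum");
        }
    }
}
```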

It is vital for an HA cluster that only one of the NameNodes be Active at a time. ZooKeeper is used to avoid the split-brain scenario, so that the NameNode state does not diverge during a failover.

The ZKFailoverController (ZKFC) is a new component: a ZooKeeper client that monitors and manages the state of the NameNode. Each of the machines that runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

Health monitoring – the ZKFC pings its local NameNode on a periodic basis with a health-check command. As long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.

ZooKeeper session management – when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special “lock” znode. This lock uses ZooKeeper’s support for “ephemeral” nodes; if the session expires, the lock node will be automatically deleted.

ZooKeeper-based election – if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has “won the election”, and is responsible for running a failover to make its local NameNode active.
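
The core of the election is the ephemeral "lock" znode. The sketch below uses the ZooKeeper client API to show just that part; the connection string, znode path, and node id are placeholders, and the real ZKFC additionally performs health monitoring and fencing of the old active node before taking over.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch of leader election with an ephemeral "lock" znode.
public class LockZnodeElection implements Watcher {
    private static final String LOCK_PATH = "/example/active-lock";   // placeholder path
    private final ZooKeeper zk;
    private final String nodeId;

    public LockZnodeElection(String nodeId) throws Exception {
        this.nodeId = nodeId;
        // If this process dies, the session expires and ZooKeeper deletes
        // the ephemeral lock znode automatically.
        this.zk = new ZooKeeper("zk-host:2181", 5000, this);           // placeholder address
    }

    // Try to become active by creating the ephemeral lock znode.
    public void tryToBecomeActive() throws Exception {
        try {
            zk.create(LOCK_PATH, nodeId.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.EPHEMERAL);
            System.out.println(nodeId + " won the election and becomes active");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else holds the lock; watch it so we re-run the election
            // when the current holder's session expires.
            zk.exists(LOCK_PATH, this);
            System.out.println(nodeId + " stays standby");
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeDeleted
                && LOCK_PATH.equals(event.getPath())) {
            try {
                tryToBecomeActive();                // the lock was released; run again
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
```

Because the lock znode is ephemeral, it disappears automatically when the active node's ZooKeeper session expires, which is what allows the standby's watch to fire and trigger a new election.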

 

 

 

 

 
