Hadoop Cluster SSH Communications

14 Mar - by Simon - In Projects

Another in the series of articles for the Hadoop on Pi project. Ahead of installing software on the newly created Pi cluster, we need to enable Hadoop Cluster SSH Communications by creating users and SSH keys. This will permit the Hadoop software to communicate between the machines in the cluster.

SSH, or Secure Shell, enables user-to-machine and machine-to-machine communication in an encrypted manner using encryption keys. The software on the Pi is a collection of utilities in the OpenSSH packages. Keys are generated on the Pi from the command line and can also be easily copied between machines using the same utilities, enabling machine-to-machine communications. Note that "machine-to-machine" is a bit of a misnomer - SSH is emulating a user connecting from one machine to the other - but the effect is the same whether the connection is software-initiated or made by a real-life user.
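
As a quick sanity check before starting (an optional extra, not one of the steps below), you can confirm that the OpenSSH utilities are present - Raspbian ships the client tools by default. If ssh-copy-id later fails to connect, make sure the SSH server is enabled on each Pi, e.g. via raspi-config:

# Confirm the OpenSSH version and that the key utilities are installed
ssh -V
which ssh-keygen ssh-copy-id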

This article is specific to the project to create a Hadoop cluster on the Pi, so it is tied to that network / implementation. Hopefully it is also of general interest - it certainly taught me how easy it is to set up keys and copy them securely between systems. The network design is:

[Figure: Hadoop on Pi - Network Design]
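
For the host names used below to work, each Pi needs to be able to resolve the others. One simple way to achieve this is /etc/hosts entries on every node; a local DNS server works equally well. The addresses below are illustrative - substitute your own subnet:

# /etc/hosts on every node (example addresses only)
192.168.1.10    data-master
192.168.1.11    data-slave01
192.168.1.12    data-slave02
192.168.1.13    data-slave03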

Quick Checklist

The process is relatively simple, but you do need to be methodical to ensure that keys are copied completely to all machines and that you test the connection from each machine to all of the others:

  1. Create the user and group
  2. Generate and copy the keyfile to all hosts in the network
  3. Repeat the process on each machine in the network
  4. Log in and test connectivity from each machine

Create the Hadoop user and group

Simple Linux admin stuff here. Log in to data-master as root (or log in as the pi user and su to root). Create the group (I have kept it simple for my cluster: "hadoop") and then the user ("hduser"), adding it to the hadoop group. Finally, give sudo privileges to hduser:

sudo su - root
addgroup hadoop
adduser --ingroup hadoop hduser
adduser hduser sudo

This creates the user, group and home directory (/home/hduser) on the master node. Repeat on each of the slave nodes (data-slave01, data-slave02 and data-slave03).
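
To sanity-check the new account on each node (an optional extra step), confirm the user, its groups and its home directory:

# hduser should list both the hadoop and sudo groups
id hduser
ls -ld /home/hduser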

Generate and copy SSH keys

Log in to the data-master node as hduser. The SSH keys then need to be created and copied to each slave node. This is very straightforward using ssh-keygen and then ssh-copy-id. With the keygen utility, I have not used a passphrase and have included a -C parameter to indicate where the keys originated - not strictly necessary, but useful for documentation and scripting purposes, to keep track of which key has been copied where.

cd /home/hduser
ssh-keygen -t rsa -b 4096 -C hduser@data-master
ssh-copy-id hduser@data-master
ssh-copy-id hduser@data-slave01
ssh-copy-id hduser@data-slave02
ssh-copy-id hduser@data-slave03
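
For reference, ssh-keygen writes the key pair to ~/.ssh: id_rsa is the private key and never leaves this machine, while id_rsa.pub is the public half that ssh-copy-id appends to ~/.ssh/authorized_keys on each target host. You can inspect them if you are curious:

# The public key ends with the -C comment, e.g. hduser@data-master
ls -l ~/.ssh/id_rsa ~/.ssh/id_rsa.pub
cat ~/.ssh/id_rsa.pub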

It is important to test connectivity from each machine to every other machine, so once the above commands have completed successfully, SSH to each machine in the network as hduser and ensure it connects correctly without requesting a password or passphrase.

ssh data-master
ssh data-slave01
ssh data-slave02
ssh data-slave03

Note: don't get lost! It is easy to attempt to ssh from the wrong machine above if you are not methodical about ssh'ing in and then logging out (exit'ing). If it is easier, leave the tests until keys have been generated at - and copied from - each of the machines in the network.
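
A simple habit that helps here (just a convenience, not part of the setup): before each test, confirm which user and machine you are actually on.

# Should report hduser and the host you think you are on
whoami
hostname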

Rinse and Repeat (and test)

Complete the above for each node in the network, logging in as hduser and then executing similar commands (changing the -C comment parameter for each machine). So for hduser logged in on data-slave01, execute the following:

cd /home/hduser
ssh-keygen -t rsa -b 4096 -C hduser@data-slave01
ssh-copy-id hduser@data-master
ssh-copy-id hduser@data-slave01
ssh-copy-id hduser@data-slave02
ssh-copy-id hduser@data-slave03

Please remember that your Hadoop cluster will fail if any part of this process is not completed properly, so be sure to test that you can SSH from every machine to every other machine, connecting successfully without being prompted for a password or passphrase. You are then good to go - if you are following the Hadoop on Pi project articles, you can now install Hadoop.
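
If you would rather script that final all-to-all check than step through sixteen logins by hand, a small loop along these lines (my own sketch, not part of the original process) can be run as hduser on each node. BatchMode makes ssh fail immediately instead of prompting, so a missing key shows up as FAILED rather than a password prompt:

# Run as hduser on every node; every host should be reported OK
for host in data-master data-slave01 data-slave02 data-slave03; do
    if ssh -o BatchMode=yes "$host" true; then
        echo "OK: $host"
    else
        echo "FAILED: $host"
    fi
done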
