Hadoop Cluster SSH Communications
Another in the series of articles for the Hadoop on Pi project. Ahead of software installation to the newly created Pi cluster, we need to enable Hadoop Cluster SSH Communications by creating users and SSH keys. This will permit the Hadoop software to communicate between machines in the cluster.
SSH, or Secure Shell, enables encrypted user-to-machine and machine-to-machine communication using key pairs. The software on the Pi is a collection of utilities from the OpenSSH package. Keys are generated on the Pi from the command line and can easily be copied between machines using the same utilities, enabling machine-to-machine comms. Note that “machine-to-machine” is a bit of a misnomer – SSH is emulating a user connecting from one machine to the other, but the effect is the same whether the connection is software-initiated or made by a real-life user.
This article is specific to the project to create a Hadoop cluster on the Pi, so it is relevant to that network / implementation. Hopefully it is also of general interest – it certainly taught me how easy it is to set up keys and copy them securely between systems. The network design is a single master node (data-master) plus three slave nodes (data-slave01, data-slave02 and data-slave03).
The process is relatively simple but you do need to be methodical to ensure that keys are copied completely to all machines and that you test interconnection from each to all:
- Create the user and group
- Generate and copy the keyfile to all hosts in the network
- Repeat the process on each machine in the network
- Log in and test connectivity from each machine
Create the Hadoop user and group
Simple Linux admin stuff here. Log in to data-master as root (or log in as the pi user and su to root). Create the group (I have kept it simple for my cluster: “hadoop”) and then the user (“hduser”), adding it into the hadoop group. Finally, give sudo privileges to hduser:
sudo su - root
addgroup hadoop
adduser --ingroup hadoop hduser
adduser hduser sudo
This creates the user, group and home directory (/home/hduser) on the master node. Repeat on each of the slave nodes (data-slave01, data-slave02 and data-slave03).
Generate and copy SSH keys
Log in to the data-master node as hduser. The SSH keys then need to be created and copied to each slave node. This is very straightforward using ssh-keygen then ssh-copy-id. With the keygen utility, I have not used a passphrase and have included a -C parameter to indicate where the key originated – not strictly necessary but useful for documentation and scripting purposes, to keep track of which key has been copied where.
cd /home/hduser
ssh-keygen -t rsa -b 4096 -C hduser@data-master
ssh-copy-id hduser@data-master
ssh-copy-id hduser@data-slave01
ssh-copy-id hduser@data-slave02
ssh-copy-id hduser@data-slave03
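For the curious, what ssh-copy-id is doing under the hood is appending your public key to ~/.ssh/authorized_keys on the target machine and setting sensible permissions. The sketch below demonstrates this against a local temporary directory (standing in for /home/hduser/.ssh on a slave node) so it can be run anywhere; on the cluster the append happens over ssh. The key text is a placeholder, not a real key.

```shell
# Sketch of ssh-copy-id's effect, using a placeholder key and a
# temporary directory in place of the target node's ~/.ssh.
PUBKEY="ssh-rsa AAAAB3NzaC1yc2EexampleOnly hduser@data-master"  # placeholder
TARGET_DIR="$(mktemp -d)/.ssh"        # stands in for ~hduser/.ssh on the slave
mkdir -p "$TARGET_DIR"
chmod 700 "$TARGET_DIR"               # sshd rejects world-readable key dirs
printf '%s\n' "$PUBKEY" >> "$TARGET_DIR/authorized_keys"
chmod 600 "$TARGET_DIR/authorized_keys"
```

Because the -C comment travels with the public key, the comment in each authorized_keys entry tells you which machine that key came from.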
It is important to test connectivity from each machine to every other machine so – once the above commands have been completed successfully – SSH to each machine in the network as
hduser and ensure it connects correctly without requesting a password or passphrase.
ssh data-master
ssh data-slave01
ssh data-slave02
ssh data-slave03
Note: Don’t get lost! It is easy to attempt to ssh from the wrong machine if you are not methodical about ssh’ing in and then logging out (exit’ing). If it is easier, leave the tests until keys have been generated at – and copied from – each of the machines in the network.
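The testing can also be scripted from each node rather than typed by hand. A sketch of such a loop is below (a hypothetical helper; the host names are from this project's network). The BatchMode=yes option makes ssh fail instead of prompting, so a key that was not copied correctly shows up as FAIL rather than an interactive password prompt.

```shell
# Test passwordless SSH from this node to every node in the cluster.
# BatchMode=yes disables password prompts; ConnectTimeout bounds hangs.
HOSTS="data-master data-slave01 data-slave02 data-slave03"
RESULT=""
for host in $HOSTS; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "hduser@$host" true 2>/dev/null; then
        RESULT="$RESULT OK:$host"
    else
        RESULT="$RESULT FAIL:$host"
    fi
done
echo "$RESULT"
```

Run it on every node in turn; a fully keyed cluster shows OK for all four hosts from all four machines.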
Rinse and Repeat (and test)
Complete the above for each node in the network, logging on as
hduser and then executing similar commands (changing the -C comment parameter on each machine). So for
hduser logged in on
data-slave01, execute the following:
cd /home/hduser
ssh-keygen -t rsa -b 4096 -C hduser@data-slave01
ssh-copy-id hduser@data-master
ssh-copy-id hduser@data-slave01
ssh-copy-id hduser@data-slave02
ssh-copy-id hduser@data-slave03
</gr-replace>
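The whole rinse-and-repeat step can be reviewed as a dry run before touching the cluster. The sketch below simply prints the commands to execute on each node, which makes the per-node -C comment and the full key-distribution matrix (every node copies to every node, itself included) easy to check; the host names are from this project's network.

```shell
# Dry run: print the keygen and key-copy commands for every node.
# Nothing is executed against the cluster; this only builds a plan.
HOSTS="data-master data-slave01 data-slave02 data-slave03"
PLAN="$(mktemp)"
for src in $HOSTS; do
    {
        echo "# on $src, logged in as hduser:"
        echo "ssh-keygen -t rsa -b 4096 -C hduser@$src"
        for dst in $HOSTS; do
            echo "ssh-copy-id hduser@$dst"
        done
    } >> "$PLAN"
done
cat "$PLAN"
```

With four nodes this produces four ssh-keygen invocations and sixteen ssh-copy-id invocations – one for every (source, destination) pair.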
Please remember that your Hadoop cluster will fail if any part of this process is not completed properly so be sure to test that you can SSH from every machine to every other machine – connecting successfully without being prompted for a password or passphrase. You are then good to go – if you are following the Hadoop Pi project article then you can now install Hadoop.