
How to Install Hadoop on Ubuntu

References:

  • Apache Hadoop: https://hadoop.apache.org/
  • Hadoop: Setting up a Single Node Cluster
  • Hadoop in Secure Mode

important

All production Hadoop clusters use Kerberos to authenticate callers and to secure access to HDFS data, as well as to restrict access to computation services (YARN, etc.).

Supported Platforms

GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.

Required Software

Required software for Linux includes:

  • Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
  • ssh must be installed and sshd must be running if the optional start and stop scripts that manage remote Hadoop daemons are to be used. Additionally, it is recommended that pdsh also be installed for better ssh resource management.

Download and Install

At the moment, Apache Hadoop 3.x fully supports Java 8 and 11.
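
If a suitable JDK is not installed yet, OpenJDK 11 can be installed from the standard Ubuntu repositories (a minimal sketch, assuming apt and the default package archives):

# Install OpenJDK 11 (the headless package is sufficient for Hadoop daemons)
sudo apt update
sudo apt install -y openjdk-11-jdk-headless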

In our case:

sudo update-java-alternatives --list
java-1.11.0-openjdk-amd64      1111       /usr/lib/jvm/java-1.11.0-openjdk-amd64
java-1.17.0-openjdk-amd64      1711       /usr/lib/jvm/java-1.17.0-openjdk-amd64
java-1.21.0-openjdk-amd64      2111       /usr/lib/jvm/java-1.21.0-openjdk-amd64
java-1.8.0-openjdk-amd64       1081       /usr/lib/jvm/java-1.8.0-openjdk-amd64

We set java-1.11.0-openjdk-amd64:

sudo update-java-alternatives --set java-1.11.0-openjdk-amd64

Verify:

java -version
openjdk version "11.0.26" 2025-01-21
OpenJDK Runtime Environment (build 11.0.26+4-post-Ubuntu-1ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.26+4-post-Ubuntu-1ubuntu122.04, mixed mode, sharing)
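
The JDK home directory (needed below for JAVA_HOME in hadoop-env.sh) can be derived from the active java binary; this is just a convenience sketch:

# Resolve the active java binary and strip the trailing /bin/java to get the JDK home.
# On Ubuntu this may print the resolved target of the java-1.11.0-openjdk-amd64 symlink.
readlink -f "$(which java)" | sed 's:/bin/java::'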

Set Up Hadoop User and Configure SSH

Create Hadoop User

sudo adduser hadoop

Install OpenSSH

sudo apt install openssh-server openssh-client -y

Switch to the newly created user

su - hadoop

Enable Passwordless SSH for Hadoop User

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

The new user can now SSH without entering a password every time. Verify everything is set up correctly by using the hadoop user to SSH to localhost:

ssh localhost
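
A quick non-interactive check (just a convenience, not part of the original steps) confirms that no password prompt remains:

# BatchMode disables password prompts, so this fails immediately if key-based auth is broken
ssh -o BatchMode=yes localhost exit && echo "passwordless SSH OK"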

Download and Install Hadoop on Ubuntu

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
tar xzf hadoop-3.4.1.tar.gz
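
Optionally, verify the archive against the checksum published alongside the release on the same mirror path (compare the two hashes manually):

# Fetch the published SHA-512 checksum and compute the local one
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz.sha512
cat hadoop-3.4.1.tar.gz.sha512
sha512sum hadoop-3.4.1.tar.gz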

Configure Single Node Hadoop

.bashrc

Go to /home/hadoop

nano .bashrc

and add:

export HADOOP_HOME=/home/hadoop/hadoop-3.4.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Run the command below (from /home/hadoop/) to apply the changes to the current shell session:

source ./.bashrc
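
A quick sanity check that the PATH changes took effect:

# The first line of the output should read: Hadoop 3.4.1
hadoop version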

hadoop-env.sh

nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

and add the JAVA_HOME of your JDK (in our case):

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
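
If you prefer not to open an editor, the same line can be appended from the shell (a convenience sketch; adjust the path to your JDK):

# Append JAVA_HOME to hadoop-env.sh non-interactively
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64' >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"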

core-site.xml

nano $HADOOP_HOME/etc/hadoop/core-site.xml

and add:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmpdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
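
hadoop.tmp.dir points at a directory under the hadoop user's home. Hadoop normally creates it on demand, but creating it up front makes permission problems obvious early (an optional step):

mkdir -p /home/hadoop/tmpdata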

hdfs-site.xml

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

and add:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfsdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfsdata/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
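
Hadoop will usually create the NameNode and DataNode directories itself, but creating them explicitly (as the hadoop user) avoids permission surprises later:

mkdir -p /home/hadoop/dfsdata/namenode /home/hadoop/dfsdata/datanode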

mapred-site.xml

nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

and add:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
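
Depending on the environment, MapReduce jobs submitted to YARN can fail with classpath errors. The upstream single-node guide additionally sets mapreduce.application.classpath in mapred-site.xml; a sketch along those lines (verify against your Hadoop version before relying on it):

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>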

yarn-site.xml

nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

and add:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>yarn.acl.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
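
Before starting anything, the configuration XML files can be syntax-checked with Hadoop's built-in validator (a quick optional step):

# Parses the *.xml files in the configuration directory and reports malformed ones
hadoop conftest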

Format HDFS NameNode

hdfs namenode -format

Start Hadoop Cluster

start-dfs.sh

Output:

Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes

Start YARN

start-yarn.sh

Output:

Starting resourcemanager
Starting nodemanagers

Run the following command to check if all the daemons are active and running as Java processes:

jps

Sample output (with HDFS running, NameNode, DataNode, and SecondaryNameNode should also appear in this list):

15584 Jps
14916 ResourceManager
15252 NodeManager

Access Hadoop from Browser

  • NameNode web interface: http://localhost:9870
  • DataNode web interface: http://localhost:9864
  • YARN ResourceManager web interface: http://localhost:8088
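
On a headless server, the same endpoints can be checked from the shell (a small sketch using curl):

# Any HTTP status line (200 or a redirect) means the web UI is listening
curl -sI http://localhost:9870/ | head -n 1
curl -sI http://localhost:8088/ | head -n 1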

Start / Stop Commands

Start:

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

Stop:

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh