How to Install Hadoop on Ubuntu: Setting up a Single Node Cluster
Reference: https://hadoop.apache.org/
Hadoop in Secure Mode
All production Hadoop clusters use Kerberos to authenticate callers, secure access to HDFS data, and restrict access to computation services (YARN etc.).
Supported Platforms
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Required Software
Required software for Linux includes:
- Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons (if the optional start and stop scripts are to be used). Additionally, it is recommended that pdsh also be installed for better ssh resource management.
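pdsh is optional; on Ubuntu it can be installed with apt (package name assumed for recent Ubuntu releases):
sudo apt install pdsh -y
If the Hadoop start scripts later fail because pdsh defaults to rsh, exporting PDSH_RCMD_TYPE=ssh in the environment is a common workaround.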
Download and Install
At the moment, Apache Hadoop 3.x fully supports Java 8 and 11.
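If a suitable JDK is not yet installed, OpenJDK 11 can be added from the Ubuntu repositories (package name assumed for Ubuntu 22.04):
sudo apt install openjdk-11-jdk -y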
In our case:
sudo update-java-alternatives --list
java-1.11.0-openjdk-amd64 1111 /usr/lib/jvm/java-1.11.0-openjdk-amd64
java-1.17.0-openjdk-amd64 1711 /usr/lib/jvm/java-1.17.0-openjdk-amd64
java-1.21.0-openjdk-amd64 2111 /usr/lib/jvm/java-1.21.0-openjdk-amd64
java-1.8.0-openjdk-amd64 1081 /usr/lib/jvm/java-1.8.0-openjdk-amd64
We set java-1.11.0-openjdk-amd64:
sudo update-java-alternatives --set java-1.11.0-openjdk-amd64
Verify:
java -version
openjdk version "11.0.26" 2025-01-21
OpenJDK Runtime Environment (build 11.0.26+4-post-Ubuntu-1ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.26+4-post-Ubuntu-1ubuntu122.04, mixed mode, sharing)
Set Up Hadoop User and Configure SSH
Create Hadoop User
sudo adduser hadoop
Install OpenSSH
sudo apt install openssh-server openssh-client -y
Switch to the newly created user
su - hadoop
Enable Passwordless SSH for Hadoop User
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
The new user can now SSH without entering a password every time. Verify everything is set up correctly by using the hadoop user to SSH to localhost:
ssh localhost
Download and Install Hadoop on Ubuntu
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
tar xzf hadoop-3.4.1.tar.gz
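Optionally, verify the download against its SHA-512 checksum before extracting; the checksum file is assumed to sit next to the archive on the mirror, which is the usual Apache layout:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz.sha512
sha512sum hadoop-3.4.1.tar.gz
Compare the printed hash with the contents of hadoop-3.4.1.tar.gz.sha512; they must match.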
Configure Single Node Hadoop
.bashrc
Go to /home/hadoop
nano .bashrc
and add:
export HADOOP_HOME=/home/hadoop/hadoop-3.4.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Run the command below (from /home/hadoop/) to apply the changes to the current running environment:
source ./.bashrc
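If the variables are set correctly, the hadoop binary is now on the PATH and reports its version:
hadoop version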
hadoop-env.sh
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
add (in our case):
export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
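If you are unsure of the correct path on your system, it can be derived from the java binary:
readlink -f /usr/bin/java | sed 's|/bin/java||'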
core-site.xml
nano $HADOOP_HOME/etc/hadoop/core-site.xml
add
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmpdata</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
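The directory used for hadoop.tmp.dir should exist and be writable by the hadoop user, so it is safest to create it up front:
mkdir -p /home/hadoop/tmpdata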
hdfs-site.xml
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
add
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
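Create the NameNode and DataNode directories referenced above as the hadoop user:
mkdir -p /home/hadoop/dfsdata/namenode /home/hadoop/dfsdata/datanode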
mapred-site.xml
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
add
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
add
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
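Before moving on, the edited configuration files can be checked for well-formed XML with the conftest subcommand shipped with recent Hadoop releases:
hadoop conftest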
Format HDFS NameNode
hdfs namenode -format
Start Hadoop Cluster
start-dfs.sh
Output:
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes
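A quick way to confirm that the HDFS daemons are actually running is jps, which ships with the JDK and lists the Java processes for the current user; the output should include NameNode, DataNode and SecondaryNameNode:
jps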