How to Install Hadoop on Ubuntu: Setting up a Single Node Cluster
Reference: https://hadoop.apache.org/
Hadoop in Secure Mode
All production Hadoop clusters use Kerberos to authenticate callers, secure access to HDFS data, and restrict access to computation services (YARN etc.).
Supported Platforms
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Required Software
Required software for Linux includes:
- Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
- ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons (if the optional start and stop scripts are to be used). Additionally, it is recommended that pdsh also be installed for better ssh resource management.
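pdsh is optional; on Ubuntu it can be installed with apt (package name assumed for recent Ubuntu releases):
sudo apt install pdsh -y
If the Hadoop start scripts later fail because pdsh defaults to rsh, exporting PDSH_RCMD_TYPE=ssh in the environment is a common workaround.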
Download and Install
At the moment, Apache Hadoop 3.x fully supports Java 8 and 11.
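If a suitable JDK is not yet installed, OpenJDK 11 can be added from the Ubuntu repositories (package name assumed for Ubuntu 22.04):
sudo apt install openjdk-11-jdk -y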
In our case:
sudo update-java-alternatives --list
java-1.11.0-openjdk-amd64 1111 /usr/lib/jvm/java-1.11.0-openjdk-amd64
java-1.17.0-openjdk-amd64 1711 /usr/lib/jvm/java-1.17.0-openjdk-amd64
java-1.21.0-openjdk-amd64 2111 /usr/lib/jvm/java-1.21.0-openjdk-amd64
java-1.8.0-openjdk-amd64 1081 /usr/lib/jvm/java-1.8.0-openjdk-amd64
We set java-1.11.0-openjdk-amd64:
sudo update-java-alternatives --set java-1.11.0-openjdk-amd64
Verify:
java -version
openjdk version "11.0.26" 2025-01-21
OpenJDK Runtime Environment (build 11.0.26+4-post-Ubuntu-1ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.26+4-post-Ubuntu-1ubuntu122.04, mixed mode, sharing)
Set Up Hadoop User and Configure SSH
Create Hadoop User
sudo adduser hadoop
Install OpenSSH
sudo apt install openssh-server openssh-client -y
Switch to the newly created user
su - hadoop
Enable Passwordless SSH for Hadoop User
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
The new user can now SSH without entering a password every time. Verify everything is set up correctly by using the hadoop user to SSH to localhost:
ssh localhost
Download and Install Hadoop on Ubuntu
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
tar xzf hadoop-3.4.1.tar.gz
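Optionally, verify the download against its SHA-512 checksum before extracting; the checksum file is assumed to sit next to the archive on the mirror, which is the usual Apache layout:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz.sha512
sha512sum hadoop-3.4.1.tar.gz
Compare the printed hash with the contents of hadoop-3.4.1.tar.gz.sha512; they must match.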
Configure Single Node Hadoop
.bashrc
Go to /home/hadoop
nano .bashrc
and add:
export HADOOP_HOME=/home/hadoop/hadoop-3.4.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Run the command below (from /home/hadoop/) to apply the changes to the current running environment:
source ./.bashrc
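If the variables are set correctly, the hadoop binary is now on the PATH and reports its version:
hadoop version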
hadoop-env.sh
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
add (in our case):
export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
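If you are unsure of the correct path on your system, it can be derived from the java binary:
readlink -f /usr/bin/java | sed 's|/bin/java||'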
core-site.xml
nano $HADOOP_HOME/etc/hadoop/core-site.xml
add
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmpdata</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
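The directory used for hadoop.tmp.dir should exist and be writable by the hadoop user, so it is safest to create it up front:
mkdir -p /home/hadoop/tmpdata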
hdfs-site.xml
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
add
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
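Create the NameNode and DataNode directories referenced above as the hadoop user:
mkdir -p /home/hadoop/dfsdata/namenode /home/hadoop/dfsdata/datanode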
mapred-site.xml
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
add
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
add
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
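Before moving on, the edited configuration files can be checked for well-formed XML with the conftest subcommand shipped with recent Hadoop releases:
hadoop conftest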
Format HDFS NameNode
hdfs namenode -format
Start Hadoop Cluster
start-dfs.sh
Output:
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes
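A quick way to confirm that the HDFS daemons are actually running is jps, which ships with the JDK and lists the Java processes for the current user; the output should include NameNode, DataNode and SecondaryNameNode:
jps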