MapReduce: Join

Based on...

Description

The MapReduce Join operation combines two large datasets on a shared key. In this example, DeptName.txt and DeptStrength.txt are joined on the department ID.
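To make the idea concrete, here is a minimal plain-Java sketch of a reduce-side join with no Hadoop dependency. It is not the code from the linked repository. The class name `ReduceSideJoinSketch`, the `id<TAB>value` input layout, the one-record-per-key-per-file restriction, and the inner-join behaviour are all assumptions made for illustration. The sample records are inferred from the job output shown in the Content section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Joins two "id<TAB>value" datasets on id, imitating map -> shuffle -> reduce.
    // Assumption: each id appears at most once per dataset.
    static List<String> join(List<String> deptNames, List<String> deptStrengths) {
        // "Shuffle": values grouped by key; TreeMap keeps keys sorted,
        // as MapReduce does before the reduce phase.
        Map<String, String[]> grouped = new TreeMap<>();

        // "Map" phase: tag each record by its source.
        // Slot 0 holds the department name, slot 1 the strength.
        for (String line : deptNames) {
            String[] f = line.split("\t");
            grouped.computeIfAbsent(f[0], k -> new String[2])[0] = f[1];
        }
        for (String line : deptStrengths) {
            String[] f = line.split("\t");
            grouped.computeIfAbsent(f[0], k -> new String[2])[1] = f[1];
        }

        // "Reduce" phase: emit only ids present in both datasets (inner join),
        // formatted like the job output: id, tab, strength, two tabs, name.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String[]> e : grouped.entrySet()) {
            String[] v = e.getValue();
            if (v[0] != null && v[1] != null) {
                out.add(e.getKey() + "\t" + v[1] + "\t\t" + v[0]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("A11\tFinance", "B12\tHR", "C13\tManufacturing");
        List<String> strengths = List.of("A11\t50", "B12\t100", "C13\t250");
        join(names, strengths).forEach(System.out::println);
    }
}
```

In a real job the grouping is done by the framework across many machines; the in-memory map here only stands in for that step.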

Prerequisites

  • Linux
  • JDK 11
  • Maven
  • Hadoop 3.4.1

HDFS

Create files in HDFS

su - hadoop
start-dfs.sh
start-yarn.sh

Create local files:

/home/hadoop/examples/MapReduceJoin/input/DeptName.txt
/home/hadoop/examples/MapReduceJoin/input/DeptStrength.txt

For the contents of DeptName.txt and DeptStrength.txt, see the source code linked in the Java Program section.

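Copy the above files to HDFS. The target directory must already exist; create it first if needed:

hdfs dfs -mkdir -p /examples/MapReduceJoin/input
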
hdfs dfs -copyFromLocal /home/hadoop/examples/MapReduceJoin/input/DeptName.txt /examples/MapReduceJoin/input/
hdfs dfs -copyFromLocal /home/hadoop/examples/MapReduceJoin/input/DeptStrength.txt /examples/MapReduceJoin/input/

Check if both files have been copied:

hdfs dfs -ls /examples/MapReduceJoin/input

Output:

-rw-r--r--   1 hadoop supergroup         59 2025-03-15 10:30 /examples/MapReduceJoin/input/DeptName.txt
-rw-r--r--   1 hadoop supergroup         50 2025-03-15 10:31 /examples/MapReduceJoin/input/DeptStrength.txt

Java Program

Source Code: https://github.com/ZbCiok/zjc-examples/tree/main/streams/hadoop/MapReduceJoin

Program Structure

.
├── input
│   ├── DeptName.txt
│   └── DeptStrength.txt
├── pom.xml
└── src
    ├── main
    │   ├── java
    │   │   └── MapReduceJoin
    │   │       ├── DeptEmpStrengthMapper.java
    │   │       ├── DeptNameMapper.java
    │   │       ├── JoinDriver.java
    │   │       ├── JoinReducer.java
    │   │       └── TextPair.java
    │   └── resources
    └── test
        └── java
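
The file names suggest a common reduce-side join layout: one mapper per input file, a composite key (TextPair), a reducer, and a driver. Whether the linked source works exactly this way is not confirmed here. The plain-Java sketch below only illustrates the composite-key idea: records are sorted by join id and then by a source tag, while partitioning uses the id alone so that both records for a department reach the same reducer. `TaggedKey`, its fields, and the tag convention (0 for names, 1 for strengths) are hypothetical stand-ins, and the sketch has no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TaggedKeySketch {

    // Hypothetical stand-in for a composite key: join id plus a source tag.
    static final class TaggedKey {
        final String id;    // join key, e.g. a department id
        final int tag;      // assumed convention: 0 = name record, 1 = strength record
        final String value; // payload carried with the key

        TaggedKey(String id, int tag, String value) {
            this.id = id;
            this.tag = tag;
            this.value = value;
        }
    }

    // Sort order: by id first, then by tag, so each id's name record
    // arrives before its strength record.
    static final Comparator<TaggedKey> SORT_ORDER =
            Comparator.comparing((TaggedKey k) -> k.id).thenComparingInt(k -> k.tag);

    // Partition on the id only (the tag is ignored), using the same
    // formula as a plain hash partitioner.
    static int partition(TaggedKey k, int numReducers) {
        return (k.id.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns the records in the order a reducer would see them.
    static List<String> sortedView(List<TaggedKey> records) {
        List<TaggedKey> copy = new ArrayList<>(records);
        copy.sort(SORT_ORDER);
        List<String> out = new ArrayList<>();
        for (TaggedKey k : copy) {
            out.add(k.id + ":" + k.tag + ":" + k.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<TaggedKey> records = List.of(
                new TaggedKey("B12", 1, "100"),
                new TaggedKey("A11", 1, "50"),
                new TaggedKey("B12", 0, "HR"),
                new TaggedKey("A11", 0, "Finance"));
        sortedView(records).forEach(System.out::println);
    }
}
```

Because the tag takes part in sorting but not in partitioning, a reducer can rely on the order of the two records for each id instead of buffering and inspecting them.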

Run

mvn clean package

hadoop jar /home/hadoop/examples/MapReduceJoin/MapReduceJoin-1.0-SNAPSHOT.jar /examples/MapReduceJoin/input/DeptStrength.txt /examples/MapReduceJoin/input/DeptName.txt /examples/MapReduceJoin/output/

Output:

...
...

    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=85

Verify

hdfs dfs -ls /examples/MapReduceJoin/output

Output:

-rw-r--r--   1 hadoop supergroup          0 2025-03-15 10:59 /examples/MapReduceJoin/output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         85 2025-03-15 10:59 /examples/MapReduceJoin/output/part-00000

Content

hdfs dfs -cat /examples/MapReduceJoin/output/part-00000

Output:

A11	50		Finance
B12	100		HR
C13	250		Manufacturing
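
If the joined lines need to be consumed by another program, they can be split back into fields. The sketch below assumes each line holds a department id, a strength, and a department name separated by whitespace, as in the output above. `JoinOutputParser` and `parseLine` are hypothetical names introduced for illustration.

```java
import java.util.List;

public class JoinOutputParser {

    // Splits one joined line into [deptId, strength, deptName].
    // Any run of whitespace counts as one separator, so a doubled tab is harmless;
    // the limit of 3 keeps a multi-word department name in one piece.
    static String[] parseLine(String line) {
        String[] fields = line.trim().split("\\s+", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "A11\t50\t\tFinance",
                "B12\t100\t\tHR",
                "C13\t250\t\tManufacturing");
        int total = 0;
        for (String line : lines) {
            String[] f = parseLine(line);
            total += Integer.parseInt(f[1]);
            System.out.println(f[0] + " -> " + f[2] + " (" + f[1] + ")");
        }
        System.out.println("Total strength: " + total);
    }
}
```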

References