MapReduce: Join
Description
A MapReduce join combines two large datasets on a shared key. This example joins department names with department employee strength by department ID.
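The idea of joining on a shared key can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the concept, not the code from the linked repository: the class name `ReduceSideJoinSketch`, the method `join`, and the sample records are hypothetical, and the tab-separated `key<TAB>value` layout is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** Inner-joins tab-separated "key<TAB>value" records from two sources on the key. */
    public static List<String> join(List<String> strengthRecords, List<String> nameRecords) {
        // "Map" and "shuffle": group the values of each source under their key.
        // A TreeMap keeps the keys sorted, similar to sorted reducer input.
        Map<String, List<String>> strengths = new TreeMap<>();
        Map<String, String> names = new TreeMap<>();
        for (String record : strengthRecords) {
            String[] fields = record.split("\t", 2);
            if (fields.length < 2) continue; // skip malformed lines
            strengths.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[1]);
        }
        for (String record : nameRecords) {
            String[] fields = record.split("\t", 2);
            if (fields.length < 2) continue; // skip malformed lines
            names.put(fields[0], fields[1]);
        }

        // "Reduce": for each key present in both sources, emit one joined line.
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : strengths.entrySet()) {
            String name = names.get(entry.getKey());
            if (name == null) continue; // inner join: no matching name record
            for (String strength : entry.getValue()) {
                joined.add(entry.getKey() + "\t" + strength + "\t" + name);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not the contents of the tutorial's input files.
        List<String> strengths = List.of("A11\t50", "B12\t100", "C13\t250");
        List<String> names = List.of("A11\tFinance", "B12\tHR", "C13\tManufacturing");
        join(strengths, names).forEach(System.out::println);
    }
}
```

In a real MapReduce job the grouping step is done by the framework's shuffle across many machines, so neither dataset has to fit in memory on one node; the in-memory maps here only stand in for that step.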
Prerequisites
- Linux
- JDK 11
- Maven
- Hadoop 3.4.1
HDFS
Create files in HDFS
su - hadoop
start-dfs.sh
start-yarn.sh
Create local files:
/home/hadoop/examples/MapReduceJoin/input/DeptName.txt
/home/hadoop/examples/MapReduceJoin/input/DeptStrength.txt
For the contents of DeptName.txt and DeptStrength.txt, see the source code linked in the Java Program section.
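Create the target directory in HDFS if it does not exist yet, then copy the above files to HDFS:
hdfs dfs -mkdir -p /examples/MapReduceJoin/input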
hdfs dfs -copyFromLocal /home/hadoop/examples/MapReduceJoin/input/DeptName.txt /examples/MapReduceJoin/input/
hdfs dfs -copyFromLocal /home/hadoop/examples/MapReduceJoin/input/DeptStrength.txt /examples/MapReduceJoin/input/
Check if both files have been copied:
hdfs dfs -ls /examples/MapReduceJoin/input
Output:
-rw-r--r-- 1 hadoop supergroup 59 2025-03-15 10:30 /examples/MapReduceJoin/input/DeptName.txt
-rw-r--r-- 1 hadoop supergroup 50 2025-03-15 10:31 /examples/MapReduceJoin/input/DeptStrength.txt
Java Program
Source Code: https://github.com/ZbCiok/zjc-examples/tree/main/streams/hadoop/MapReduceJoin
Program Structure
.
├── input
│   ├── DeptName.txt
│   └── DeptStrength.txt
├── pom.xml
└── src
    ├── main
    │   ├── java
    │   │   └── MapReduceJoin
    │   │       ├── DeptEmpStrengthMapper.java
    │   │       ├── DeptNameMapper.java
    │   │       ├── JoinDriver.java
    │   │       ├── JoinReducer.java
    │   │       └── TextPair.java
    │   └── resources
    └── test
        └── java
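The class names in the tree (one mapper per input file, a pair-style key class, and a single reducer) are consistent with a common reduce-side join pattern in which each record is keyed by the join key plus a tag naming its source, so that sorting delivers the name record before the strength records of the same department. The following plain-Java sketch illustrates that pattern under this assumption. It is not the repository's `TextPair` or `JoinReducer`; the names `CompositeKeySketch`, `TaggedKey`, and `sortAndJoin`, and the tag convention (0 for names, 1 for strengths), are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CompositeKeySketch {

    /** Hypothetical composite key: the join key plus a tag naming the source dataset. */
    public static final class TaggedKey implements Comparable<TaggedKey> {
        final String deptId;
        final int tag;      // assumed convention: 0 = name record, 1 = strength record
        final String value;

        public TaggedKey(String deptId, int tag, String value) {
            this.deptId = deptId;
            this.tag = tag;
            this.value = value;
        }

        // Order by department ID first, then by tag, so the name record
        // (tag 0) sorts ahead of the strength records (tag 1) of the same ID.
        @Override
        public int compareTo(TaggedKey other) {
            int byId = deptId.compareTo(other.deptId);
            return byId != 0 ? byId : Integer.compare(tag, other.tag);
        }
    }

    /** Sorts the tagged records, then joins them in a single pass over the sorted list. */
    public static List<String> sortAndJoin(List<TaggedKey> records) {
        List<TaggedKey> sorted = new ArrayList<>(records);
        Collections.sort(sorted);

        List<String> joined = new ArrayList<>();
        String currentId = null;
        String currentName = null;
        for (TaggedKey r : sorted) {
            if (r.tag == 0) {
                // Remember the name; the matching strength records follow it.
                currentId = r.deptId;
                currentName = r.value;
            } else if (r.deptId.equals(currentId)) {
                joined.add(r.deptId + "\t" + r.value + "\t" + currentName);
            }
            // Strength records without a preceding name record are skipped.
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical records in arbitrary order, as mappers might emit them.
        List<TaggedKey> records = List.of(
                new TaggedKey("B12", 1, "100"),
                new TaggedKey("A11", 0, "Finance"),
                new TaggedKey("B12", 0, "HR"),
                new TaggedKey("A11", 1, "50"));
        sortAndJoin(records).forEach(System.out::println);
    }
}
```

Because the name record always arrives first, the joining pass only has to hold one name at a time rather than buffering every value for a key, which is the usual motivation for a tagged composite key.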
Run
Build the jar with Maven, then submit the job with the two input files and the output directory as arguments. Maven writes the jar to target/ by default, so adjust the jar path below if you have not copied it. The output directory must not exist before the job runs.
mvn clean package
hadoop jar /home/hadoop/examples/MapReduceJoin/MapReduceJoin-1.0-SNAPSHOT.jar /examples/MapReduceJoin/input/DeptStrength.txt /examples/MapReduceJoin/input/DeptName.txt /examples/MapReduceJoin/output/
Output:
...
...
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=85
Verify
hdfs dfs -ls /examples/MapReduceJoin/output
Output:
-rw-r--r-- 1 hadoop supergroup 0 2025-03-15 10:59 /examples/MapReduceJoin/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 85 2025-03-15 10:59 /examples/MapReduceJoin/output/part-00000
Content
hdfs dfs -cat /examples/MapReduceJoin/output/part-00000
Output:
A11 50 Finance
B12 100 HR
C13 250 Manufacturing
References