HBase: Importing Data from HDFS
Description
Local CSV file --> HDFS --> HBase
- Input Data Preparation
- Create HBase Table
- Load data into HBase
- Read data from HBase
Prerequisites
- HDFS: hadoop-3.4.1
- HBase: hbase-2.5.11-hadoop3
Developing
Input Data Preparation
Create a local directory:
mkdir -p /home/hadoop/examples/HBaseImportingDatafromHDFS
Download the file customers-100.csv from:
https://drive.google.com/uc?id=1zO8ekHWx9U7mrbx_0Hoxxu6od7uxJqWw&export=download
and save it to:
/home/hadoop/examples/HBaseImportingDatafromHDFS
CSV header (first line of the file):
Index,Customer Id,First Name,Last Name,Company,City,Country,Phone 1,Phone 2,Email,Subscription Date,Website
Load to HDFS (-p also creates the parent /examples directory if it does not exist yet):
hdfs dfs -mkdir -p /examples/HBaseImportingDatafromHDFS
hdfs dfs -copyFromLocal /home/hadoop/examples/HBaseImportingDatafromHDFS/customers-100.csv /examples/HBaseImportingDatafromHDFS/customers-100.csv
Check:
hdfs dfs -ls /examples/HBaseImportingDatafromHDFS
-rw-r--r-- 1 hadoop supergroup 17261 2025-03-19 14:01 /examples/HBaseImportingDatafromHDFS/customers-100.csv
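A quick way to confirm the upload is complete is to compare the size in the HDFS listing with the size of the local file. The sketch below only shows how the size column could be pulled out of the listing; the function name hdfs_size_from_ls is hypothetical, and the listing line is pasted in as literal input so the snippet runs without a cluster.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# size column (5th field) of every regular-file entry. Lines such as
# "Found 1 items" and directory entries (starting with "d") are skipped.
hdfs_size_from_ls() {
  awk '$1 ~ /^-/ { print $5 }'
}

# Literal listing line from the check above. On a real node one might run:
#   hdfs dfs -ls /examples/HBaseImportingDatafromHDFS | hdfs_size_from_ls
#   wc -c < /home/hadoop/examples/HBaseImportingDatafromHDFS/customers-100.csv
line='-rw-r--r-- 1 hadoop supergroup 17261 2025-03-19 14:01 /examples/HBaseImportingDatafromHDFS/customers-100.csv'
printf '%s\n' "$line" | hdfs_size_from_ls   # prints 17261
```

If the two numbers differ, the copy was probably interrupted and should be repeated.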
Create HBase Table
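In the HBase shell (started with hbase shell), create a table with one column family per CSV column:
create 'customers-100', 'Customer_Id', 'First_Name', 'Last_Name', 'Company', 'City', 'Country', 'Phone_1', 'Phone_2', 'Email', 'Subscription_Date', 'Website'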
Check:
describe 'customers-100'
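The column family names are the CSV header fields with spaces replaced by underscores, and the first field (Index) is left out because it serves as the row key. As a rough sketch, the same list could be derived from the header line instead of being typed by hand; the function name header_to_families is hypothetical.

```shell
# Hypothetical helper: turn a CSV header line into a comma-separated list
# of names without spaces. The first field is dropped (it becomes the row
# key) and every space is replaced with an underscore.
header_to_families() {
  printf '%s\n' "$1" | cut -d',' -f2- | tr ' ' '_'
}

header='Index,Customer Id,First Name,Last Name,Company,City,Country,Phone 1,Phone 2,Email,Subscription Date,Website'
families=$(header_to_families "$header")

# Value in the shape expected by -Dimporttsv.columns
echo "HBASE_ROW_KEY,$families"
```

The printed value is HBASE_ROW_KEY followed by the eleven family names used in the create statement.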
Load data into HBase
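Run ImportTsv with a comma separator. The first CSV field (Index) is used as the row key, and each remaining field is written to its column family with an empty qualifier:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,Customer_Id,First_Name,Last_Name,Company,City,Country,Phone_1,Phone_2,Email,Subscription_Date,Website customers-100 /examples/HBaseImportingDatafromHDFS/customers-100.csv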
Output (note Bad Lines=34: of the 101 input lines, 34 were rejected and only 67 were written):
...
...
Total vcore-milliseconds taken by all map tasks=1656
Total megabyte-milliseconds taken by all map tasks=1695744
Map-Reduce Framework
Map input records=101
Map output records=67
Input split bytes=140
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=11
CPU time spent (ms)=1320
Physical memory (bytes) snapshot=371847168
Virtual memory (bytes) snapshot=2827165696
Total committed heap usage (bytes)=524288000
Peak Map Physical memory (bytes)=371847168
Peak Map Virtual memory (bytes)=2827165696
ImportTsv
Bad Lines=34
File Input Format Counters
Bytes Read=17261
File Output Format Counters
Bytes Written=0
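The 34 bad lines are most likely rows in which a quoted field contains a comma (for example a company name), because ImportTsv splits on every separator and a row then no longer has 12 fields. The header line has exactly 12 fields, so it was probably imported as well, with row key Index. As a sketch, the number of affected lines could be estimated locally before loading; the function name count_bad_lines and the three-line sample are made up for illustration.

```shell
# Hypothetical helper: count the lines of a file whose naive comma split
# does not yield the expected number of fields.
#   $1 = file path, $2 = expected field count
count_bad_lines() {
  awk -F',' -v n="$2" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}

# Made-up sample: the last row has a comma inside a quoted field, so a
# naive split sees 4 fields instead of 3.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Index,Name,Company
1,Ann,Acme
2,Bob,"Smith, Jones and Co"
EOF
count_bad_lines "$sample" 3   # prints 1
rm -f "$sample"
```

For the file used here the call would be count_bad_lines /home/hadoop/examples/HBaseImportingDatafromHDFS/customers-100.csv 12. A result close to 34 would support the explanation above.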
Read data from HBase
hbase:001:0> scan 'customers-100'
ROW COLUMN+CELL
1 column=City:, timestamp=2025-03-22T16:08:00.164, value=East Leonard
1 column=Company:, timestamp=2025-03-22T16:08:00.164, value=Rasmussen Group
1 column=Country:, timestamp=2025-03-22T16:08:00.164, value=Chile
1 column=Customer_Id:, timestamp=2025-03-22T16:08:00.164, value=DD37Cf93aecA6Dc
1 column=Email:, timestamp=2025-03-22T16:08:00.164, value=zunigavanessa@smith.info
1 column=First_Name:, timestamp=2025-03-22T16:08:00.164, value=Sheryl
1 column=Last_Name:, timestamp=2025-03-22T16:08:00.164, value=Baxter
1 column=Phone_1:, timestamp=2025-03-22T16:08:00.164, value=229.077.5154
1 column=Phone_2:, timestamp=2025-03-22T16:08:00.164, value=397.884.0519x718
1 column=Subscription_Date:, timestamp=2025-03-22T16:08:00.164, value=2020-08-24
1 column=Website:, timestamp=2025-03-22T16:08:00.164, value=http://www.stephenson.com/
10 column=City:, timestamp=2025-03-22T16:08:00.164, value=Elaineberg
10 column=Company:, timestamp=2025-03-22T16:08:00.164, value=Beck-Hendrix
10 column=Country:, timestamp=2025-03-22T16:08:00.164, value=Timor-Leste
...
...