Contributing Specific Amount of Data Node Storage to Hadoop Cluster
Hey! In this article you will learn how to contribute a limited/specific amount of storage as a data (slave) node to a Hadoop cluster by creating Linux partitions. Here I set up the Hadoop cluster on AWS Cloud, so we don't need to provide our own resources; we can use the resources of AWS Cloud, such as RAM/CPU, hard disks, etc.
What is Big Data Hadoop?
Hadoop is an open-source framework for storing and processing big data across a cluster of commodity machines. Its storage layer, HDFS, runs a Name Node (master) that manages metadata and Data Nodes (slaves) that contribute their storage to the cluster; that contributed storage is exactly what we will limit in this article.
Steps to Contribute Specific Amount of Data Node Storage to Hadoop Cluster -
- Launch a Name Node and 4 Data Nodes on AWS Cloud
- Create 4 EBS volumes and attach one to each Data Node
- Install the Hadoop and JDK software on all the Data Nodes, the Name Node and the Hadoop Client
- Create a partition on the attached volume in each Data Node
- Format the partition on every Data Node
- Mount the partition on a directory to store data
- Configure the Name Node and Data Nodes
- Contribute a specific amount of Data Node storage to the Hadoop Cluster
1) How to Launch an Instance on AWS Cloud?
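The instances in this setup were launched from the AWS web console; for reference, here is a minimal AWS CLI sketch that does the same thing. The AMI ID, key pair, security group and subnet below are placeholders, not values from this setup:
# launch one EC2 instance (replace every ID below with your own)
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t2.micro \
    --count 1 \
    --key-name hadoop-key \
    --security-group-ids sg-0abc12345678def90 \
    --subnet-id subnet-0abc12345678def90 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=NameNode}]'
Repeat the same command (changing only the Name tag) for the 4 Data Nodes and the Hadoop client.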
2) Creation of EBS Volumes
Here I created 4 EBS volumes of 20 GB each in ap-south-1a.
The EBS volumes are created but not attached to any instance yet, so their state is "available", which means they can be attached to any instance; a volume that is already attached shows the state "in-use".
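The volumes were created from the console; an equivalent AWS CLI sketch for creating the 4 volumes of 20 GB in ap-south-1a would be:
# create 4 EBS volumes of 20 GB in the ap-south-1a availability zone
for i in 1 2 3 4
do
    aws ec2 create-volume --availability-zone ap-south-1a --size 20 --volume-type gp2
done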
3) How to Attach an EBS Volume to a Data Node (EC2 Instance)
Here we have to provide the Instance ID so that the EBS volume gets attached to that instance. By default, the device name is "/dev/sdf".
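The same attachment can be done with the AWS CLI; in this sketch the volume ID and instance ID are placeholders you would replace with the IDs of your own volume and Data Node:
# attach one EBS volume to one Data Node instance as /dev/sdf
aws ec2 attach-volume \
    --volume-id vol-0abc12345678def90 \
    --instance-id i-0abc12345678def90 \
    --device /dev/sdf
Note that although the device name given here is /dev/sdf, inside a Xen-based instance (such as t2) the disk typically shows up as /dev/xvdf, which is the name used later in this article.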
Now all the EBS volumes are attached to the Data Nodes and their state is "in-use", which means each volume is attached to an instance.
4) Configuring the Name Node
First we have to install the JDK and Hadoop.
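The exact packages depend on what you downloaded; the sketch below assumes Hadoop 1.x RPMs, and the file names, the /nn metadata directory and port 9001 are placeholders rather than values taken from this setup:
# install the JDK first, then Hadoop (file names are placeholders)
rpm -i jdk-8u171-linux-x64.rpm
rpm -i hadoop-1.2.1-1.x86_64.rpm --force

# point the Name Node at a metadata directory (hdfs-site.xml)
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
EOF

# tell HDFS where the Name Node listens (core-site.xml)
cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
EOF

# format the metadata directory and start the Name Node daemon
hadoop namenode -format
hadoop-daemon.sh start namenode
You can verify that the daemon is running with the jps command.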
5) Adding the EBS Volume Storage to the Data Node
Though the EBS volume is attached to the Data Node, it is not yet providing any additional usable storage to the Data Node.
So, in order to add the volume's storage to the instance (DN), we have to do the following -
- Create a partition
- Format the partition
- Mount the partition on a directory
Creating a Partition on the EBS Volume Attached to the Data Node
Here, the name of the volume attached to the DN is /dev/xvdf.
fdisk /dev/xvdf
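fdisk is interactive, so the exact keystrokes are not visible in the command above; roughly, the session looks like the sketch below. The +10G size is only illustrative (it is the "specific amount" you choose to contribute), and the partition number should match the device used afterwards, /dev/xvdf3 in this article:
# inside the fdisk prompt:
#   n        -> create a new partition
#   p        -> make it a primary partition
#   <enter>  -> accept the suggested partition number and first sector
#   +10G     -> last sector given as a size, i.e. the amount you want to contribute
#   w        -> write the partition table and quit
# if the new partition device does not appear immediately, re-read the table:
partprobe /dev/xvdf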
Now the EBS volume is partitioned, but the partition is still not adding any storage to the DN. So we have to format the partition, and then mount it, in order to use it on the DN (instance).
mkfs.ext4 /dev/xvdf3
mkdir /diyaksh_data
mount /dev/xvdf3 /diyaksh_data
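To confirm that the partition is mounted on the directory and to see how much space it provides, you can check with df:
df -h /diyaksh_data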
Similarly, we have to follow these steps on all the data nodes to add storage to each of them.
Now we have to configure the Data Node and start it.
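A minimal sketch of the Data Node side, again assuming a Hadoop 1.x-style setup; the Name Node IP below is a placeholder, and the port must match whatever the Name Node is configured with:
# point the Data Node's storage at the mounted partition (hdfs-site.xml)
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/diyaksh_data</value>
  </property>
</configuration>
EOF

# tell the Data Node where the Name Node is (core-site.xml)
cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<NameNode-public-IP>:9001</value>
  </property>
</configuration>
EOF

# start the Data Node daemon
hadoop-daemon.sh start datanode
Because dfs.data.dir points at the directory where the partition is mounted, the Data Node can only ever contribute the size of that partition, not the full disk.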
Now here we can see that the actual storage of the Data Node is 48 GB, but it is contributing only a specific amount of storage (the size of the mounted partition) to the Hadoop cluster.
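The contributed capacity can be checked from the Name Node or the client with the HDFS admin report (Hadoop 1.x syntax; newer releases use hdfs dfsadmin -report):
hadoop dfsadmin -report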
Now the client can put data into the Hadoop cluster, but here the Name Node is in safe mode. So first we have to turn safe mode off and then upload the data to the Hadoop cluster.
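Safe mode can be switched off and a file uploaded from the client like this (testfile.txt is just an example file name, not one from this setup):
# leave safe mode on the Name Node
hadoop dfsadmin -safemode leave
# upload a file from the client into HDFS
hadoop fs -put testfile.txt /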
Here, only 1 Data Node instance is connected to the Name Node so far.
Now we can see that all the Data Nodes are connected to the Name Node and contributing a specific amount of storage to the Hadoop cluster.
Since 4 data nodes are attached to the Name Node, the data uploaded by the client has 3 replicas by default (HDFS's default replication factor).
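If you want to see where those replicas actually landed, fsck can list the blocks and the Data Nodes holding them (again using the example file name from above):
hadoop fsck /testfile.txt -files -blocks -locations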
Thank You :)