Contributing a Specific Amount of Data Node Storage to a Hadoop Cluster

Akash Pandey
5 min read · Oct 15, 2020

Hey! In this article you will learn how to contribute a limited/specific amount of storage as a data/slave node to a Hadoop cluster by creating Linux partitions. Here I set up the Hadoop cluster on AWS Cloud, so we don't need to provide our own resources; we can use the resources of AWS Cloud, such as RAM/CPU, hard disks, etc.

What is Big Data Hadoop?

Hadoop is an open-source framework for storing and processing huge volumes of data across a cluster of commodity machines. Its storage layer, HDFS, follows a master-slave design: the Name Node (master) keeps the metadata, while the Data Nodes (slaves) contribute their local storage to hold the actual blocks of data. By default a Data Node contributes the full capacity of the file system its data directory lives on, which is why we carve out a partition of a fixed size to contribute only a specific amount.

Steps to Contribute a Specific Amount of Data Node Storage to the Hadoop Cluster:

  • Launch a Name Node and 4 Data Nodes on AWS Cloud
  • Create 4 EBS volumes and attach one to each Data Node
  • Install Hadoop and JDK on all the Data Nodes, the Name Node, and the Hadoop client
  • Create a partition on the attached volume in each Data Node
  • Format the partition on every Data Node
  • Mount the partition on a directory to store data
  • Configure the Name Node and Data Nodes
  • Contribute a specific amount of Data Node storage to the Hadoop cluster

1) How to launch instances on AWS Cloud?

(Screenshot: Name Node and Data Node instances)
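If you prefer the command line over the console, a minimal sketch with the AWS CLI looks like this (the AMI ID, key pair and security group below are placeholders, not the exact values used in this setup):

aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro --count 1 \
    --key-name mykey --security-group-ids sg-xxxxxxxx \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=NameNode}]'
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro --count 4 \
    --key-name mykey --security-group-ids sg-xxxxxxxx \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=DataNode}]'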

2) Creation of EBS Volumes

(Screenshot: EBS Volume console)

Here I created 4 EBS volumes of 20 GB each in ap-south-1a.
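The same volumes can also be created from the AWS CLI; a rough sketch (gp2 is assumed here as the volume type):

aws ec2 create-volume --size 20 --availability-zone ap-south-1a --volume-type gp2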

(Screenshot: creating an EBS volume)

The EBS volumes are created but not yet attached to any instance, so their state is "available", meaning they can be attached to any instance. A volume that is already attached shows the state "in-use".

(Screenshot: EBS volumes created)

3) How to attach an EBS Volume to a Data Node (EC2 Instance)

Here we have to provide the instance ID so that the EBS volume gets attached to that instance. By default, the device name is "/dev/sdf".
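The CLI equivalent is a single command; the volume ID and instance ID below are placeholders:

aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf

Note that inside the instance, the device attached as /dev/sdf typically shows up as /dev/xvdf, which is the name used in the partitioning step below.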

(Screenshot: Attach Volume)

Now all the EBS volumes are attached to the Data Nodes and their state is "in-use", which means they are attached to an instance.

4) Configuring the Name Node

First, we have to install JDK and Hadoop on the Name Node.
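As a rough sketch of the install and Name Node configuration (assuming Hadoop 1.x RPMs and JDK 8, a common choice for this kind of setup; your versions, file names, directory and port may differ):

rpm -i jdk-8u171-linux-x64.rpm
rpm -i hadoop-1.2.1-1.x86_64.rpm --force

cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>        <!-- metadata directory of the Name Node -->
    <value>/nn_dir</value>
  </property>
</configuration>
EOF

cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>     <!-- address the Data Nodes and client connect to -->
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
EOF

hadoop namenode -format              # format the metadata directory once
hadoop-daemon.sh start namenode
jps                                  # confirm the NameNode process is running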

(Screenshot: Name Node setup)

5) Adding the EBS Volume to the Data Node

Though the EBS volume is attached to the Data Node, it is not yet providing any additional storage to the Data Node.

(Screenshot: before partitioning)

So, in order to add the volume to the instance (DN), we have to do the following:

  • Create a partition
  • Format the partition
  • Mount the partition on a directory

Creating a partition on the EBS Volume attached to the Data Node

Here, the name of the volume attached to the DN is /dev/xvdf.

fdisk /dev/xvdf
(Screenshot: partitioning the disk)
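Inside fdisk, the rough sequence of keystrokes looks like this; the size is only an example, and this is exactly where the "specific amount" is decided, because only the partition's size (not the whole 20 GB disk) will be contributed:

n        (create a new partition)
p        (primary partition)
<enter>  (accept the default partition number)
<enter>  (accept the default first sector)
+10G     (last sector: limits the partition to the amount you want to contribute)
w        (write the partition table and exit)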

Now the EBS volume is partitioned, but it is still not adding storage to the DN. So we have to format the partition and mount it in order to use it on the DN (instance).

(Screenshot: partition created)

mkfs.ext4 /dev/xvdf3
mkdir /diyaksh_data
mount /dev/xvdf3 /diyaksh_data

(Screenshot: volume added to the DN after formatting)
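You can verify the mounted size with df, and optionally (a sketch) make the mount permanent across reboots through /etc/fstab:

df -h /diyaksh_data
echo '/dev/xvdf3 /diyaksh_data ext4 defaults 0 0' >> /etc/fstab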

Similarly, we have to follow these steps on all the Data Nodes to add a volume to each of them.

Now we have to configure the Data Node and start it.
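A minimal sketch of the Data Node configuration, assuming the same Hadoop 1.x layout as above and using the mounted directory as the HDFS data directory (replace the placeholder with your Name Node's IP):

cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>        <!-- the mounted partition: only its size is contributed -->
    <value>/diyaksh_data</value>
  </property>
</configuration>
EOF

cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<NameNode-IP>:9001</value>
  </property>
</configuration>
EOF

hadoop-daemon.sh start datanode
jps                                  # confirm the DataNode process is running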

Here we can see that the actual storage of the Data Nodes is 48 GB, but they are contributing only a specific amount of storage to the Hadoop cluster.
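The contributed capacity can be checked from the Name Node (or the client) with the dfsadmin report, which lists the configured capacity of every Data Node:

hadoop dfsadmin -report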

Now the client can put data into the Hadoop cluster, but here the Name Node is in safe mode. So first we have to turn safe mode off and then upload the data to the Hadoop cluster.
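As a sketch (the file name is just an example):

hadoop dfsadmin -safemode leave      # run on the Name Node to turn safe mode off
hadoop fs -put testfile.txt /        # run on the client to upload the file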

Here, only 1 Data Node instance is attached to the Name Node so far.

Now we can see that all the Data Nodes are attached to the Name Node and contributing a specific amount of storage to the Hadoop cluster.

Since 4 Data Nodes are attached to the Name Node, the data uploaded by the client has 3 replicas by default (HDFS's default replication factor).
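To see the replicas of an uploaded file, a quick sketch (again using the example file name from above):

hadoop fsck /testfile.txt -files -blocks -locations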

Thank You :)
