Contributing Specific Amount of Data Node Storage to Hadoop Cluster
Hey! In this article you will learn how to contribute a limited/specific amount of storage as a data (slave) node to a Hadoop cluster by creating Linux partitions. Here I set up the Hadoop cluster on AWS Cloud, so we don't need to provide our own resources; we can use the resources of AWS Cloud, such as RAM/CPU, hard disks, etc.
What is Big Data Hadoop?
Hadoop is an open-source framework for storing and processing big data across a cluster of commodity machines. Its storage layer, HDFS, runs a Name Node (master) that manages metadata and Data Nodes (slaves) that contribute their storage to the cluster; that contributed storage is exactly what we will limit in this article.
Steps to Contribute Specific Amount of Data Node Storage to Hadoop Cluster -
- Launch a Name Node and 4 Data Nodes on AWS Cloud
- Create 4 EBS volumes and attach one to each Data Node
- Install the Hadoop and JDK software on all the Data Nodes, the Name Node and the Hadoop Client
- Create a partition on the attached volume in each Data Node
- Format the partition on every Data Node
- Mount the partition on a directory to store data
- Configure the Name Node and Data Nodes
- Contribute a specific amount of Data Node storage to the Hadoop Cluster
1) How to Launch an Instance on AWS Cloud?
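The instances in this setup were launched from the AWS web console; for reference, here is a minimal AWS CLI sketch that does the same thing. The AMI ID, key pair, security group and subnet below are placeholders, not values from this setup:
# launch one EC2 instance (replace every ID below with your own)
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t2.micro \
    --count 1 \
    --key-name hadoop-key \
    --security-group-ids sg-0abc12345678def90 \
    --subnet-id subnet-0abc12345678def90 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=NameNode}]'
Repeat the same command (changing only the Name tag) for the 4 Data Nodes and the Hadoop client.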
2) Creation of EBS Volumes
Here I created 4 EBS volumes of 20 GB each in ap-south-1a.
The EBS volumes are created but not attached to any instance yet, so their state is "available", which means they can be attached to any instance; a volume that is already attached shows the state "in-use".
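The volumes were created from the console; an equivalent AWS CLI sketch for creating the 4 volumes of 20 GB in ap-south-1a would be:
# create 4 EBS volumes of 20 GB in the ap-south-1a availability zone
for i in 1 2 3 4
do
    aws ec2 create-volume --availability-zone ap-south-1a --size 20 --volume-type gp2
done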
3) How to Attach an EBS Volume to a Data Node (EC2 Instance)
Here we have to provide the Instance ID so that the EBS volume gets attached to that instance. By default, the device name is "/dev/sdf".
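The same attachment can be done with the AWS CLI; in this sketch the volume ID and instance ID are placeholders you would replace with the IDs of your own volume and Data Node:
# attach one EBS volume to one Data Node instance as /dev/sdf
aws ec2 attach-volume \
    --volume-id vol-0abc12345678def90 \
    --instance-id i-0abc12345678def90 \
    --device /dev/sdf
Note that although the device name given here is /dev/sdf, inside a Xen-based instance (such as t2) the disk typically shows up as /dev/xvdf, which is the name used later in this article.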
Now all the EBS volumes are attached to the Data Nodes and their state is "in-use", which means each volume is attached to an instance.
4) Configuring the Name Node
First we have to install the JDK and Hadoop.
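The exact packages depend on what you downloaded; the sketch below assumes Hadoop 1.x RPMs, and the file names, the /nn metadata directory and port 9001 are placeholders rather than values taken from this setup:
# install the JDK first, then Hadoop (file names are placeholders)
rpm -i jdk-8u171-linux-x64.rpm
rpm -i hadoop-1.2.1-1.x86_64.rpm --force

# point the Name Node at a metadata directory (hdfs-site.xml)
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
EOF

# tell HDFS where the Name Node listens (core-site.xml)
cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
EOF

# format the metadata directory and start the Name Node daemon
hadoop namenode -format
hadoop-daemon.sh start namenode
You can verify that the daemon is running with the jps command.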
5) Adding the EBS Volume Storage to the Data Node
Though the EBS volume is attached to the Data Node, it is not yet providing any additional usable storage to the Data Node.
So, in order to add the volume's storage to the instance (DN), we have to do the following -
- Create a partition
- Format the partition
- Mount the partition on a directory
Creating a Partition on the EBS Volume Attached to the Data Node
Here, the name of the volume attached to the DN is /dev/xvdf.
fdisk /dev/xvdf
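fdisk is interactive, so the exact keystrokes are not visible in the command above; roughly, the session looks like the sketch below. The +10G size is only illustrative (it is the "specific amount" you choose to contribute), and the partition number should match the device used afterwards, /dev/xvdf3 in this article:
# inside the fdisk prompt:
#   n        -> create a new partition
#   p        -> make it a primary partition
#   <enter>  -> accept the suggested partition number and first sector
#   +10G     -> last sector given as a size, i.e. the amount you want to contribute
#   w        -> write the partition table and quit
# if the new partition device does not appear immediately, re-read the table:
partprobe /dev/xvdf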
Now the EBS volume is partitioned, but the partition is still not adding any storage to the DN. So we have to format the partition, and then mount it, in order to use it on the DN (instance).
mkfs.ext4 /dev/xvdf3
mkdir /diyaksh_data
mount /dev/xvdf3 /diyaksh_data
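To confirm that the partition is mounted on the directory and to see how much space it provides, you can check with df:
df -h /diyaksh_data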
Similarly, we have to follow these steps on all the data nodes to add storage to each of them.
Now we have to configure the Data Node and start it.
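A minimal sketch of the Data Node side, again assuming a Hadoop 1.x-style setup; the Name Node IP below is a placeholder, and the port must match whatever the Name Node is configured with:
# point the Data Node's storage at the mounted partition (hdfs-site.xml)
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/diyaksh_data</value>
  </property>
</configuration>
EOF

# tell the Data Node where the Name Node is (core-site.xml)
cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<NameNode-public-IP>:9001</value>
  </property>
</configuration>
EOF

# start the Data Node daemon
hadoop-daemon.sh start datanode
Because dfs.data.dir points at the directory where the partition is mounted, the Data Node can only ever contribute the size of that partition, not the full disk.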
Now here we can see that the actual storage of the Data Node is 48 GB, but it is contributing only a specific amount of storage (the size of the mounted partition) to the Hadoop cluster.
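The contributed capacity can be checked from the Name Node or the client with the HDFS admin report (Hadoop 1.x syntax; newer releases use hdfs dfsadmin -report):
hadoop dfsadmin -report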
Now the client can put data into the Hadoop cluster, but here the Name Node is in safe mode. So first we have to turn safe mode off and then upload the data to the Hadoop cluster.
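Safe mode can be switched off and a file uploaded from the client like this (testfile.txt is just an example file name, not one from this setup):
# leave safe mode on the Name Node
hadoop dfsadmin -safemode leave
# upload a file from the client into HDFS
hadoop fs -put testfile.txt /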
Here, only 1 Data Node instance is connected to the Name Node so far.
Now we can see that all the Data Nodes are connected to the Name Node and contributing a specific amount of storage to the Hadoop cluster.
Since 4 data nodes are attached to the Name Node, the data uploaded by the client has 3 replicas by default (HDFS's default replication factor).
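If you want to see where those replicas actually landed, fsck can list the blocks and the Data Nodes holding them (again using the example file name from above):
hadoop fsck /testfile.txt -files -blocks -locations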
Thank You :)