BIG DATA

Akash Pandey
4 min read · Sep 16, 2020

What is Big Data?

Big Data is not a technology; it is a problem in the data world, chiefly a storage problem.

The term “big data” refers to data that is so large, fast, or complex that it is difficult or impossible to process using traditional methods. Managing such a huge volume of data brings challenges in capturing, storing, analyzing, transferring, and sharing it.

Different companies define Big Data in different ways. Oracle, for example, defines it as:

“Big data is the data characterized by 4 key attributes: volume, variety, velocity and value”

Google defines it as:

“extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions”

IBM defines it as:

“Big data is the data characterized by 3 attributes: volume, variety, and velocity.”

Ten years ago, when we moved to desktops, data was measured in megabytes or gigabytes. Nowadays, data has grown exponentially and reached terabytes and petabytes.

Today, big companies like Microsoft, Google, eBay, IBM, Oracle, Amazon, and Netflix all hold a lot of data about their users. By some estimates, the total amount of data in the world has doubled in just the last two years, and data volumes are now measured in exabytes and zettabytes.
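To put these units in perspective, here is a minimal Python sketch. It assumes decimal (SI) units, where each unit is 1,000 times the previous one, and the function names are made up for this illustration. It also counts how many doublings separate two sizes, which shows how quickly repeated doubling climbs from gigabytes to zettabytes.

```python
# Decimal (SI) byte units; each one is 1,000 times larger than the previous.
# This is an assumption: some tools use binary units (1,024) instead.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits this unit, or when we run out of units.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


def doublings_needed(start_bytes: int, target_bytes: int) -> int:
    """Count how many times start_bytes must double to reach target_bytes."""
    count = 0
    size = start_bytes
    while size < target_bytes:
        size *= 2
        count += 1
    return count


if __name__ == "__main__":
    print(human_readable(10**12))                # one terabyte
    print(doublings_needed(10**9, 10**12))       # gigabyte -> terabyte
    print(doublings_needed(10**12, 10**21))      # terabyte -> zettabyte
```

Going from a terabyte to a zettabyte takes only about 30 doublings, which is why steady exponential growth outpaces traditional storage so quickly.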

The 3 V’s

Your data is priceless — but only if you can use it. Your business needs to conquer the three ‘V’s of big data (Volume, Variety & Velocity) and turn it into information you can act on.

[Figure: The 3 V’s of Big Data]

Big Data Companies to Know

  • Google
  • IBM
  • Alteryx
  • Salesforce
  • Amazon
  • Microsoft
  • Oracle
  • Segment
  • VMware
  • Facebook

Google

Over the last two decades, Google has grown exponentially from the search engine we all know and use into the multinational company it is today. Google provides integrated, end-to-end Big Data solutions built on its own innovations, helping organizations capture, process, analyze, and transfer data on a single platform. Google is also expanding its Big Data analytics offerings; BigQuery, for example, is a cloud-based analytics platform that analyzes huge data sets quickly.

IBM

IBM is one of the biggest vendors of Big Data-related products and services. IBM's Big Data solutions provide features for storing, managing, and analyzing data.

This data comes from numerous sources and is accessible to all users, including business analysts and data scientists. DB2, Informix, and InfoSphere are popular IBM database platforms that support Big Data analytics. IBM also offers well-known analytics applications such as Cognos and SPSS.

Oracle

Oracle offers fully integrated cloud applications and platform services, with more than 420,000 customers and 136,000 employees across 145 countries. According to the Forbes list, it has a market capitalization of $182.2 billion and sales of $37.4 billion. Oracle leverages the benefits of big data in the cloud, helping organizations define a data strategy and approach that includes big data and cloud technology.

Amazon

Amazon is well known for its cloud-based platform. It also offers Big Data products, the main one being the Hadoop-based Elastic MapReduce. DynamoDB, a NoSQL database, and Redshift, a data warehouse, also work with Amazon Web Services.

Microsoft

Microsoft acquired Revolution Analytics, a provider of a Big Data analytics platform built around the R programming language. R is used for building Big Data apps that do not require the skills of a data scientist.

Why Big Data?

The biggest question here is why companies rely on Big Data and what its main advantages are.

The answer is straightforward. Companies use this data for many purposes, the main one being to understand customer behavior: their likes, their perceptions, and so on. It is also used for making predictions.

How to Solve the Big Data Problem?

Distributed Storage Concept

Hadoop addresses the Big Data problem through HDFS (Hadoop Distributed File System), which stores data in distributed form across different machines. There is plenty of data, but it has to be stored cost-effectively and processed efficiently.
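The idea of distributed storage can be sketched in a few lines of Python. This is a toy model and not how HDFS is actually implemented: it assumes a 128 MB block size and three copies of each block (both are common HDFS defaults, but they are configurable), and it places copies in simple round-robin order, whereas real HDFS placement is rack-aware. The node names and function names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size (a common HDFS default)
REPLICATION = 3      # assumed number of copies kept for each block


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size of each block; the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_blocks(num_blocks: int, nodes: List[str],
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign every block to `replication` distinct nodes, round-robin.

    Toy policy for illustration only; real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # a hypothetical 300 MB file
    nodes = ["node1", "node2", "node3", "node4"]
    for block_id, holders in place_blocks(len(blocks), nodes).items():
        print(f"block {block_id} ({blocks[block_id]} MB) -> {holders}")
```

Because every block lives on several machines, losing one machine does not lose the file, and each machine only needs ordinary, inexpensive disks.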

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
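Hadoop's classic processing model is MapReduce, and a word count is the usual first example. The sketch below imitates the three phases (map, shuffle, reduce) in plain Python on a single machine. It does not use the Hadoop API, the function names are made up for this illustration, and on a real cluster each chunk would be processed in parallel on the machine that stores it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over all chunks in sequence."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Each string stands in for a block of a larger file.
    print(word_count(["big data", "Big storage"]))
```

The map step needs only its own chunk, so it can run wherever that chunk is stored; only the small grouped results have to travel across the network.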

Thank You

Akash Pandey

I am a Computer Science undergraduate seeking an opportunity to work in a challenging environment.