Big Data Hadoop Hive SQL Query Hello World
Prerequisites
- Big Data
- Hadoop
- SQL
If you are reading this blog, you should already be familiar with Big Data and Hadoop.
Big Data has been a technology revolution in the RDBMS world. Data in the Hadoop Distributed File System (HDFS) is stored as flat files in formats such as CSV or tab-delimited text.
To process this data, you would normally need to be proficient in Java to write a MapReduce program.
To make Big Data usable for non-Java users like data analysts, a feature to query these flat files using SQL was introduced: Apache Hive https://hive.apache.org/
http://en.wikipedia.org/wiki/Apache_Hive
Hive was developed at Facebook and is now used by companies such as Netflix. It is a powerful querying tool in the Big Data Hadoop ecosystem.
Under the hood, Hive translates your SQL queries into MapReduce programs.
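You can actually see this translation: EXPLAIN asks Hive to print the execution plan (the MapReduce stages) instead of running the query. A sketch, assuming a table named PHONE like the one created later in this post already exists:

```shell
# Show the plan Hive would compile for this query instead of executing it.
# Assumes the PHONE table from the steps below has already been created.
hive -e "EXPLAIN SELECT COUNT(*) FROM PHONE;"
```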
The steps are as follows:
1. Create a Hive table with metadata information
2. Load data into the Hive table (2 ways)
a. Load data from the local file system into Hadoop & Hive
b. Load data from the Hadoop file system into Hive
3. Query the table
I have the following test data (2 fields: ID and PHONE_NAME):
1,iphone
2,blackberry
3,nokia
4,sony
5,samsung
6,htc
7,micromax
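You can create this test file from the terminal; the rows are comma-delimited to match the FIELDS TERMINATED BY ',' clause used in the table definition below (the file name PHONE.txt matches the later steps; move it to /home/training/ or wherever your load path points):

```shell
# Write the 7 sample rows to a local, comma-delimited text file.
cat > PHONE.txt <<'EOF'
1,iphone
2,blackberry
3,nokia
4,sony
5,samsung
6,htc
7,micromax
EOF
```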
To get started, you need Hive installed and the Hadoop file system configured with a NameNode, JobTracker, DataNode, TaskTracker, etc.
Step 1: Launch the Hive console from the command line / terminal by running the hive command
Step 2: Create the table with the metadata information
CREATE TABLE PHONE ( ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Step 3: Load the data from the local file system into the table
LOAD DATA LOCAL INPATH '/home/training/PHONE.txt' OVERWRITE INTO TABLE PHONE;
Step 4: Query the table that you created and loaded in Hive
select * from PHONE;
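Beyond SELECT *, ordinary SQL filters and aggregates work the same way; each query below is compiled by Hive into a MapReduce job (a sketch, run against the PHONE table created above):

```shell
# Filter rows by ID, then count all rows; Hive compiles each into MapReduce.
hive -e "SELECT PHONE_NAME FROM PHONE WHERE ID > 4;"
hive -e "SELECT COUNT(*) FROM PHONE;"
```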
That covers the local mode.
Let's have a quick peek at the server mode (Type 2). To load data from a Hadoop file into Hive, we first need to copy the file from the local file system to the Hadoop file system.
Step 1: Place the file from local file system to HDFS (Hadoop Distributed File System)
hadoop fs -put PHONE.txt /user/training/
Step 2: Verify if the file has been placed in the HDFS
hadoop fs -ls /user/training/PHONE.txt
Step 3: Create the table with meta data information
CREATE TABLE PHONE_SERVER ( ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Step 4: Load the data from HDFS into the Hive table
LOAD DATA INPATH '/user/training/PHONE.txt' OVERWRITE INTO TABLE PHONE_SERVER;
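Note that, unlike the LOCAL variant, LOAD DATA INPATH moves the file within HDFS into the table's warehouse directory, so it disappears from its original location. You can confirm this after the load (/user/hive/warehouse is the default warehouse path and an assumption; yours may differ):

```shell
# The source file is gone from its original HDFS location after the load...
hadoop fs -ls /user/training/PHONE.txt
# ...and now lives under the table's directory in the warehouse (default path assumed).
hadoop fs -ls /user/hive/warehouse/phone_server/
```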
Step 5: Verify by running a SQL query and checking the results
select * from PHONE_SERVER;