Wednesday, July 16, 2014

Big Data Hadoop Hive SQL Query Hello World

Prerequisites
  • Big Data 
  • Hadoop 
  • SQL

If you are reading this blog, you should already know the basics of Big Data and Hadoop.

Big Data is a technology revolution for people coming from the RDBMS world. However, data in the Hadoop Distributed File System (HDFS) is usually stored as flat files in formats such as CSV or tab-delimited text, not as relational tables.

To process this data directly, you need to write a MapReduce program, which usually means being comfortable with Java.

To open Big Data up to non-Java users such as data analysts, a way to query these flat files using SQL was introduced. This is Apache Hive https://hive.apache.org/

http://en.wikipedia.org/wiki/Apache_Hive

Hive was originally developed at Facebook and is now also used by Netflix. It is a powerful querying tool for Big Data on Hadoop.

Basically, Hive translates your SQL queries into MapReduce jobs, so you do not have to write them yourself.
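To make that translation a little more concrete, here is a minimal Python sketch of the map, shuffle and reduce phases that a grouped count would pass through. It is only a conceptual illustration, not the plan Hive actually generates. The query in the comment, the function names (map_phase, shuffle, reduce_phase) and the sample rows are all made up for this example.

```python
from collections import defaultdict

# Conceptual equivalent of a query such as (hypothetical, not from this post):
#   SELECT PHONE_NAME, COUNT(*) FROM PHONE GROUP BY PHONE_NAME;

def map_phase(lines):
    """Emit a (key, 1) pair for every input row, keyed by the phone name."""
    for line in lines:
        _id, phone_name = line.strip().split(",")
        yield phone_name, 1

def shuffle(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to get the row count per group."""
    return {key: sum(values) for key, values in grouped.items()}

# Invented sample rows in the comma-delimited ID,PHONE_NAME layout
rows = ["1,iphone", "2,blackberry", "3,nokia", "4,iphone"]
counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

With Hive you only write the SQL; the point of the sketch is that each phase above is something you would otherwise have to code by hand.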


The steps are as follows:

1. Create a Hive table with metadata information
2. Load data into the Hive table ( 2 Types )
      a. Loading data from the local file system into Hive
      b. Loading data from the Hadoop file system into Hive
3. Query the table


My test data is shown below. It has 2 fields, ID and PHONE_NAME, separated by a comma:

1,iphone
2,blackberry
3,nokia
4,sony
5,samsung
6,htc
7,micromax
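If you need to create this file yourself, the following Python sketch writes the same seven rows as comma-delimited text with no header line. The names ROWS and write_phone_file are hypothetical, and the temporary directory is only a placeholder location; point the path at wherever you want PHONE.txt to live.

```python
import os
import tempfile

# The seven test rows shown above: (ID, PHONE_NAME)
ROWS = [
    (1, "iphone"), (2, "blackberry"), (3, "nokia"), (4, "sony"),
    (5, "samsung"), (6, "htc"), (7, "micromax"),
]

def write_phone_file(path):
    """Write one 'ID,PHONE_NAME' line per row, without a header line."""
    with open(path, "w", newline="\n") as handle:
        for phone_id, phone_name in ROWS:
            handle.write(f"{phone_id},{phone_name}\n")
    return path

# Placeholder location (an assumption); replace with your own path
target = write_phone_file(os.path.join(tempfile.gettempdir(), "PHONE.txt"))

# Read the file back to confirm what was written
with open(target) as handle:
    lines = handle.read().splitlines()
print(len(lines), lines[0], lines[-1])
```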


To get started, you need to have Hive already installed and the Hadoop file system configured with the Name Node, Job Tracker, Data Node, Task Tracker, etc.


Step 1:  Launch the Hive console from the command line / terminal by running the hive command



Step 2:  Create the table with the metadata information

CREATE TABLE PHONE ( ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
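
The CREATE TABLE statement only records metadata: the column names, their types and the ',' delimiter. Hive applies that layout when it reads the text file. The Python sketch below mimics the idea. SCHEMA, DELIMITER and parse_row are hypothetical names, and turning an unparsable or missing field into None is meant to mirror how Hive returns NULL for a value that does not fit the column type. It is an illustration only, not Hive's actual parser.

```python
# Column layout taken from the CREATE TABLE statement: (name, converter) pairs
SCHEMA = [("id", int), ("phone_name", str)]
DELIMITER = ","

def parse_row(line):
    """Split one text line on the delimiter and cast each field by column type."""
    fields = line.rstrip("\n").split(DELIMITER)
    row = {}
    for index, (name, convert) in enumerate(SCHEMA):
        try:
            row[name] = convert(fields[index])
        except (ValueError, IndexError):
            # Unparsable or missing field: treated like a NULL
            row[name] = None
    return row

print(parse_row("1,iphone"))
print(parse_row("x,nokia"))
```

This is why the delimiter in the table definition has to match the file: the file itself is never validated when the table is created.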



Step 3:  Load the data from the local file system into the table

LOAD DATA LOCAL INPATH '/home/training/PHONE.txt' OVERWRITE INTO TABLE PHONE;


Step 4:  Query the table you created and loaded in Hive

select * from PHONE;



Well, you should now be good with local mode (Type 1).


Let's have a quick peek at the Server Mode (Type 2), where the data is loaded into Hive from a file that is already in the Hadoop file system. To do that, we first need to copy the file from the local file system to the Hadoop file system.


Step 1: Copy the file from the local file system to HDFS (Hadoop Distributed File System)

hadoop fs -put PHONE.txt /user/training/PHONE.txt



Step 2: Verify that the file has been placed in HDFS

hadoop fs -ls PHONE*



Step 3:  Create the table with the metadata information

CREATE TABLE PHONE_SERVER ( ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;



Step 4:  Load the data from HDFS into the Hive table

LOAD DATA INPATH '/user/training/PHONE.txt' OVERWRITE INTO TABLE PHONE_SERVER;

Step 5: Verify by running a SQL query and checking the results

select * from PHONE_SERVER;
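
One difference between the two load types is worth knowing: LOAD DATA LOCAL INPATH copies the local file into the table, while LOAD DATA INPATH (without LOCAL) moves the HDFS file into the table, so it no longer appears at its original HDFS path afterwards. OVERWRITE replaces whatever the table already held. The Python sketch below simulates this with ordinary directories. load_data and the directory names are hypothetical, and a single temporary directory stands in for both file systems.

```python
import os
import shutil
import tempfile

def load_data(src_path, table_dir, local, overwrite=True):
    """Simulate LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE with plain directories."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # OVERWRITE: existing files in the table directory are removed first
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    dest = os.path.join(table_dir, os.path.basename(src_path))
    if local:
        shutil.copy(src_path, dest)   # LOCAL: the source file is copied and stays in place
    else:
        shutil.move(src_path, dest)   # no LOCAL: the source file is moved into the table
    return dest

# Throwaway directory standing in for both file systems (an assumption)
root = tempfile.mkdtemp()
local_file = os.path.join(root, "local_PHONE.txt")
hdfs_file = os.path.join(root, "hdfs_PHONE.txt")
for path in (local_file, hdfs_file):
    with open(path, "w") as handle:
        handle.write("1,iphone\n2,blackberry\n")

phone_dir = os.path.join(root, "warehouse", "phone")
server_dir = os.path.join(root, "warehouse", "phone_server")
load_data(local_file, phone_dir, local=True)
load_data(hdfs_file, server_dir, local=False)

local_still_there = os.path.exists(local_file)   # copy: source remains
hdfs_still_there = os.path.exists(hdfs_file)     # move: source is gone
loaded_files = (os.listdir(phone_dir), os.listdir(server_dir))
print(local_still_there, hdfs_still_there, loaded_files)
shutil.rmtree(root)
```

In practice this means that re-running the hadoop fs -ls check after Step 4 will not list PHONE.txt at its original location.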