Showing posts with label Hive. Show all posts
Showing posts with label Hive. Show all posts

Wednesday, July 16, 2014

Big Data Hadoop Hive SQL Query Hello World

Big Data Hadoop Hive SQL Query Hello World

Prerequisite
  • Big Data 
  • Hadoop 
  • SQL

If you are reading this blog you should know about Big Data and Hadoop.

Big Data is a technology revolution in the RDBMS world, however big data hadoop distributed file system can be written as a flat file with different formats like CSV, Tab Delimited etc.,

Also in order to process these data you need to be an expert in Java to write a Map Reduce program.

To make use of Big Data for non-Java users like Data Analysts, there is feature to Query the flat files using SQL has been introduced. This is Apache Hive https://hive.apache.org/

http://en.wikipedia.org/wiki/Apache_Hive

Hive was introduced by Facebook and now used by Netflix. It is a powerful querying tool in Big Data hadoop.

Basically Hive is capable of transforming your SQL queries into Map Reduce programs.


The following are the steps to be done

1. Create Hive Table with Meta data information
2. Load data into Hive Table ( 2 Types )
      a. Loading data in local file system to Hadoop & Hive
      b. Loading data in hadoop file system to Hive
3. Query the table


I have a Test Data as below, ( It has 2 fields ID & PHONE NAME)

1,iphone
2,blackberry
3,nokia
4,sony
5,samsung
6,htc
7,micromax


To get started you need have hive installed already and hadoop file system configured with Name Node, Job Tracker, Data Node, Task Tasker etc.,


Step 1:  Launch the hive console from the command line / terminal



Step 2:  Create the table with the 

CREATE TABLE PHONE ( ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;



Step 3:  Loading data in local file system to the table 

LOAD DATA LOCAL INPATH '/home/training/PHONE.txt' OVERWRITE INTO TABLE PHONE;


Step 4:  Query your table which is created and data loaded in Hive

select * from PHONE;



Well you should be good with local mode.


Let's have a quick peek at the Server Mode (Type 2). If you have to load data from Hadoop File to Hive, first we need to send the file from local file system to hadoop file system.


Step 1: Place the file from local file system to HDFS (Hadoop Distributed File System)

hadoop fs -put PHONE.txt



Step 2: Verify if the file has been placed in the HDFS

hadoop fs -ls PHONE*



Step 3:  Create the table with meta data information

CREATE TABLE PHONE_SERVER ( ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;



Step 4:  Load the data from HDFS to HIVE Table

LOAD DATA INPATH '/user/training/PHONE.txt' OVERWRITE INTO TABLE PHONE_SERVER;

Step 5: Verify by performing a SQL Query and check the results

select * from PHONE_SERVER;





Tuesday, July 15, 2014

Big Data Hadoop Hive Getting Max of a Count


This Example about a query to get Max of a count.

Pre-Requiste
- Basic Hadoop Knowledge
- Basic Hive Knowledge
- Basic SQL Knowledge


My Input Data is something like as below,

1,iphone,2000,abc
2,iphone,3000,abc1
3,nokia,4000,abc2
4,sony,5000,abc3
5,nokia,6000,abc4
6,iphone,7000,abc5
7,nokia,8500,abc6
Problem:
In Hive we can perform group by as we do in ANSI SQL Example as below,

select d.phnName,count(*) from phnDetails d group by d.phnName

The output of the above query is as below,

iphone 3
nokia 3
sony 1

You might have a scenario something like to retrieve only values for equal to Max

Example if you need output like as below,

iphone 3
nokia 3

Resolution:

We need to use multiple Sub Queries to perform this operation

select c.phnName, c.counter 
from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c 
join 
(select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f) g 
where c.counter = g.countmax;

Output is as below,



Queries built in multiple iterations as below,

CREATE TABLE phnDetails ( id INT, phnName STRING, price INT, details STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/training/Phone/phones.txt' OVERWRITE INTO TABLE phnDetails;

select * from phnDetails;


select d.phnName, count(*) from phnDetails d group by d.phnName;


select c.phnName, c.counter from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c ;

select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f ;

select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f ;

select c.phnName, c.counter 
from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c 
join 
(select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f) g 
where c.counter = g.countmax;