Zack Tutorials: Big Data Hadoop Hive SQL Query Hello World

Wednesday, July 16, 2014

Big Data Hadoop Hive SQL Query Hello World

Prerequisites

Big Data
Hadoop
SQL

If you are reading this post, you should already be familiar with Big Data and Hadoop.

Big Data is a technology revolution in a world long dominated by relational databases (RDBMS). Unlike an RDBMS, however, the Hadoop Distributed File System (HDFS) stores data as flat files in formats such as CSV, tab-delimited text, etc.

Also, to process this data you traditionally had to write a MapReduce program, which meant being comfortable with Java.

To make Big Data accessible to non-Java users such as data analysts, a way to query these flat files using SQL was introduced: Apache Hive (https://hive.apache.org/).

http://en.wikipedia.org/wiki/Apache_Hive

Hive was originally developed at Facebook and is now used by companies such as Netflix. It is a powerful querying tool in the Hadoop ecosystem.

In essence, Hive translates your SQL queries (written in its SQL dialect, HiveQL) into MapReduce jobs.
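To give a feel for what "translating SQL into MapReduce" means, here is a toy, single-process Python sketch of the map, shuffle and reduce phases for a query such as SELECT COUNT(*) FROM PHONE WHERE ID > 3, using the phone test data from this post. The query and all function names are hypothetical illustrations; this is not how Hive's compiler or Hadoop actually run the job, only the shape of the computation.

```python
from collections import defaultdict

# Sample rows in the same comma-delimited layout as the PHONE test data (ID,PHONE_NAME).
LINES = ["1,iphone", "2,blackberry", "3,nokia", "4,sony",
         "5,samsung", "6,htc", "7,micromax"]


def map_phase(line):
    """Mapper: parse one line and apply the WHERE clause (ID > 3).

    COUNT(*) without GROUP BY has a single group, so every matching
    row is emitted under the same key, "all".
    """
    id_field, _name = line.split(",", 1)
    if int(id_field) > 3:
        yield ("all", 1)


def shuffle(pairs):
    """Shuffle: group mapper output by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reducer: COUNT(*) is the sum of the 1s emitted for a group."""
    return sum(values)


def run_count_query(lines):
    """Simulate SELECT COUNT(*) FROM PHONE WHERE ID > 3 as map -> shuffle -> reduce."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    # A global COUNT(*) returns one row even when nothing matches, hence the default.
    return reduce_phase(groups.get("all", []))


print(run_count_query(LINES))  # 4
```

The point of the split is that the mapper only ever looks at one line at a time, so Hadoop can run many mappers in parallel over different blocks of the file; only the small grouped output has to travel to the reducer.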

The steps are as follows:

1. Create a Hive table with its metadata (column definitions)
2. Load data into the Hive table (two ways)
a. Loading data from the local file system into Hadoop & Hive
b. Loading data already in the Hadoop file system into Hive
3. Query the table

I have the following test data (it has two fields: ID and phone name):

1,iphone
2,blackberry
3,nokia
4,sony
5,samsung
6,htc
7,micromax

To get started, you need Hive already installed and Hadoop configured and running (NameNode, JobTracker, DataNode, TaskTracker, etc.).

Step 1: Launch the Hive console (the hive command) from the command line / terminal

Step 2: Create the table with the following statement:

CREATE TABLE PHONE (ID INT, PHONE_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
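CREATE TABLE only records metadata; Hive does not look at any data file at this point. The schema is applied later, when rows are read ("schema on read"). The Python sketch below approximates how one line of a ROW FORMAT DELIMITED text table is interpreted: split on the field delimiter, map fields to columns in declaration order, and turn values that don't fit the column type into NULL (None here). The names SCHEMA and parse_row are made up for illustration, and Python's int() is a bit more lenient than Hive in a few edge cases.

```python
# Column schema from the CREATE TABLE statement, in declaration order.
SCHEMA = [("ID", int), ("PHONE_NAME", str)]
FIELD_DELIMITER = ","  # FIELDS TERMINATED BY ','


def parse_row(line):
    """Interpret one line of a delimited text table against SCHEMA."""
    fields = line.rstrip("\n").split(FIELD_DELIMITER)
    row = {}
    for index, (column, convert) in enumerate(SCHEMA):
        if index >= len(fields):
            row[column] = None  # missing trailing field -> NULL
            continue
        try:
            row[column] = convert(fields[index])
        except ValueError:
            row[column] = None  # value that can't be converted -> NULL
    return row  # any fields beyond the schema are simply ignored


for line in ["1,iphone", "two,blackberry", "3"]:
    print(parse_row(line))
# {'ID': 1, 'PHONE_NAME': 'iphone'}
# {'ID': None, 'PHONE_NAME': 'blackberry'}
# {'ID': 3, 'PHONE_NAME': None}
```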

Step 3: Load the data from the local file system into the table

LOAD DATA LOCAL INPATH '/home/training/PHONE.txt' OVERWRITE INTO TABLE PHONE;
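According to the Hive documentation, LOAD DATA is a plain copy/move operation: with LOCAL, the file is copied from the local file system into the table's storage location (by default under the Hive warehouse directory in HDFS), and OVERWRITE replaces whatever data the table already had. The Python sketch below imitates those semantics on an ordinary local directory. load_data_local, the temporary paths and the warehouse/phone stand-in directory are assumptions for illustration, not Hive's own code.

```python
import os
import shutil
import tempfile


def load_data_local(local_path, table_dir, overwrite):
    """Mimic LOAD DATA LOCAL INPATH '<file>' [OVERWRITE] INTO TABLE <table>.

    The file is copied as-is: nothing is parsed or validated at load time.
    """
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # OVERWRITE: delete the table's existing data files before copying.
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    target = os.path.join(table_dir, os.path.basename(local_path))
    if os.path.exists(target):
        # This sketch simply refuses to clobber a same-named file when appending.
        raise FileExistsError(target)
    shutil.copy(local_path, target)  # LOCAL: copy, the source file stays put


with tempfile.TemporaryDirectory() as tmp:
    local_file = os.path.join(tmp, "PHONE.txt")
    with open(local_file, "w") as f:
        f.write("1,iphone\n2,blackberry\n")

    # Stand-in for the table's directory under the Hive warehouse.
    table_dir = os.path.join(tmp, "warehouse", "phone")
    os.makedirs(table_dir)
    with open(os.path.join(table_dir, "old_data.txt"), "w") as f:
        f.write("99,stale\n")

    load_data_local(local_file, table_dir, overwrite=True)
    loaded_files = sorted(os.listdir(table_dir))
    source_still_exists = os.path.exists(local_file)
    with open(os.path.join(table_dir, "PHONE.txt")) as f:
        loaded_text = f.read()

print(loaded_files)  # ['PHONE.txt']
```

One practical consequence of this copy-only behaviour: a malformed line in PHONE.txt will not make LOAD DATA fail; the problem only shows up when you query the table.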

Step 4: Query the table you just created and loaded

select * from PHONE;

That covers loading data from the local file system (type a).

Now let's take a quick peek at the second type (b). To load data from a file in HDFS into Hive, we first need to copy the file from the local file system to HDFS.

Step 1: Copy the file from the local file system to HDFS (Hadoop Distributed File System)

hadoop fs -put PHONE.txt