Showing posts with label count of max in hive. Show all posts
Showing posts with label count of max in hive. Show all posts

Tuesday, July 15, 2014

Big Data Hadoop Hive Getting Max of a Count


This Example about a query to get Max of a count.

Pre-Requiste
- Basic Hadoop Knowledge
- Basic Hive Knowledge
- Basic SQL Knowledge


My Input Data is something like as below,

1,iphone,2000,abc
2,iphone,3000,abc1
3,nokia,4000,abc2
4,sony,5000,abc3
5,nokia,6000,abc4
6,iphone,7000,abc5
7,nokia,8500,abc6
Problem:
In Hive we can perform group by as we do in ANSI SQL Example as below,

select d.phnName,count(*) from phnDetails d group by d.phnName

The output of the above query is as below,

iphone 3
nokia 3
sony 1

You might have a scenario something like to retrieve only values for equal to Max

Example if you need output like as below,

iphone 3
nokia 3

Resolution:

We need to use multiple Sub Queries to perform this operation

select c.phnName, c.counter 
from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c 
join 
(select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f) g 
where c.counter = g.countmax;

Output is as below,



Queries built in multiple iterations as below,

CREATE TABLE phnDetails ( id INT, phnName STRING, price INT, details STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/training/Phone/phones.txt' OVERWRITE INTO TABLE phnDetails;

select * from phnDetails;


select d.phnName, count(*) from phnDetails d group by d.phnName;


select c.phnName, c.counter from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c ;

select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f ;

select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f ;

select c.phnName, c.counter 
from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c 
join 
(select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f) g 
where c.counter = g.countmax;