Monday, 16 November 2015

How to Perform Group by Based on Single Column using AWK

Input Data : F_Data_File.txt

PRODUCT:PRICE
IPHONE:60000
MI:20000
NOKIA:20000
IPHONE:60000
MI:20000
NOKIA:20000
IPHONE:60000
MI:20000
NOKIA:20000
IPHONE:60000
IPHONE:60000
MI:20000

Desired Output:

MI<->4<->80000
IPHONE<->5<->300000
NOKIA<->3<->60000

Solution:

$ awk 'BEGIN {FS=":";OFS="<->"}
NR==1 {print "PRODUCT:COUNT:SUM(PRICE)"}
NR>1 {V_Count[$1]++;V_Sum[$1]=V_Sum[$1]+$2}
END{for (k in V_Count) print k,V_Count[k],V_Sum[k]}' F_Data_File.txt

More simplified:

$ awk 'BEGIN {FS=":";OFS="<->"}
NR==1 {print "PRODUCT:COUNT:SUM(PRICE)"}
NR>1 {V_Count[$1]++}
NR>1 {V_Sum[$1]=V_Sum[$1]+$2}
END{for (k in V_Count) print k,V_Count[k],V_Sum[k]}' F_Data_File.txt

Debug Mode:

$ awk 'BEGIN {FS=":";OFS="<->"}
NR==1 {print "PRODUCT:COUNT:SUM(PRICE)"}
NR>1 {V_Count[$1]++; print $1 "--->"V_Count[$1]}
NR>1 {V_Sum[$1]=V_Sum[$1]+$2; print $1 "---"V_Sum[$1]}
END{print "--* Start of END block---"; for (k in V_Count) print k,V_Count[k],V_Sum[k]}' F_Data_File.txt
     
Output of Debug mode:

PRODUCT:COUNT:SUM(PRICE)
IPHONE--->1
IPHONE---60000
MI--->1
MI---20000
NOKIA--->1
NOKIA---20000
IPHONE--->2
IPHONE---120000
MI--->2
MI---40000
NOKIA--->2
NOKIA---40000
IPHONE--->3
IPHONE---180000
MI--->3
MI---60000
NOKIA--->3
NOKIA---60000
IPHONE--->4
IPHONE---240000
IPHONE--->5
IPHONE---300000
MI--->4
MI---80000
--* Start of END block---
MI<->4<->80000
IPHONE<->5<->300000
NOKIA<->3<->60000

Explanation:

1. V_Count array holds the product as index and if same index is available in array it will just increase the count.
2. V_Sum array hold the product as index and storing SUM associated with that Product.
3. NR is a standard AWK variable stands for Number of Rows.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...