Tuesday, 24 November 2015

AWK Basics for Beginners

AWK Basic Syntax:

awk -F"<Delimiter>"  'BEGIN{INITIALIZATION}
                      {ACTION} #-- For every line in file
                      END{END BLOCK}'  <FILE_NAME>


1. BEGIN BLOCK: Executes only once, before any input is read; used for initialization.
2. EXECUTION BLOCK: The heart of the AWK command; it runs once for every input line and carries all the processing logic.
3. END BLOCK: Executes only once, after the last line has been processed.
4. FILE_NAME: The file to be processed by AWK.
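To see the order in which the three blocks run, here is a small sketch. The three-line input and the variable names tmp_file and block_order are made up for illustration only.

```shell
# Hypothetical three-line, pipe-delimited input written to a temporary file.
tmp_file=$(mktemp)
printf 'a|1\nb|2\nc|3\n' > "$tmp_file"

# BEGIN runs once before any input, the middle block once per line,
# and END once after the last line (NR still holds the line count there).
block_order=$(awk -F"|" 'BEGIN { print "start" }
                         { print "line " NR ": " $1 }
                         END { print "done, " NR " lines read" }' "$tmp_file")
printf '%s\n' "$block_order"

rm -f "$tmp_file"
```

This should print "start", then one "line N: ..." row per input line, then "done, 3 lines read".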


AWK views a text file as a sequence of records, each split into fields that are represented by $1 (first field), $2 (second field), and so on. $0 has a special meaning in AWK: it represents the complete record, i.e. it holds the line currently being processed. Please go through the earlier post on Inbuilt Variables in AWK.
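A quick way to check what $1, $3 and $0 hold is to feed AWK a single record. This is only a sketch; the variable names record and fields are assumptions, and the record is one row of the sample data used in this post.

```shell
# One pipe-delimited record, passed to awk on standard input.
record='10001|A1|HR|10000|USA'

# Inside the double-quoted awk strings, "$1=" is just a label;
# the bare $1, $3 and $0 are the actual field references.
fields=$(printf '%s\n' "$record" |
  awk -F"|" '{ print "$1=" $1; print "$3=" $3; print "$0=" $0 }')
printf '%s\n' "$fields"
```

$1 and $3 give the first and third fields, while $0 gives back the whole line unchanged.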

Input File: F_Data_File.txt

EMPID|EMPNAME|EMPDEPT|EMPSAL|LOCATION
10001|A1|HR|10000|USA
10002|A2|FIN|20000|USA
10003|A3|NSS|30000|IND
10004|A4|SEC|40000|USA
10005|A5|TECH|50000|IND
10006|A6|TECH|60000|IND
10007|A7|TECH|70000|IND


1. Display complete file

$ awk '{print}' F_Data_File.txt
(or)
$ awk '{print $0}' F_Data_File.txt

2. Print only first field from the file

$ awk '{print $1}' F_Data_File.txt

Output:

EMPID|EMPNAME|EMPDEPT|EMPSAL|LOCATION
10001|A1|HR|10000|USA
10002|A2|FIN|20000|USA
10003|A3|NSS|30000|IND
10004|A4|SEC|40000|USA
10005|A5|TECH|50000|IND
10006|A6|TECH|60000|IND
10007|A7|TECH|70000|IND


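This printed the complete file because we did not provide a delimiter. By default AWK splits fields on whitespace (spaces and tabs), and these lines contain none, so each whole line is treated as $1. How do we give a delimiter? Let's see.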

$ awk -F"|"  '{print $1}' F_Data_File.txt

Note the -F switch, which is used to provide the field separator. The above command prints only the first field from the pipe-delimited file.

Output:
EMPID
10001
10002
10003
10004
10005
10006
10007
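The effect of the separator can be checked on a single line. The sketch below uses a made-up whitespace-separated record (a tab and several spaces); the variable names default_split and pipe_split are assumptions for illustration.

```shell
# Default splitting: runs of spaces and tabs separate the fields.
default_split=$(printf '10001\tA1   HR\n' | awk '{ print $2 "," NF }')

# With -F"|" and no pipe character in the line, the whole line is one field.
pipe_split=$(printf '10001\tA1   HR\n' | awk -F"|" '{ print NF }')

printf 'default: %s\n' "$default_split"
printf 'pipe:    %s\n' "$pipe_split"
```

With the default separator the line has 3 fields and $2 is A1; with -F"|" the same line has only 1 field.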


3. Print 1st & 2nd field separated by a space

$ awk -F"|"  '{print $1 " " $2}' F_Data_File.txt

(or) separated by <==>

$ awk -F"|"  '{print $1 "<==>" $2}' F_Data_File.txt

4. How to initialize the field separator

$ awk 'BEGIN{FS="|";OFS=",";}{print $1 OFS $2 OFS $3 OFS $4 OFS $5}'  F_Data_File.txt

Here we have initialized the input field separator (FS) and the output field separator (OFS) in the BEGIN block.
The output is displayed comma (,) separated.

Output:


EMPID,EMPNAME,EMPDEPT,EMPSAL,LOCATION
10001,A1,HR,10000,USA
10002,A2,FIN,20000,USA
10003,A3,NSS,30000,IND
10004,A4,SEC,40000,USA
10005,A5,TECH,50000,IND
10006,A6,TECH,60000,IND
10007,A7,TECH,70000,IND
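Writing OFS between every field gets long. Two shorter forms that should give the same result are sketched here: a comma in print inserts OFS automatically, and assigning $1 to itself makes AWK rebuild $0 with OFS. The two-line input and the variable names are assumptions for illustration.

```shell
# Hypothetical two-line sample written to a temporary file.
tmp_file=$(mktemp)
printf 'EMPID|EMPNAME|EMPDEPT\n10001|A1|HR\n' > "$tmp_file"

# A comma between print arguments is replaced by OFS in the output.
with_commas=$(awk 'BEGIN { FS="|"; OFS="," } { print $1, $2, $3 }' "$tmp_file")

# Assigning any field (even to itself) rebuilds $0 using OFS.
rebuilt=$(awk 'BEGIN { FS="|"; OFS="," } { $1 = $1; print }' "$tmp_file")

printf '%s\n' "$with_commas"
printf '%s\n' "$rebuilt"
rm -f "$tmp_file"
```

The second form is handy when the number of fields is not known in advance.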


5. Print row number in front of every record

$ awk 'BEGIN{FS="|";}{print NR ":" $0}'  F_Data_File.txt

Output:


1:EMPID|EMPNAME|EMPDEPT|EMPSAL|LOCATION
2:10001|A1|HR|10000|USA
3:10002|A2|FIN|20000|USA
4:10003|A3|NSS|30000|IND
5:10004|A4|SEC|40000|USA
6:10005|A5|TECH|50000|IND
7:10006|A6|TECH|60000|IND
8:10007|A7|TECH|70000|IND


6. Count the number of fields in every row

$ awk 'BEGIN{FS="|";}{print "Number of Field in Row:" NR " is=>" NF}'  F_Data_File.txt

Output:


Number of Field in Row:1 is=>5
Number of Field in Row:2 is=>5
Number of Field in Row:3 is=>5
Number of Field in Row:4 is=>5
Number of Field in Row:5 is=>5
Number of Field in Row:6 is=>5
Number of Field in Row:7 is=>5
Number of Field in Row:8 is=>5
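One practical use of NF is to flag rows that do not have the expected number of fields. The sketch below assumes a made-up three-line file in which the second row is short; tmp_file and bad_rows are illustrative names.

```shell
# Hypothetical input: row 2 has only 3 fields instead of 5.
tmp_file=$(mktemp)
printf '10001|A1|HR|10000|USA\n10002|A2|FIN\n10003|A3|NSS|30000|IND\n' > "$tmp_file"

# Report every row whose field count differs from 5.
bad_rows=$(awk -F"|" 'NF != 5 { print "Row " NR " has " NF " fields" }' "$tmp_file")
printf '%s\n' "$bad_rows"

rm -f "$tmp_file"
```

Only the short row is reported; rows with exactly 5 fields produce no output.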


7. Print first & last field from the file

$ awk 'BEGIN{FS="|"}{print $1 "-->" $NF}'  F_Data_File.txt

Output:


EMPID-->LOCATION
10001-->USA
10002-->USA
10003-->IND
10004-->USA
10005-->IND
10006-->IND
10007-->IND



8. Print only First line of the file

$ awk 'BEGIN{FS="|";}NR==1{print $0}'  F_Data_File.txt

Output:


EMPID|EMPNAME|EMPDEPT|EMPSAL|LOCATION

9. Read file from 3rd row

$ awk 'BEGIN{FS="|";}NR>=3{print $0}'  F_Data_File.txt

Output:


10002|A2|FIN|20000|USA
10003|A3|NSS|30000|IND
10004|A4|SEC|40000|USA
10005|A5|TECH|50000|IND
10006|A6|TECH|60000|IND
10007|A7|TECH|70000|IND


10. Read lines whose number is a multiple of 3, i.e. 3rd, 6th, 9th and so on

$ awk 'BEGIN{FS="|";}NR%3==0{print $0}'  F_Data_File.txt

Output:


10002|A2|FIN|20000|USA
10005|A5|TECH|50000|IND
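NR conditions can also be combined. The sketch below rebuilds the sample file in a temporary location (tmp_file is an assumed name) and picks a range of rows as well as every third row.

```shell
# Recreate the sample data file used in this post.
tmp_file=$(mktemp)
cat > "$tmp_file" <<'EOF'
EMPID|EMPNAME|EMPDEPT|EMPSAL|LOCATION
10001|A1|HR|10000|USA
10002|A2|FIN|20000|USA
10003|A3|NSS|30000|IND
10004|A4|SEC|40000|USA
10005|A5|TECH|50000|IND
10006|A6|TECH|60000|IND
10007|A7|TECH|70000|IND
EOF

# Rows 3 to 5 only: both conditions must hold.
rows_3_to_5=$(awk 'NR>=3 && NR<=5' "$tmp_file")

# Every third row, printed with its row number and first field.
every_third=$(awk -F"|" 'NR%3==0 { print NR ":" $1 }' "$tmp_file")

printf '%s\n' "$rows_3_to_5"
printf '%s\n' "$every_third"
rm -f "$tmp_file"
```

A pattern without an action block, like the first command, prints the matching line by default.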

11. Print LAST line of the File

$ awk 'END{print}'  F_Data_File.txt

12. Count number of rows in a file (same as wc -l F_Data_File.txt)

$ awk -F"|" '{V_Row_Cnt++}END{print V_Row_Cnt}' F_Data_File.txt
(or)

$ awk 'END { print NR }' F_Data_File.txt

Output: 8

13. Calculate total salary of the employees:

$ awk -F"|" 'NR>1{V_Sum_Sal=$4 + V_Sum_Sal }END{print V_Sum_Sal}' F_Data_File.txt

Output: 280000

NR>1 skips the first row, which is the header.
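The same pattern extends naturally to an average. This is a sketch, not part of the original command set: it counts the data rows alongside the sum and divides in the END block. The variable names tmp_file and summary are assumptions.

```shell
# Recreate the sample data file used in this post.
tmp_file=$(mktemp)
cat > "$tmp_file" <<'EOF'
EMPID|EMPNAME|EMPDEPT|EMPSAL|LOCATION
10001|A1|HR|10000|USA
10002|A2|FIN|20000|USA
10003|A3|NSS|30000|IND
10004|A4|SEC|40000|USA
10005|A5|TECH|50000|IND
10006|A6|TECH|60000|IND
10007|A7|TECH|70000|IND
EOF

# Skip the header, add up field 4 and count the data rows;
# guard against dividing by zero if the file has no data rows.
summary=$(awk -F"|" 'NR>1 { total += $4; count++ }
                     END { if (count > 0) printf "total=%d average=%d\n", total, total/count }' "$tmp_file")
printf '%s\n' "$summary"
rm -f "$tmp_file"
```

For the sample file this gives total=280000 average=40000 (7 employees).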

14. Find and count empty lines in the file.

$ awk 'NF==0{print NR":"}' F_Data_File.txt
(or)
$ awk '/^$/{print NR":"}' F_Data_File.txt


Output: Displays the line numbers of the empty lines, for example when lines 5 and 7 of the file are empty:
5:
7:

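NF stands for number of fields; if the number of fields is 0, the row does not have any data. ^ represents the start of the line and $ represents the end of the line, so if there is nothing between ^ and $ the line is empty. Note that NF==0 also matches lines containing only spaces or tabs, while /^$/ matches only truly empty lines.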

To count the empty lines, increment a counter and print it once in the END block:

$ awk 'NF==0{V_Count++}END{print V_Count+0}' F_Data_File.txt
(or)
$ awk '/^$/{V_Count++}END{print V_Count+0}' F_Data_File.txt


Output: 2
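The NF==0 test and the /^$/ test are not quite the same thing. The sketch below uses a made-up four-line file with one truly empty line and one line holding only spaces; the variable names are assumptions for illustration.

```shell
# Hypothetical input: line 2 is empty, line 3 contains only spaces.
tmp_file=$(mktemp)
printf 'a|1\n\n   \nb|2\n' > "$tmp_file"

# NF==0 counts both the empty and the whitespace-only line.
by_nf=$(awk 'NF==0 { n++ } END { print n+0 }' "$tmp_file")

# /^$/ counts only the truly empty line.
by_regex=$(awk '/^$/ { n++ } END { print n+0 }' "$tmp_file")

printf 'NF==0: %s\n' "$by_nf"
printf '/^$/:  %s\n' "$by_regex"
rm -f "$tmp_file"
```

The n+0 in the END block makes AWK print 0 instead of a blank when no line matched.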

15. Remove empty lines from the file.

$ awk 'NF' F_Data_File.txt
$ awk 'NF > 0' F_Data_File.txt
$ awk 'NF!=0{print}' F_Data_File.txt
$ awk '!/^$/' F_Data_File.txt
$ awk '/./' F_Data_File.txt


Output: Each of the above commands displays only the non-empty lines of the file.
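AWK only prints the filtered lines; it does not change the file itself. A minimal sketch of saving the result to a new file follows, assuming a made-up five-line input with two empty lines. The names src_file and clean_file are hypothetical.

```shell
# Hypothetical input with empty lines at positions 2 and 4.
src_file=$(mktemp)
clean_file=$(mktemp)
printf 'EMPID|EMPNAME\n\n10001|A1\n\n10002|A2\n' > "$src_file"

# Keep only lines that have at least one field and save them to a new file.
awk 'NF' "$src_file" > "$clean_file"

# Compare line counts before and after.
before=$(awk 'END { print NR }' "$src_file")
after=$(awk 'END { print NR }' "$clean_file")
printf 'before=%s after=%s\n' "$before" "$after"

rm -f "$src_file" "$clean_file"
```

Redirect to a different file, as here, rather than to the input file itself, since the shell would truncate the input before AWK reads it.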
