Sunday, 22 November 2015

How to Remove Duplicate Records in File Using AWK

In a previous post, How To Find Duplicate Records in File Using AWK, we saw how to find duplicate records. Here we will see how to remove duplicate records using the AWK command.

Input File: F_Data_File.txt

EMPID|ENAME|DEPT|FLAG
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
10001|A1|TRANS|Y
10003|A3|FIN|N

$ awk '!($0 in V_Uniq_Rec) {V_Uniq_Rec[$0];print}' F_Data_File.txt

Output:
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N

Explanation:

It reads: if a record ($0) is not already present in the array V_Uniq_Rec, the condition is true, so the record is stored in V_Uniq_Rec and printed. If the record is already present in the array, the condition returns false, the default print action is skipped, and the duplicate is eliminated. This command does not change the file's contents; it only displays the unique records on the console.
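Since the command above only prints to the console, a common follow-up question is how to save the de-duplicated result back into the file. A minimal sketch (the temp-file name F_Data_File.tmp is an arbitrary choice; the input file is recreated here so the example is self-contained):

```shell
# Recreate the sample input file so this sketch runs on its own.
cat > F_Data_File.txt <<'EOF'
EMPID|ENAME|DEPT|FLAG
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
10001|A1|TRANS|Y
10003|A3|FIN|N
EOF

# Write unique records to a temp file first; redirecting straight back
# to the input file would truncate it before AWK reads it.
awk '!($0 in V_Uniq_Rec) {V_Uniq_Rec[$0]; print}' F_Data_File.txt > F_Data_File.tmp &&
mv F_Data_File.tmp F_Data_File.txt
```

After this, F_Data_File.txt contains the header plus the five unique records.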

Another way of writing the above AWK command, which is very common and frequently used to remove duplicates: the post-increment V_Uniq_Rec[$0]++ returns the old value (0 the first time a record is seen), so the negation is true only for the first occurrence, and only that occurrence is printed.

$ awk '!V_Uniq_Rec[$0]++' F_Data_File.txt

Output:
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
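The same idiom can be restricted to a key column rather than the whole record. A hedged variant (not from the original post): assuming duplicates should be identified by EMPID alone, set the field separator to | and index the array on $1 instead of $0:

```shell
# Recreate the sample input file so this sketch runs on its own.
cat > F_Data_File.txt <<'EOF'
EMPID|ENAME|DEPT|FLAG
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
10001|A1|TRANS|Y
10003|A3|FIN|N
EOF

# Keep only the first record seen for each EMPID (field 1).
# V_Uniq_Key is an arbitrary array name chosen for this example.
awk -F'|' '!V_Uniq_Key[$1]++' F_Data_File.txt
```

On this sample data the output is the same as before, but this form would also drop a second record whose EMPID repeats even when the other columns differ.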
