In the previous post, How To Find Duplicate Records in File Using AWK, we saw how to find duplicate records. Here we will see how to remove duplicate records using the AWK command.
Input File: F_Data_File.txt
EMPID|ENAME|DEPT|FLAG
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
10001|A1|TRANS|Y
10003|A3|FIN|N
$ awk '!($0 in V_Uniq_Rec) {V_Uniq_Rec[$0];print}' F_Data_File.txt
Output:
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
Explanation:
It reads: if a record ($0) is not present in the array V_Uniq_Rec, the condition returns true, so the record is stored in V_Uniq_Rec and printed. If the record is already present, the condition returns false, so the record is not stored again and is eliminated from the output. Note that this command does not change the file's data; it only displays the unique records on the console.
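Because the command only prints to the console, actually removing the duplicates means redirecting the output to a new file. A minimal sketch, where the output file name F_Uniq_File.txt is illustrative (the sample data is recreated inline so the snippet is self-contained):

```shell
# Recreate a small version of the sample file with one duplicate record
printf '%s\n' 'EMPID|ENAME|DEPT|FLAG' \
              '10001|A1|TRANS|Y' \
              '10002|A2|MED|Y' \
              '10001|A1|TRANS|Y' > F_Data_File.txt

# Write only the unique records to a new file; the original is left untouched
awk '!($0 in V_Uniq_Rec) {V_Uniq_Rec[$0]; print}' F_Data_File.txt > F_Uniq_File.txt
```

After this runs, F_Data_File.txt still holds all four lines, while F_Uniq_File.txt holds the header plus the two distinct records.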
Another, more common way of writing the above AWK command, frequently used to remove duplicates, is:
$ awk '!V_Uniq_Rec[$0]++' F_Data_File.txt
Output:
10001|A1|TRANS|Y
10002|A2|MED|Y
10003|A3|FIN|N
10004|A4|HR|Y
10005|A5|CSR|N
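The same post-increment idiom can also be keyed on a single column instead of the whole record, which is handy when a field such as EMPID must be unique even if other fields differ. A minimal sketch; the array name seen and the sample file name F_Key_File.txt are illustrative:

```shell
# Sample data: EMPID 10001 appears twice with different FLAG values
printf '%s\n' 'EMPID|ENAME|DEPT|FLAG' \
              '10001|A1|TRANS|Y' \
              '10001|A1|TRANS|N' \
              '10002|A2|MED|Y' > F_Key_File.txt

# -F'|' sets the field separator; seen[$1]++ keys only on field 1 (EMPID),
# so only the first record for each EMPID is printed
awk -F'|' '!seen[$1]++' F_Key_File.txt
```

This keeps the first occurrence of each EMPID (here 10001|A1|TRANS|Y) and drops the later one, exactly as the whole-record version does for `$0`.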