In this article, we will see how to find duplicate records in a fixed-width file using the awk command.
Input File:F_Input_File.txt
10001A1TRANSY10000
10001A1TRANSY10000
10001A1TRANSY10000
10002A2MEDY20000
10003A3FINN10000
10004A4HRY20000
10005A5CSRN50000
Requirement: find the duplicate records based on the key value from position 1 to position 7.
$ awk '{V_Duplicate=substr($0,1,7);V_Filter[V_Duplicate]++} END{for(i in V_Filter) {print i"|" V_Filter[i]}}' F_Input_File.txt | awk -F"|" '$2>1{print $0}';
[OR]
$ awk '{V_Filter[substr($0,1,7)]++} END{for(i in V_Filter) {print i"|" V_Filter[i]}}' F_Input_File.txt | awk -F"|" '$2>1{print $0}';
Output:
10001A1|3
Explanation:
1. substr($0,1,7) treats the complete record ($0) as a string and slices the characters from position 1 to position 7, which form the key.
2. The first awk counts each key in the associative array V_Filter and, in the END block, prints each key with its count, separated by "|". The second awk splits on "|" and keeps only the lines whose count (field 2) is greater than 1, i.e., the duplicates.
3. Please follow the post How to Use Associative Array in AWK for further explanation of how awk arrays process the data.
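As a minor variant of the commands above, the same result can be produced in a single awk process by moving the count filter into the END block, so the second awk and the pipe are not needed. This is a sketch assuming the same F_Input_File.txt shown above:

```shell
# Recreate the sample input file from this article
cat > F_Input_File.txt <<'EOF'
10001A1TRANSY10000
10001A1TRANSY10000
10001A1TRANSY10000
10002A2MEDY20000
10003A3FINN10000
10004A4HRY20000
10005A5CSRN50000
EOF

# Single-pass variant: count each 7-character key, then print
# only the keys whose count exceeds 1 (the duplicates)
awk '{V_Filter[substr($0,1,7)]++}
     END{for(i in V_Filter) if(V_Filter[i]>1) print i"|"V_Filter[i]}' F_Input_File.txt
# Prints: 10001A1|3
```

This avoids spawning a second process, which can matter when scanning very large files.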