Tuesday, 22 December 2015

How To Find Duplicate Records In A Fixed Width File In Unix/Linux

In this article, we will see how to find duplicate records in a fixed-width file using the awk command.

Input File:F_Input_File.txt

10001A1TRANSY10000
10001A1TRANSY10000
10002A2MEDY20000
10003A3FINN10000
10004A4HRY20000
10005A5CSRN50000


Find out the duplicate records based on the value from the 1st position to the 7th position.
$ awk '{V_Duplicate=substr($0,1,7);V_Filter[V_Duplicate]++} END{for(i in V_Filter) {print i "|" V_Filter[i]}}' F_Input_File.txt | awk -F"|" '$2>1{print $0}';

[OR]

$ awk '{V_Filter[substr($0,1,7)]++} END{for(i in V_Filter) {print i "|" V_Filter[i]}}' F_Input_File.txt | awk -F"|" '$2>1{print $0}';
Output:
10001A1|2
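
A possible variation of the same idea is to filter on the count inside the END block itself, so only a single awk program is needed; the if(V_Filter[i]>1) test below replaces the $2>1 filter performed by the second awk in the pipeline above:

$ awk '{V_Filter[substr($0,1,7)]++} END{for(i in V_Filter) if(V_Filter[i]>1) print i "|" V_Filter[i]}' F_Input_File.txt
10001A1|2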


Explanation:
1. substr($0,1,7) treats the complete record ($0) as a string and slices out the characters from position 1 to position 7.
2. Please follow the post How to Use Associative Array in AWK for further explanation of how awk arrays process the data.
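
If the full duplicate records are needed rather than just the key and its count, a two-pass awk based on the same associative-array idea can be used. This is only a sketch of that approach, and note that it reads F_Input_File.txt twice:

$ awk 'NR==FNR{V_Filter[substr($0,1,7)]++; next} V_Filter[substr($0,1,7)]>1' F_Input_File.txt F_Input_File.txt
10001A1TRANSY10000
10001A1TRANSY10000

In the first pass (while NR==FNR), awk only builds the counts in V_Filter; in the second pass, it prints every record whose key from position 1 to 7 occurs more than once.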
