Tuesday, 22 December 2015

How To Find Duplicate Records In A Fixed Width File In Unix/Linux

In this article, we will see how to find duplicate records in a fixed-width file using the awk command.

Input File:F_Input_File.txt

10001A1TRANSY10000
10001A1TRANSY10000
10002A2MEDY20000
10003A3FINN10000
10004A4HRY20000
10005A5CSRN50000


Find out the duplicate records based on the value from the 1st position to the 7th position.
$ awk '{V_Duplicate=substr($0,1,7);V_Filter[V_Duplicate]++} END{for(i in V_Filter) {print i "|" V_Filter[i]}}' F_Input_File.txt | awk -F"|" '$2>1{print $0}';

[OR]

$ awk '{V_Filter[substr($0,1,7)]++} END{for(i in V_Filter) {print i "|" V_Filter[i]}}' F_Input_File.txt | awk -F"|" '$2>1{print $0}';
Output:
10001A1|2
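
A possible variation of the same idea is to filter on the count inside the END block itself, so only a single awk program is needed; the if(V_Filter[i]>1) test below replaces the $2>1 filter performed by the second awk in the pipeline above:

$ awk '{V_Filter[substr($0,1,7)]++} END{for(i in V_Filter) if(V_Filter[i]>1) print i "|" V_Filter[i]}' F_Input_File.txt
10001A1|2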


Explanation:
1. substr($0,1,7) treats the complete record ($0) as a string and slices out the characters from position 1 to position 7.
2. Please follow the post How to Use Associative Array in AWK for further explanation of how awk arrays process the data.
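
If the full duplicate records are needed rather than just the key and its count, a two-pass awk based on the same associative-array idea can be used. This is only a sketch of that approach, and note that it reads F_Input_File.txt twice:

$ awk 'NR==FNR{V_Filter[substr($0,1,7)]++; next} V_Filter[substr($0,1,7)]>1' F_Input_File.txt F_Input_File.txt
10001A1TRANSY10000
10001A1TRANSY10000

In the first pass (while NR==FNR), awk only builds the counts in V_Filter; in the second pass, it prints every record whose key from position 1 to 7 occurs more than once.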
