In this article we will see how to redirect the duplicate and unique records of a fixed-width file into two different files. The same solution can be applied to delimited files as well. Let's see how we can implement it. A record is treated as a duplicate based on the value in positions 1 through 7, and duplicates are redirected to a log file.
Please go through How if-else Works Without Operators In AWK for more understanding of this post.
Input File:F_Input_File.txt
10001A1TRANSY10000
10001A1TRANSY10000
10002A2MEDY20000
10003A3FINN10000
10004A4HRY20000
10005A5CSRN50000
$ awk '{ V_Duplicate = substr($0,1,7)
if (V_Filter[V_Duplicate]++)
{
print $0 > "F_Dup_Record.log"
}
else
{
print $0 > "F_Unique_Record.txt"
}
} ' F_Input_File.txt
[OR]
You can also write the above command as below to save a line of code, but the first form is more descriptive and easier to understand.
$ awk '{if (V_Filter[substr($0,1,7)]++)
{
print $0 > "F_Dup_Record.log"
}
else
{
print $0 >"F_Unique_Record.txt"
}
} ' F_Input_File.txt
Output:
1. F_Dup_Record.log will hold all the duplicate records, that is, every repeat occurrence of a key after the first one.
2. F_Unique_Record.txt will hold all the unique records (it will also hold one copy of each duplicated record).
$ cat F_Unique_Record.txt
10001A1TRANSY10000
10002A2MEDY20000
10003A3FINN10000
10004A4HRY20000
10005A5CSRN50000
$ cat F_Dup_Record.log
10001A1TRANSY10000
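The same approach carries over to delimited files, as mentioned at the start of the article. The sketch below assumes a comma-delimited file whose first field is the key. The file names F_Delim_Input.csv, F_Delim_Dup.log and F_Delim_Unique.txt, and the sample rows, are hypothetical and made up only for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical comma-delimited sample; the key is the first field.
cat > F_Delim_Input.csv <<'EOF'
10001A1,TRANS,Y,10000
10001A1,TRANS,Y,10000
10002A2,MED,Y,20000
10003A3,FIN,N,10000
EOF

# -F, splits each record on commas, so $1 takes the place of substr($0,1,7).
awk -F, '{
    if (V_Filter[$1]++)
        print $0 > "F_Delim_Dup.log"       # key was seen before
    else
        print $0 > "F_Delim_Unique.txt"    # first occurrence of the key
}' F_Delim_Input.csv

cat F_Delim_Unique.txt
cat F_Delim_Dup.log
```

Only the key expression changes; the postfix-increment test on the array is the same as in the fixed-width version.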
Explanation:
To understand how this command works, we first need to understand how a++ works.
a++ is a postfix operation, meaning that the value of a changes only after the expression has been evaluated. In other words, it adds 1 to a but returns the old value of a, which is 0 for a variable that has not been set yet.
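A quick way to see this is to print the value the expression returns and then the variable itself. This is a minimal sketch; the names a, old and result are used only for illustration, and it relies on awk treating an unset variable as 0 in a numeric context.

```shell
# Capture what a++ returns and what a holds afterwards.
# a is never assigned, so awk treats it as 0 on first use.
result=$(awk 'BEGIN {
    old = a++                 # old receives the value before the increment
    print "returned:", old
    print "stored:", a
}')
printf '%s\n' "$result"
# prints:
# returned: 0
# stored: 1
```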
Assume we have 3 records in file as below:
aa
bb
aa
Now to our case: V_Filter[V_Duplicate]++ adds 1 to the value stored in V_Filter[V_Duplicate] and returns the old value, which is 0 the first time a key is seen.
Step 1: V_Filter[aa] = V_Filter[aa] + 1 => returns 0 as the old value and assigns 1 to V_Filter[aa] after the evaluation. As 0 is considered false, the if condition fails and the record is passed to the unique (else) block.
Step 2: V_Filter[bb] = V_Filter[bb] + 1 => returns 0 as the old value and assigns 1 to V_Filter[bb]. As 0 is considered false, the if condition fails and the record is passed to the unique (else) block.
Step 3: When the 3rd record is processed, the scenario is as below:
V_Filter[aa] = V_Filter[aa] + 1 => returns the old value, which for V_Filter[aa] is 1 from step 1, and assigns the new value 2 to V_Filter[aa] after the evaluation. As 1 (the non-zero old value) is considered true, the record is passed to the duplicate (if) block.
Note: We need to focus on what is returned by the postfix operation, that is, the old value. If you focus on the old value you will understand the logic used here.
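The three steps can be replayed by printing the value that the postfix expression returns for each record. This is a minimal sketch, assuming the three sample records aa, bb and aa; the variables key, old and trace and the wording of the output are added here for illustration only.

```shell
# Feed the three sample records to awk and show the old and new counter values.
trace=$(printf 'aa\nbb\naa\n' | awk '{
    key = substr($0,1,7)        # same key expression as in the main command
    old = V_Filter[key]++       # value before the increment
    if (old)
        print key, "old=" old, "new=" V_Filter[key], "-> duplicate: if block"
    else
        print key, "old=" old, "new=" V_Filter[key], "-> unique: else block"
}')
printf '%s\n' "$trace"
# prints:
# aa old=0 new=1 -> unique: else block
# bb old=0 new=1 -> unique: else block
# aa old=1 new=2 -> duplicate: if block
```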