Tuesday, 19 January 2016

How To Convert Fixed Width File To Delimited Using SED In Unix/Linux

In this article, we will see how to convert a fixed-width file into a delimited file using the sed command, and walk step by step through how sed performs this operation. If you have not already, please go through the post How To Use & In Sed Command In Unix/Linux first.
We have a file, F_Input_File.txt, that contains fixed-width data. We have to split this data across 5 columns:
Col1: From 1 to 4
Col2: From 5 to 6
Col3: From 7 to 8
Col4: From 9 to 13
Col5: From 14 to 16
Input File: F_Input_File.txt
1001A1HR50000USA
Solution:
$ sed -e 's/./&,/4' -e 's/./&,/7' -e 's/./&,/10' -e 's/./&,/16' F_Input_File.txt
Output:
1001,A1,HR,50000,USA
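To try the solution without creating the file by hand, here is a minimal sketch. It assumes `mktemp -d` is available (it is on common Linux and macOS systems), and the variable names `workdir` and `result` are only illustrative.

```shell
# Create a throwaway directory holding the sample input file.
workdir=$(mktemp -d)
printf '1001A1HR50000USA\n' > "$workdir/F_Input_File.txt"

# Run the four substitutions in order against the file.
result=$(sed -e 's/./&,/4' -e 's/./&,/7' -e 's/./&,/10' -e 's/./&,/16' "$workdir/F_Input_File.txt")
echo "$result"   # 1001,A1,HR,50000,USA

# Clean up the temporary directory.
rm -rf "$workdir"
```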
What is that? What exactly has happened? Let’s understand it.
Explanation: 
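1. We can execute multiple sed commands using the -e switch. They run in order, and each one works on the output of the previous one.
2. Dot (.) matches exactly one character, whatever that character is.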
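A quick way to see what a single dot matches, and what & contains, is to wrap the match in brackets instead of appending a comma. This is only an illustrative sketch; the variable names are made up for the example.

```shell
record='1001A1HR50000USA'

# No occurrence number: only the first match (the first character) is replaced.
first=$(printf '%s\n' "$record" | sed 's/./[&]/')
echo "$first"    # [1]001A1HR50000USA

# Occurrence number 4: only the 4th match (the 4th character) is replaced.
fourth=$(printf '%s\n' "$record" | sed 's/./[&]/4')
echo "$fourth"   # 100[1]A1HR50000USA
```

In both cases the brackets enclose a single character, so & holds one character, not everything up to that position.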

1st -e switch will read "1001A1HR50000USA" as input:
$ sed -e 's/./&,/4'
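What is happening here?
1.   Dot (.) matches one character at a time, so every character in the record is a separate match. The occurrence number 4 tells sed to replace only the 4th match, which is the 4th character.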
If you remember the syntax:
sed 's/reg_exp/replacement/[occurrence]' F_Input_File    
Occurrence   1  2  3  4
Data         1  0  0  1
REG_EXP      .  .  .  .

     2.  & holds the text matched by the selected occurrence, which is the single 4th character: 1
          Result of REG_EXP (.) at the 4th occurrence = 1
          Replacement (&,) = 1,
          After the 1st -e switch the record will be: 1001,A1HR50000USA
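The occurrence number in the syntax s/reg_exp/replacement/[occurrence] is not specific to the dot. A small sketch with a literal hyphen as the regular expression (the sample string `a-b-c-d` is made up for illustration) shows the difference between no flag, a number, and g:

```shell
line='a-b-c-d'

# No flag: replace only the first match.
first=$(printf '%s\n' "$line" | sed 's/-/+/')
echo "$first"    # a+b-c-d

# Number 2: replace only the second match.
second=$(printf '%s\n' "$line" | sed 's/-/+/2')
echo "$second"   # a-b+c-d

# g: replace every match.
all=$(printf '%s\n' "$line" | sed 's/-/+/g')
echo "$all"      # a+b+c+d
```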

2nd -e switch will receive input as "1001,A1HR50000USA"
$ sed -e 's/./&,/7'
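1.   Dot (.) again matches one character at a time, and the occurrence number 7 selects the 7th character. The comma inserted by the first command counts as a character, which is why the number is 7 and not 6.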
Occurrence   1  2  3  4  5  6  7
Data         1  0  0  1  ,  A  1
REG_EXP      .  .  .  .  .  .  .

2.   & holds the 7th character: 1 (the second character of A1)
Result of REG_EXP (.) at the 7th occurrence = 1
Replacement (&,) = 1,
After the 2nd -e switch the record will be: 1001,A1,HR50000USA

3rd -e switch will receive input as "1001,A1,HR50000USA"
$ sed -e 's/./&,/10'
1.   Again, dot (.) matches one character at a time, and the occurrence number 10 selects the 10th character.
Occurrence   1  2  3  4  5  6  7  8  9  10
Data         1  0  0  1  ,  A  1  ,  H  R
REG_EXP      .  .  .  .  .  .  .  .  .  .

2.   & holds the 10th character: R
Result of REG_EXP (.) at the 10th occurrence = R
Replacement (&,) = R,
After the 3rd -e switch the record will be: 1001,A1,HR,50000USA

4th -e switch will receive input as "1001,A1,HR,50000USA"
$ sed -e 's/./&,/16'
1.  Dot (.) matches one character at a time, and the occurrence number 16 selects the 16th character.
Occurrence   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Data         1  0  0  1  ,  A  1  ,  H  R  ,  5  0  0  0  0
REG_EXP      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

 2. & holds the 16th character: 0 (the last digit of 50000)
Result of REG_EXP (.) at the 16th occurrence = 0
Replacement (&,) = 0,
After the 4th -e switch the record will be: 1001,A1,HR,50000,USA
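The intermediate records can be checked by running the four substitutions as separate sed calls, each one fed with the output of the previous one. This is a sketch for inspection only; the variable names `step1` to `step4` are made up for the example.

```shell
record='1001A1HR50000USA'

# Each step receives the previous step's output, just like chained -e expressions.
step1=$(printf '%s\n' "$record" | sed 's/./&,/4')
step2=$(printf '%s\n' "$step1"  | sed 's/./&,/7')
step3=$(printf '%s\n' "$step2"  | sed 's/./&,/10')
step4=$(printf '%s\n' "$step3"  | sed 's/./&,/16')

printf '%s\n' "$step1" "$step2" "$step3" "$step4"
# 1001,A1HR50000USA
# 1001,A1,HR50000USA
# 1001,A1,HR,50000USA
# 1001,A1,HR,50000,USA
```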

Exactly what we want…!!!
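The numbers 4, 7, 10 and 16 follow from the column widths: each one is the end position of a column in the original record plus the number of commas already inserted to its left. A rough sketch that derives them from the widths 4, 2, 2, 5 and 3 is shown below; the variable names and the loop are illustrative, not part of the original solution.

```shell
widths='4 2 2 5 3'
record='1001A1HR50000USA'

pos=0       # end position of the current column in the original record
commas=0    # commas already inserted to the left of that position
script=''   # sed script, built as s/./&,/N commands joined by ";"

# Load the widths into the positional parameters (this overwrites $1, $2, ...).
set -- $widths
while [ "$#" -gt 1 ]; do    # stop before the last column: no trailing comma
    pos=$((pos + $1))
    script="${script}${script:+;}s/./&,/$((pos + commas))"
    commas=$((commas + 1))
    shift
done

echo "$script"   # s/./&,/4;s/./&,/7;s/./&,/10;s/./&,/16
result=$(printf '%s\n' "$record" | sed "$script")
echo "$result"   # 1001,A1,HR,50000,USA
```

Joining the commands with semicolons is equivalent here to passing each one with its own -e switch.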
Conclusion: & is a very powerful special character, available in the replacement part of the sed s command, for formatting data. We have already seen the same functionality using the awk command; you may also check that out in How to Convert Fixed Width File to Delimited Using AWK.
Keep Reading, Keep Learning, Keep Sharing...!!!
