In this article, we will see how to convert a fixed width file into a delimited file using the sed command, and walk step by step through how sed performs this operation. If you have not already, please read the post How To Use & In Sed Command In Unix/Linux first.
We have a file F_Input_File.txt which contains fixed width data; we have to split this data into 5 columns:
Col1: From 1 to 4
Col2: From 5 to 6
Col3: From 7 to 8
Col4: From 9 to 13
Col5: From 14 to 16
Input File: F_Input_File.txt
1001A1HR50000USA
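Before reaching for sed, it can help to double-check the column boundaries against the sample record. The sketch below is one way to do that, assuming a POSIX shell and plain ASCII data (cut -c counts characters, which equal bytes for ASCII). It prints each column range from the list above.

```shell
#!/bin/sh
# Print each column range of the sample record with cut -c.
record='1001A1HR50000USA'

for range in 1-4 5-6 7-8 9-13 14-16; do
  printf '%s: ' "$range"
  printf '%s\n' "$record" | cut -c"$range"
done
```

This prints 1-4: 1001, 5-6: A1, 7-8: HR, 9-13: 50000 and 14-16: USA, one per line, confirming the five columns.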
Solution:
$ sed -e 's/./&,/4' -e 's/./&,/7' -e 's/./&,/10' -e 's/./&,/16' F_Input_File.txt
Output:
1001,A1,HR,50000,USA
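To try the solution end to end, here is a minimal sketch in POSIX shell. It first creates F_Input_File.txt with the sample record (note that it writes this file into the current directory) and then runs the same sed command.

```shell
#!/bin/sh
# Create the sample fixed-width input file (one record).
printf '1001A1HR50000USA\n' > F_Input_File.txt

# Insert a comma after characters 4, 7, 10 and 16.
# Each -e expression works on the output of the previous one.
sed -e 's/./&,/4' -e 's/./&,/7' -e 's/./&,/10' -e 's/./&,/16' F_Input_File.txt
# Output: 1001,A1,HR,50000,USA
```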
What is that? What exactly has happened? Let’s understand it.
Explanation:
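1. We can execute multiple sed commands by giving each one its own -e switch; they are applied in order, each one to the output of the previous one.
2. A dot (.) matches exactly one character, whatever that character is.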
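Both points can be seen on tiny inputs. The sketch below (POSIX shell, any standard sed) uses made-up strings "abcdef" and "abc" purely for illustration: the first command shows that a single dot matches one character and that the trailing number picks which match to replace; the second shows that a later -e expression sees the result of an earlier one.

```shell
#!/bin/sh
# A dot matches exactly one character; the trailing 3
# selects the 3rd match, i.e. the 3rd character.
echo 'abcdef' | sed 's/./X/3'
# Output: abXdef

# -e expressions run in order: the first turns "abc" into
# "bbc", the second then replaces the first "b" of that result.
echo 'abc' | sed -e 's/a/b/' -e 's/b/c/'
# Output: cbc
```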
$ sed -e 's/./&,/4'
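What is happening here?
1. The regular expression is a single dot (.), so every character in the line is a match. The 4 at the end tells sed to replace only the 4th match, i.e. the 4th character.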
If you remember the syntax:
sed 's/reg_exp/replacement/[occurrence]' F_Input_File
Occurrence | 1 | 2 | 3 | 4
Data       | 1 | 0 | 0 | 1
REG_EXP    | . | . | . | .
2. & holds the text that was matched, which is only the 4th character, "1".
4th match of REG_EXP (.) = 1
Replacement (&,) = 1,
After the 1st -e switch the record will be: 1001,A1HR50000USA
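A quick way to see what & actually holds is to wrap it in brackets instead of appending a comma. In this sketch (POSIX shell, standard sed) the brackets are only a visual marker, not part of the solution: they show that & is just the 4th character, and the second command is the real first step.

```shell
#!/bin/sh
# Brackets reveal the contents of &: only the 4th character.
printf '1001A1HR50000USA\n' | sed 's/./[&]/4'
# Output: 100[1]A1HR50000USA

# The actual first step: append a comma to that character.
printf '1001A1HR50000USA\n' | sed 's/./&,/4'
# Output: 1001,A1HR50000USA
```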
The 2nd -e switch will receive "1001,A1HR50000USA" as input.
$ sed -e 's/./&,/7'
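1. Again, the dot matches one character at a time; the 7 selects the 7th character of the new record, the "1" of "A1". The comma inserted in the previous step counts as a character too.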
Occurrence | 1 | 2 | 3 | 4 | 5 | 6 | 7
Data       | 1 | 0 | 0 | 1 | , | A | 1
REG_EXP    | . | . | . | . | . | . | .
2. & holds only the 7th character, "1".
7th match of REG_EXP (.) = 1
Replacement (&,) = 1,
After the 2nd -e switch the record will be: 1001,A1,HR50000USA
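Chaining just the first two expressions on the original record reproduces this intermediate result. A minimal check, assuming a POSIX shell and standard sed:

```shell
#!/bin/sh
# After the 1st expression the 7th character is the "1" of "A1",
# so the 2nd expression puts a comma right after it.
printf '1001A1HR50000USA\n' | sed -e 's/./&,/4' -e 's/./&,/7'
# Output: 1001,A1,HR50000USA
```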
The 3rd -e switch will receive "1001,A1,HR50000USA" as input.
$ sed -e 's/./&,/10'
1. Again, the dot matches one character at a time; the 10 selects the 10th character of the record, the "R" of "HR".
Occurrence | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Data       | 1 | 0 | 0 | 1 | , | A | 1 | , | H | R
REG_EXP    | . | . | . | . | . | . | . | . | . | .
2. & holds only the 10th character, "R".
10th match of REG_EXP (.) = R
Replacement (&,) = R,
After the 3rd -e switch the record will be: 1001,A1,HR,50000USA
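It is worth seeing why the occurrence here is 10 and not 8, the original end position of the HR column. This sketch (POSIX shell, standard sed) runs both on the intermediate record: with 8, sed lands on a comma inserted by an earlier step and doubles it.

```shell
#!/bin/sh
# Wrong: the 8th character is now the comma after "A1".
printf '1001,A1,HR50000USA\n' | sed 's/./&,/8'
# Output: 1001,A1,,HR50000USA

# Right: two commas were inserted before HR, so 8 + 2 = 10.
printf '1001,A1,HR50000USA\n' | sed 's/./&,/10'
# Output: 1001,A1,HR,50000USA
```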
The 4th -e switch will receive "1001,A1,HR,50000USA" as input.
$ sed -e 's/./&,/16'
1. The dot matches one character at a time; the 16 selects the 16th character of the record, the last "0" of "50000".
Occurrence | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Data       | 1 | 0 | 0 | 1 | , | A | 1 | , | H | R  | ,  | 5  | 0  | 0  | 0  | 0
REG_EXP    | . | . | . | . | . | . | . | . | . | .  | .  | .  | .  | .  | .  | .
2. & holds only the 16th character, "0".
16th match of REG_EXP (.) = 0
Replacement (&,) = 0,
After the 4th -e switch the record will be: 1001,A1,HR,50000,USA
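One caveat worth knowing: if a line has fewer characters than the requested occurrence, there is no such match and sed leaves the line unchanged, without any warning. The sketch below (POSIX shell, standard sed) shows the last step on the intermediate record and on a made-up short string "abc", used only to illustrate that behaviour.

```shell
#!/bin/sh
# The last step on the intermediate record.
printf '1001,A1,HR,50000USA\n' | sed 's/./&,/16'
# Output: 1001,A1,HR,50000,USA

# A line shorter than 16 characters has no 16th match,
# so it passes through untouched.
printf 'abc\n' | sed 's/./&,/16'
# Output: abc
```

So short or truncated records in a real fixed width file would silently come out with fewer commas.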
Exactly what we want…!!!
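Notice the pattern behind 4, 7, 10 and 16: they are the column end positions 4, 6, 8 and 13 plus the number of commas already inserted (0, 1, 2 and 3). The following is a minimal sketch, assuming a POSIX shell; the loop is hypothetical helper logic, not part of the original solution. It derives the occurrence numbers from the end position of every column except the last and builds the sed arguments in "$@".

```shell
#!/bin/sh
# Hypothetical helper logic: build the sed expressions from the
# end position of every column except the last (4, 6, 8, 13).
n=0     # number of commas inserted so far
set --  # collect the sed arguments in "$@"
for end in 4 6 8 13; do
  # Each comma already inserted shifts this boundary right by one.
  occ=$((end + n))
  set -- "$@" -e "s/./&,/$occ"
  n=$((n + 1))
done

echo "sed $*"
# Output: sed -e s/./&,/4 -e s/./&,/7 -e s/./&,/10 -e s/./&,/16

printf '1001A1HR50000USA\n' | sed "$@"
# Output: 1001,A1,HR,50000,USA
```

Another option is to insert the commas from right to left, since a comma never moves the characters to its left: sed -e 's/./&,/13' -e 's/./&,/8' -e 's/./&,/6' -e 's/./&,/4' F_Input_File.txt gives the same result using the raw end positions.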
Conclusion: We have already seen the same functionality using the awk command; you may also check out How to Convert Fixed Width File to Delimited Using AWK. & is a very powerful feature of sed's replacement text: it stands for whatever the regular expression matched, which makes it handy for reformatting data.
Keep Reading, Keep Learning, Keep Sharing...!!!