Friday, 22 January 2016

Split Nth Column of A CSV File Into Multiple Columns

Scenario 1: Where Each Column Is Enclosed With In The Double Quotes(")

Input File: F_Input_File.txt
"1001/A1/HR","Developer/Unix","50000/USA"

Output File: F_Output_File.txt
"1001/A1/HR","Developer/Unix","50000","USA"

We want to break 3rd column of the csv (, delimited) file into 2 columns hence output file will have 4 columns ( initially it is 3)

$ awk -F"," 'BEGIN{OFS=","}{ gsub("/","\",\"",$3);}1' F_Input_File.txt > F_Output_File.txt

[OR]

$ awk -F"," 'BEGIN{OFS=","}{ gsub("/","\",\"",$3);print}' F_Input_File.txt > F_Output_File.txt

[OR]

$ awk -F"," 'BEGIN{OFS=","}{ gsub("/","\",\"",$3);print $0}' F_Input_File.txt > F_Output_File.txt

Output:
"1001/A1/HR","Developer/Unix","50000","USA"

Explanation:
 
      SYNTAX: gsub("<reg_exp/ToBeSearched>","<Replacement>","<String>")
  • gsub function is used to replace all the occurrence of forward slash(/) with comma(,) in 3rd column ($3)
  • As we need to enclosed newly created columns in double quotes(") hence has used backward slash(\), escape character (In Green), in 2nd attribute of gsub function to escape double quote in red colour. If we do not put escape character then gsub will raise invalid number of argument error. Double quotes in Pink colour are regular quotes.
  • 1 is used to print the complete line so does print and print $0 as shown in 3 different ways.

Scenario 2: Where Each Column Is Not Enclosed With In The Double Quotes(")

Input File: F_Input_File.txt
1001/A1/HR,Developer/Unix,50000/USA

Output File: F_Output_File.txt
1001/A1/HR,Developer/Unix,50000,USA

$ awk -F"," 'BEGIN{OFS=","}{gsub("/",",",$3);}1' F_Input_File.txt > F_Output_File.txt

[OR]

$ awk -F"," 'BEGIN{OFS=","}{gsub("/",",",$3);print}' F_Input_File.txt > F_Output_File.txt

[OR]

$ awk -F"," 'BEGIN{OFS=","}{ gsub("/",",",$3);print $0}' F_Input_File.txt > F_Output_File.txt

Output:
1001/A1/HR,Developer/Unix,50000,USA

Explanation: 
We have used the same logic, the only difference is the backward slash(\), escape character(in Green) is not used in 2nd parameter of gsub as this time we do not need to enclose our new columns in double quote(")

Keeping Reading, Keeping Learning, Keeping Sharing.....!!!

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...