First of all, I wish you all a happy and prosperous New Year 2016, and a life full of knowledge, success and good health.
Keep Reading, Keep Learning, Keep Sharing...!!!
This article describes practical uses of the split command in Unix/Linux. The split command breaks a large file into multiple smaller files, either by number of lines or by size. Let's see how it works.
Base File: F_Input_Data
$ wc -l F_Input_Data
2000 F_Input_Data
wc -l counts the number of lines in a file; here our file contains 2000 lines. I will use the --verbose option of split to display the progress of the operation.
1. How to split a file (default)
$ split --verbose F_Input_Data
Output: split divides the base file into 2 smaller files of 1000 lines each.
creating file `xaa'
creating file `xab'
$ wc -l xaa xab
1000 xaa
1000 xab
2000 total
Note: The example above shows the following properties of split:
1. The default size of each split file is 1000 lines.
2. The default split file names are xaa, xab, xac and so on.
3. The default suffix length is 2, which is why the names end in aa, ab after the x.
The suffix length can be controlled with the -a switch, as the next example shows.
$ split --verbose -a 1 F_Input_Data
Output: Notice how the file names change when we add the -a switch. The default value of -a is 2.
creating file `xa'
creating file `xb'
$ wc -l xa xb
1000 xa
1000 xb
2000 total
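The default behaviour can be tried without a real data file. The following is only a sketch: it assumes GNU coreutils, builds a 2000-line stand-in for F_Input_Data with seq, and works in a temporary directory so the pieces do not land in your current one.

```shell
# Work in a throwaway directory so the pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2000-line stand-in for the article's input file.
seq 1 2000 > F_Input_Data

# Default split: 1000 lines per piece, prefix "x", two-letter suffix.
split F_Input_Data
default_pieces=$(echo x??)

# Same split with a one-letter suffix; remove the earlier pieces first.
rm x??
split -a 1 F_Input_Data
short_pieces=$(echo x?)

# $(( ... )) strips any padding that some wc implementations add.
lines_in_xa=$(( $(wc -l < xa) ))

echo "default names: $default_pieces"
echo "with -a 1:     $short_pieces"
echo "lines in xa:   $lines_in_xa"
```

The glob patterns x?? and x? are used here only to collect the generated names; they match nothing else in the empty working directory.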
2. How to assign a numeric suffix to the split files
We have seen that the split files get character suffixes (like aa, ab, a, b). We can force split to use numeric suffixes instead with the -d switch. Let's see how it works.
$ split --verbose -d F_Input_Data
Output: split has divided the file into two smaller files, and the new files now have numeric suffixes in their names (00, 01).
creating file `x00'
creating file `x01'
The suffix length can be controlled here as well, using the -a switch.
$ split --verbose -d -a 1 F_Input_Data
Output: We can read the command as:
split the file +
use a numeric suffix +
make the suffix 1 character long.
creating file `x0'
creating file `x1'
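To check the numeric naming yourself, a small sketch along the same lines may help. It assumes GNU split (the -d switch is a GNU extension) and again uses a generated 2000-line file in a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2000 > F_Input_Data

# -d switches to numeric suffixes; numbering starts at 00.
split -d F_Input_Data
two_digit=$(echo x[0-9][0-9])

# Remove the first set of pieces, then repeat with a one-digit suffix.
rm x[0-9][0-9]
split -d -a 1 F_Input_Data
one_digit=$(echo x[0-9])

echo "with -d:      $two_digit"
echo "with -d -a 1: $one_digit"
```

Keep in mind that a one-digit numeric suffix leaves room for only 10 pieces (0 to 9).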
3. How to assign a user-defined prefix to the split file names
$ split --verbose F_Input_Data F_Input_Data_
Output: Observe carefully: split has now used the user-defined prefix instead of 'x'. This kind of naming is clearer than the default naming of split files.
creating file `F_Input_Data_aa'
creating file `F_Input_Data_ab'
The above naming is still not very clear. For more clarity, let's split the file using a numeric suffix.
$ split --verbose -d F_Input_Data F_Input_Data_
Output: This time the split file names are more readable and easier to maintain than with character suffixes.
creating file `F_Input_Data_00'
creating file `F_Input_Data_01'
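A short sketch of the prefix form, under the same assumptions as before (GNU split, a generated 2000-line input, a temporary working directory), lists each piece together with its line count:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2000 > F_Input_Data

# The operand after the input file is the prefix for the piece names.
split -d F_Input_Data F_Input_Data_

# The trailing underscore keeps the glob from matching the input file itself.
pieces=$(echo F_Input_Data_*)
for piece in F_Input_Data_*; do
    echo "$piece: $(( $(wc -l < "$piece") )) lines"
done
```

Ending the prefix with a separator such as an underscore is just a convention, but it makes the pieces easy to select with a glob later.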
4. How to split the file based on number of lines
We have seen how split works in default mode, putting 1000 lines in each split file. We can override this default with the -l switch and pass the number of lines we need. Let's see how it works.
$ split --verbose -l 500 -d F_Input_Data F_Input_Data_
Output: split has created 4 files of 500 lines each. This is very useful when dealing with file splitting. The suffix and prefix can be changed here as well, as explained above.
creating file `F_Input_Data_00'
creating file `F_Input_Data_01'
creating file `F_Input_Data_02'
creating file `F_Input_Data_03'
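After a line-based split it is worth confirming that nothing was lost. The sketch below, which assumes GNU split and a generated 2000-line input, counts the pieces and then joins them again with cat to compare against the original.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 2000 > F_Input_Data

# 2000 lines at 500 lines per piece should give 4 pieces.
split -l 500 -d F_Input_Data F_Input_Data_

# Count the pieces by loading them into the positional parameters.
set -- F_Input_Data_*
piece_count=$#

# The glob expands in sorted order, so cat joins the pieces in original order.
if cat "$@" | cmp -s - F_Input_Data; then
    roundtrip="identical"
else
    roundtrip="different"
fi

echo "$piece_count pieces, reassembled copy is $roundtrip"
```

The sorted glob order matches the split order only because every suffix has the same length, which split guarantees here.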
5. How to split the file based on size
-rw------- 1 baba baba 8893 Dec 30 01:42 F_Input_Data
$ split --verbose -b 2000 -d F_Input_Data F_Input_Data_
Output: The -b switch instructs split to cut the file based on size in bytes.
The size of F_Input_Data is 8893 bytes and we have instructed split to cut it into pieces of 2000 bytes each. It has produced 4 files of 2000 bytes and 1 file of 893 bytes, as shown below.
creating file `F_Input_Data_00'
creating file `F_Input_Data_01'
creating file `F_Input_Data_02'
creating file `F_Input_Data_03'
creating file `F_Input_Data_04'
$ ls -ltr
-rw------- 1 baba baba 2000 Dec 30 02:13 F_Input_Data_00
-rw------- 1 baba baba 893 Dec 30 02:13 F_Input_Data_04
-rw------- 1 baba baba 2000 Dec 30 02:13 F_Input_Data_03
-rw------- 1 baba baba 2000 Dec 30 02:13 F_Input_Data_02
-rw------- 1 baba baba 2000 Dec 30 02:13 F_Input_Data_01
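The number of pieces and the size of the last one follow from simple arithmetic: a ceiling division and a remainder. The sketch below assumes GNU split and uses an 8893-byte file of zero bytes as a stand-in for the real F_Input_Data; the variable names are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 8893 zero bytes stand in for the article's input file.
head -c 8893 /dev/zero > F_Input_Data

chunk=2000
size=$(( $(wc -c < F_Input_Data) ))
expected_pieces=$(( (size + chunk - 1) / chunk ))   # ceiling division
expected_last=$(( size % chunk ))                   # remainder (non-zero here) goes to the last piece

split -b "$chunk" -d F_Input_Data F_Input_Data_

# Count the pieces and pick the last one in sorted order.
set -- F_Input_Data_*
actual_pieces=$#
for last_piece in "$@"; do :; done
actual_last=$(( $(wc -c < "$last_piece") ))

echo "expected $expected_pieces pieces, got $actual_pieces"
echo "expected last piece of $expected_last bytes, got $actual_last"
```

If the file size were an exact multiple of the chunk size, the remainder would be 0 and the last piece would simply be a full chunk.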
$ split --verbose -b 2k -d F_Input_Data F_Input_Data_
Output: Here we have split the file into pieces of 2 kilobytes (2k = 2 * 1024 = 2048 bytes). See the output of the above command. It has split the file into 5 files: 4 of 2048 bytes each and 1 of 701 bytes. The total size is 4*2048 + 701 = 8893 bytes, the same as the original file.
creating file `F_Input_Data_00'
creating file `F_Input_Data_01'
creating file `F_Input_Data_02'
creating file `F_Input_Data_03'
creating file `F_Input_Data_04'
$ ls -ltr
-rw------- 1 baba baba 701 Dec 30 02:20 F_Input_Data_04
-rw------- 1 baba baba 2048 Dec 30 02:20 F_Input_Data_03
-rw------- 1 baba baba 2048 Dec 30 02:20 F_Input_Data_02
-rw------- 1 baba baba 2048 Dec 30 02:20 F_Input_Data_01
-rw------- 1 baba baba 2048 Dec 30 02:20 F_Input_Data_00
$ split --verbose -b 1m -d F_Input_Data F_Input_Data_
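Output: This splits the file into pieces of 1 megabyte (1m = 1024 * 1024 = 1048576 bytes). Our 8893-byte file is smaller than that, so it ends up in a single piece.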
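The size suffixes can be compared side by side with a sketch like the one below. It assumes GNU split and an 8893-byte stand-in file; the prefixes small_ and big_ are made up for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
head -c 8893 /dev/zero > F_Input_Data

# 2k means 2 * 1024 = 2048 bytes per piece.
split -b 2k -d F_Input_Data small_
sizes=""
for piece in small_*; do
    sizes="$sizes $(( $(wc -c < "$piece") ))"
done
sizes=${sizes# }   # drop the leading space

# 1m means 1024 * 1024 bytes, larger than the whole file, so one piece results.
split -b 1m -d F_Input_Data big_
set -- big_*
big_count=$#

echo "2k pieces: $sizes"
echo "1m pieces: $big_count"
```

The arithmetic expansion around wc is there only to strip the padding that some wc implementations print.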
Common Error With split:
As we have seen, split by default uses x as the prefix and aa, ab, ac ... as suffixes for file naming. We have also seen that the default suffix length is 2, which is why it uses aa, ab, ac and so on.
Now assume you have a very large file and you are splitting it into 1000 small files. You will then receive the following error: split: output file suffixes exhausted
Exhausted? What does that mean? Let's see.
1st file name is xaa
2nd file name is xab
3rd file name is xac
.
.
26th file name is xaz
27th file name is xba
28th file name is xbb
.
.
52nd file name is xbz
53rd file name is xca
.
.
676th file name is xzz
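After that there are no further suffix combinations left for naming, which causes split to terminate abruptly. (Newer GNU versions of split lengthen the suffix automatically when -a is not given, so there the error appears only when the suffix length is fixed with -a.)
As we have seen, for the default suffix length (-a 2) there are 26 * 26 = 676 possible suffix combinations, so at most 676 files can be named. The breakdown below shows how 676 is derived.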
xa{a..z} => 26
xb{a..z} => 26
xc{a..z} => 26
.
.
xz{a..z} => 26
Solution: If you want to split a file into a large number of pieces, set the suffix length accordingly with the -a switch.
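The suffix length needed for a given number of pieces can be worked out before running split. The helper below is a hypothetical function written for this article, not part of split: it multiplies the base (26 for letters, 10 for the numeric suffixes of -d) until the capacity covers the piece count. The last part of the sketch provokes the error on purpose with a fixed one-letter suffix.

```shell
# Smallest suffix length whose capacity (base^length) covers the piece count.
needed_suffix_length() {
    pieces=$1
    base=$2        # 26 for letters, 10 for -d numeric suffixes
    length=1
    capacity=$base
    while [ "$capacity" -lt "$pieces" ]; do
        capacity=$(( capacity * base ))
        length=$(( length + 1 ))
    done
    echo "$length"
}

letters_for_1000=$(needed_suffix_length 1000 26)   # 26*26 = 676 is too few
digits_for_1000=$(needed_suffix_length 1000 10)    # 10*10*10 = 1000 just fits
letters_for_676=$(needed_suffix_length 676 26)     # 26*26 = 676 just fits

# Demonstration: 27 one-line pieces cannot fit into 26 one-letter suffixes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 27 > tiny
if split -l 1 -a 1 tiny t_ 2>/dev/null; then
    exhausted="no"
else
    exhausted="yes"
fi

echo "1000 pieces need -a $letters_for_1000 (letters) or -a $digits_for_1000 (digits)"
echo "suffixes exhausted with -a 1 and 27 pieces: $exhausted"
```

In practice it does no harm to choose a suffix one character longer than strictly needed, since the piece count of a growing file is easy to underestimate.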
Conclusion: We have seen how split works in practice to split a large file based on number of lines and on file size. We have also seen that split gives only limited control over the names of the split files. If you want complete control over the names, go through the following articles on file splitting using AWK.
How to Split File Dynamically using AWK
How to Split File Dynamically (Part 2) using AWK
Keep Reading, Keep Learning, Keep Sharing...!!