Friday, July 6, 2012
3:02 PM

csplit : Split a file based on patterns

The command "csplit" can be used to split a file into several files, based on patterns in the file or on line numbers.

For example, let us say we have a file, temp, with the following contents:

temp:
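
A stand-in file that fits the examples in this post can be created as follows. The exact contents are an assumption: eight lines, each starting with the word "Line", chosen only so that the patterns "two" and "Line" used later have something to match.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Assumed sample contents: "Line one" through "Line eight", one per line.
printf 'Line %s\n' one two three four five six seven eight > temp

cat temp
```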



We can split this file into two new files, each holding part of the contents of the original file, using csplit.

The syntax of csplit is:

    csplit [OPTION]... FILE PATTERN...



Pattern as integer number:

When the pattern is an integer, csplit copies the lines up to, but not including, that line number into one new file and the remaining contents into a second new file.

Example:



The numbers printed after the command is executed are the number of bytes written to each new file that was created. After the command has run, we can see two new files, "xx00" and "xx01"; these are the files created by csplit.



By looking at the contents of xx00 and xx01 we can see that csplit has put the first 4 lines of the file temp into one file and lines 5 to the end of temp into another.

By default, the new files are named "xx00", "xx01", and so on.
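
The line-number form can be tried with a short script like the one below. It is a sketch, not the original listing: the contents of temp are assumed (eight lines, "Line one" to "Line eight"), and the split point 5 is picked to match the description above.

```shell
# Work in a scratch directory so the generated xx* files are easy to find.
cd "$(mktemp -d)" || exit 1

# Assumed sample input: eight lines, "Line one" through "Line eight".
printf 'Line %s\n' one two three four five six seven eight > temp

# Split before line 5: lines 1-4 go to xx00, lines 5-8 go to xx01.
# csplit prints the size in bytes of each file it writes.
csplit temp 5

# Show how many lines ended up in each piece.
wc -l xx00 xx01
```

Concatenating the pieces in order (cat xx00 xx01) gives back the original file, which is a quick way to check that nothing was lost.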

Pattern as a Regular expression:

We can also pass regular expressions as patterns to split the file.

Example:



We have passed the pattern /two/, which means we want to split the file temp at the first line containing the string "two". The matching line itself becomes the first line of the second file.



As we can see, the file xx00 has only the first line, which is the line before the one containing the string "two".
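
A regular-expression split along these lines can be reproduced as follows. The sample file is again an assumption (eight lines, "Line one" to "Line eight"); only the pattern /two/ comes from the text above.

```shell
# Work in a scratch directory so the generated xx* files are easy to find.
cd "$(mktemp -d)" || exit 1

# Assumed sample input: eight lines, "Line one" through "Line eight".
printf 'Line %s\n' one two three four five six seven eight > temp

# Split before the first line that matches the regular expression "two".
# The pattern is quoted so the shell passes it to csplit unchanged.
csplit temp '/two/'

cat xx00          # prints "Line one", the only line before the match
head -n 1 xx01    # prints "Line two", the matching line
```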

Repeating a pattern:

A pattern can be repeated by passing an integer within {} after the pattern, i.e. as {integer}.



csplit then applies the pattern that many additional times, writing a new file at each split. If the pattern does not repeat that many times, GNU csplit stops with an error and removes the files it has created, unless the option -k is passed to keep them.
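
The effect of a repeat count, and of a count that is too large, can be checked with a sketch like this one. It assumes GNU coreutils csplit and an assumed eight-line sample file; the pattern 2 with {2} and the pattern /two/ with {3} are illustrative choices, not the commands from the original examples.

```shell
# Work in a scratch directory so the generated xx* files are easy to find.
cd "$(mktemp -d)" || exit 1

# Assumed sample input: eight lines, "Line one" through "Line eight".
printf 'Line %s\n' one two three four five six seven eight > temp

# Line number 2 with {2}: xx00 gets line 1, the two repeats produce
# xx01 and xx02 with two lines each, and xx03 gets the remaining lines.
csplit temp 2 '{2}'
pieces=$(ls | grep -c '^xx')
echo "pieces written: $pieces"

# /two/ matches only once, so asking for three more matches fails.
# csplit exits non-zero and, without -k, removes the pieces it wrote.
rm -f xx*
if csplit temp '/two/' '{3}'; then
    echo "split succeeded"
else
    echo "csplit failed; pieces left: $(ls | grep -c '^xx')"
fi
```

Adding -k to the second command keeps whatever pieces were written before the error.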

Examples:



In the above command we passed "1" as the pattern and 7 as the repeat count, so the file was split 7 times with one line in each file.
The repetition can be made to occur as many times as possible by passing "*" instead of a number, i.e. {*}.

Example:



The above command split the file temp into a new file at each line matching the pattern /Line/.
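
A split on every occurrence can be sketched as below, assuming GNU csplit and an assumed sample file in which every line contains "Line". The options -s and -z are additions for this sketch: because the very first line matches, csplit would otherwise also write an empty piece in front of it.

```shell
# Work in a scratch directory so the generated xx* files are easy to find.
cd "$(mktemp -d)" || exit 1

# Assumed sample input: eight lines, "Line one" through "Line eight".
printf 'Line %s\n' one two three four five six seven eight > temp

# Split before every line matching /Line/, as often as possible.
# -s suppresses the byte counts; -z (a GNU extension) removes empty
# output files, such as the piece in front of the first matching line.
csplit -s -z temp '/Line/' '{*}'

# One piece per line of the input.
ls xx*
```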

The output file prefix can be changed by using the option -f:



As we can see, whatever string is passed after -f is used as the prefix.
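
The -f option can be tried as follows. This is a sketch with an assumed sample file; the prefix "file" and the split point 5 are illustrative choices.

```shell
# Work in a scratch directory so the generated files are easy to find.
cd "$(mktemp -d)" || exit 1

# Assumed sample input: eight lines, "Line one" through "Line eight".
printf 'Line %s\n' one two three four five six seven eight > temp

# Use "file" instead of the default "xx" as the output prefix.
csplit -f file temp 5

# The pieces are now named file00 and file01.
ls file*
```
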
The number of digits used in the suffix can be changed using the option -n:



As we can see, there is only one digit after the prefix "file", which is the number that was passed with the option -n.
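
Both options together can be sketched like this, again with an assumed sample file and an illustrative split point of 5.

```shell
# Work in a scratch directory so the generated files are easy to find.
cd "$(mktemp -d)" || exit 1

# Assumed sample input: eight lines, "Line one" through "Line eight".
printf 'Line %s\n' one two three four five six seven eight > temp

# -f sets the prefix to "file"; -n 1 asks for a one-digit suffix.
csplit -f file -n 1 temp 5

# The pieces are now named file0 and file1.
ls file*
```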

