Enough about lines for now. Let's turn our attention to extracting columns and delimited fields from a text file. For instance, one task is to extract columns 5 to 7 of each line in a file. Sometimes, the data you want reside in variable-length fields that are delimited by some character, say ",". A sample task is to extract the second field of each line in a comma-delimited file.
As usual, there is more than one way to accomplish these tasks. The tools that we will use are cut, awk, and perl.
The sample text file is somefile. Type the lines below, then press Ctrl-D to end the input.
$ cat > somefile
1234567890
1234567890
1234567890
1234567890
To extract fixed columns (say columns 5-7 of a file):
$ cut -c5-7 somefile
567
567
567
567
$ perl -pe '$_ = substr($_, 4, 3) . "\n"' somefile
567
567
567
567
The current line ($_) is replaced with substr($_, 4, 3) followed by a newline. substr($_, 4, 3) is the 3-character substring starting at offset 4; perl is 0-based, so offset 4 is column 5.
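awk can do the fixed-column job as well. The following is a minimal sketch, not the only way to do it. It assumes the same four-line somefile (recreated here with printf so the snippet runs on its own) and uses awk's substr function, which, unlike perl's, counts from 1.

```shell
# Recreate the sample file: four identical 10-character lines.
printf '%s\n' 1234567890 1234567890 1234567890 1234567890 > somefile

# substr($0, 5, 3): 3 characters of the whole line ($0), starting at column 5.
# awk string positions are 1-based, so no offset adjustment is needed.
awk '{ print substr($0, 5, 3) }' somefile
# prints:
# 567
# 567
# 567
# 567
```

Note the difference in indexing: the perl version starts at offset 4, while the awk version starts at position 5, and both refer to the same column.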
To illustrate extracting a particular field, let's use /etc/passwd, a colon-delimited file. Say we extract the 6th field (home directory of users).
$ cut -d: -f6 /etc/passwd
$ awk -F : '{print $6}' /etc/passwd
$ perl -p -e '$_ = (split(/[:\n]/))[5] . "\n"' /etc/passwd
Here, I used the split function to separate each line into fields delimited by a colon or a newline. The output of split is a list, and we assign the element at index 5 (the 6th field, since perl is 0-based) to the current line. Including \n in the delimiter class [:\n] is necessary; otherwise, when you extract the last field, it keeps its trailing newline and the output has an extra blank line.
If you know of a simpler way to do this, please share it with us in the comments.