Extracting columns and fields from a text file

I previously posted about extracting lines from a text file ([1], [2]).

Enough about lines for now. Let's turn our attention to extracting columns and delimited fields from a text file. For instance, one task is to extract columns 5 to 7 of each line in a file. Sometimes, the data you want resides in variable-length fields that are delimited by some character, say ",". A sample task is to extract the second field in a comma-delimited file.

As usual, there is more than one way to accomplish these tasks. The tools that we will use are cut, awk, and perl.

The sample text file is somefile. Type the lines below, then press Ctrl-D to end the input.

$ cat > somefile
1234567890
1234567890
1234567890
1234567890

To extract fixed columns (say columns 5-7 of a file):

$ cut -c5-7 somefile
567
567
567
567

$ perl -pe '$_ = substr($_, 4, 3) . "\n"' somefile
567
567
567
567

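The current line ($_) is replaced with substr($_, 4, 3), the 3-character substring starting at offset 4. Perl offsets are 0-based, so offset 4 is column 5. The "\n" is appended because the substring no longer includes the line's original newline.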
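awk is on the tool list too, so here is a minimal sketch of the same fixed-column extraction with it. It assumes a POSIX awk. The variable name cols is only for illustration, and the sample lines are piped in so the snippet runs on its own.

```shell
# Feed the same four sample lines through a pipe instead of reading somefile.
# awk's substr is 1-based: start at character 5 and take 3 characters.
cols=$(printf '%s\n' 1234567890 1234567890 1234567890 1234567890 |
  awk '{ print substr($0, 5, 3) }')

# Show the extracted columns, one line per input line.
printf '%s\n' "$cols"
```

This prints 567 four times, the same as the cut and perl versions. Note the difference in indexing: awk counts from 1, perl from 0.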

To illustrate extracting a particular field, let's use /etc/passwd, a colon-delimited file. Say we want the 6th field (each user's home directory).

$ cut -d: -f6 /etc/passwd

$ awk -F : '{print $6}' /etc/passwd

$ perl -p -e '$_ = (split(/[:\n]/))[5] . "\n"' /etc/passwd

Here, I used the split function to break the line into fields delimited by a colon or a newline. split returns a list, and we assign the element at index 5 (the 6th field, since Perl lists are 0-based) to the current line. Including \n in the delimiter class [:\n] is necessary; otherwise, when you extract the last field, it keeps its trailing newline, and the appended "\n" produces an extra blank line.

If you know of a simpler way to do this, please share it in the comments.
