Tuesday, February 7, 2023

Manipulating text with awk, gawk and sed


The awk, gawk and sed commands on Linux are extremely versatile tools for manipulating text, rearranging columns, generating reports and modifying file content.

Using awk and gawk

To select portions of command output using gawk, you can try commands like those below. The first displays the first field in the output of the date command. The second displays the last field. Since NF holds the number of fields in the current line of input, $NF represents the value of the last field.

$ date | awk '{print $1}'
Sat
$ date | awk '{print $NF}'
2023
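Because NF is an ordinary number, it can also be used in arithmetic to count back from the end of a line. The sketch below assumes a fixed sample line standing in for the output of date (the exact text is hypothetical) so that the result is repeatable.

```shell
# Sample line standing in for the output of date (hypothetical, fixed for repeatability)
line="Sat Feb  4 10:15:32 EST 2023"

# NF is the number of fields; $(NF-1) is the next-to-last field; $NF is the last
summary=$(echo "$line" | awk '{print NF, $(NF-1), $NF}')
echo "$summary"
```

With this sample line, the command should print the field count followed by the last two fields.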

Note that on Linux systems today, awk is usually a symbolic link to gawk, so you can type either command and get the same result. Here's what you'll probably see when you do a long listing of the awk executable:

$ ls -l /usr/bin/awk
lrwxrwxrwx. 1 root root 4 Jul 21  2021 /usr/bin/awk -> gawk

The “l” at the beginning identifies /usr/bin/awk as a symbolic link.

To break a line of text into pieces, the text must use a common delimiter to separate the pieces you need to work with. The default delimiter for both awk and gawk is white space. The -F argument, however, allows you to specify whatever delimiter separates the chunks of text that you are working with. As you probably know, any character can serve as a delimiter as long as it's only used as a delimiter. Delimiters are usually blanks, tabs, colons, semicolons and such. You can, however, break on a letter or other character if your data requires that.

To rearrange the portions of a line of text separated by commas, you could use a command like this:

$ echo "one,two,three" | gawk -F ',' '{print $3,$2,$1}'
three two one

While “x” isn't often used as a delimiter, even that will work.

$ echo onextwoxthree | gawk -F 'x' '{print $2,$3,$1}'
two three one

NOTE: Without the commas in the print statement, the fields would be run together and the result would be “twothreeone”.

Note that gawk allows you to arrange the pieces of data in any order and that you can ignore fields that you don't wish to include in your output.
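As a further illustration of picking and reordering fields, the sketch below uses a made-up passwd-style record (the user name and paths are hypothetical). It also sets OFS, the awk variable that controls what the commas in a print statement insert between fields.

```shell
# Hypothetical passwd-style record: seven fields separated by colons
record="jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/bash"

# Print the shell first, then the login name, skipping the other fields.
# OFS sets the separator that the commas in print insert between fields.
picked=$(echo "$record" | awk -F ':' -v OFS=' -> ' '{print $7, $1}')
echo "$picked"
```

Only fields 7 and 1 appear in the output; the five fields in between are simply never printed.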

Breaking on white space

Repeated blanks (a/k/a “white space”) serve as single delimiters, unlike other characters. Regardless of how many blank characters are in each stretch of white space, the strings of letters are simply rearranged and displayed with single blanks between the words.

$ echo "one      two    three" | gawk -F ' ' '{print $2,$3,$1}'
two three one
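The difference between white space and other delimiters is easiest to see by counting fields. This is a small sketch with made-up input strings: a doubled comma produces an empty field, while a run of blanks does not.

```shell
# Two commas in a row delimit an empty field, so "three" is field 4
with_commas=$(echo "one,,two,three" | awk -F ',' '{print NF}')

# A run of blanks counts as one delimiter, so there are only three fields
with_blanks=$(echo "one      two    three" | awk '{print NF}')

echo "commas: $with_commas, blanks: $with_blanks"
```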

Editing files “in place”

You can also use gawk to make changes directly to files by using the -i inplace option. In the example below, a new file is created using the fortune command and then line numbers are added to each line using gawk. NR represents the line number, which is followed by a period and a space character.

$ fortune > fortune
$ gawk -i inplace '{print NR ". " $0}' fortune

To view the changes, display the file again.

$ cat fortune
1. Work is of two kinds: first, altering the position of matter at or near
2. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell
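The inplace feature is specific to gawk (to my knowledge it arrived in version 4.1), so it may not be available where awk is a different implementation. A common fallback, sketched here with a hypothetical scratch file named notes, is to write the output to a temporary file and then move it over the original.

```shell
# Work on a scratch copy so nothing real is overwritten (file name is hypothetical)
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/notes"

# Number the lines into a temporary file, then replace the original only on success
awk '{print NR ". " $0}' "$workdir/notes" > "$workdir/notes.tmp" &&
    mv "$workdir/notes.tmp" "$workdir/notes"

cat "$workdir/notes"
```

Chaining the two commands with && keeps the original file intact if the awk command fails.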

To modify the text, you could use a command like the one below, which will replace “two” with “2”:

$ awk -i inplace '{gsub("two", "2")} 1' fortune
$ cat fortune
1. Work is of 2 kinds: first, altering the position of matter at or near
2. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell

To reverse this change, you could try a command like this, but notice that it changes one of the line numbers as well.

$ awk -i inplace '{gsub("2", "two")} 1' fortune
$ cat fortune
1. Work is of two kinds: first, altering the position of matter at or near
two. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell

To avoid changing text that you don't want to change, add something to your command that will distinguish the target text from the rest. In the command below, I've added a blank before the 2 so that the line number is not affected. Run against the version of the file that still contains “2 kinds”, it changes only that “2”.

$ awk -i inplace '{gsub(" 2", " two")} 1' fortune
$ cat fortune
1. Work is of two kinds: first, altering the position of matter at or near
2. the earth's surface relative to other matter; second, telling other people
3. to do so.
4.              -- Bertrand Russell
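Adding a leading blank to the pattern works only when the target never starts a line and the line number is never preceded by a blank. A somewhat more general approach, sketched below on made-up numbered lines, is to set the line number aside, substitute only in the remaining text, and then reassemble the line. The variable names prefix and text are my own.

```shell
# Hypothetical numbered lines in the same "N. text" layout as the fortune file
numbered=$(printf '1. Work is of 2 kinds\n2. the earth has 2 poles\n')

# Split each line at the first ". ", substitute only in the text after it,
# then put the untouched line number back in front
fixed=$(echo "$numbered" | awk '{
    n = index($0, ". ")
    if (n == 0) { print; next }        # no line-number prefix: leave the line alone
    prefix = substr($0, 1, n + 1)      # e.g. "2. "
    text = substr($0, n + 2)
    gsub("2", "two", text)
    print prefix text
}')
echo "$fixed"
```

Both occurrences of “2” in the text are replaced, while the line number on the second line is left as it was.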

Separating columns of data

To separate tab-separated columns of data into individual files, you could use a script like this one:

#!/bin/bash

# Prompt for the name of the tab-separated file
echo -n "file> "
read file

# Write each of the first three columns to its own file
awk -F '\t' '{ print $1 }' "$file" > col1
echo ==============================

awk -F '\t' '{ print $2 }' "$file" > col2
echo ==============================

awk -F '\t' '{ print $3 }' "$file" > col3

Assuming the original file has at least three columns of data separated by tabs, you'd end up with three separate files with content. If the original file has only two fields, the third file will contain only empty lines. If the fields are separated by blanks instead of tabs, only the first file will contain text, since each whole line is then treated as a single field.
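When the number of columns isn't known in advance, the same split can be done in a single pass by letting awk choose the output file name for each field. This is a sketch, not the script above: the sample file data.tsv and the scratch directory are hypothetical, and the col1, col2, ... naming simply mirrors the script.

```shell
# Build a small tab-separated sample in a scratch directory (names are hypothetical)
workdir=$(mktemp -d)
printf 'a1\tb1\tc1\na2\tb2\tc2\n' > "$workdir/data.tsv"

# One pass over the file: field i of every line goes to a file named col<i>
( cd "$workdir" && awk -F '\t' '{ for (i = 1; i <= NF; i++) print $i > ("col" i) }' data.tsv )

cat "$workdir/col2"
```

Because the loop runs up to NF, a file with five columns would produce col1 through col5 without any change to the command.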

Using sed to modify file content

You can also use sed commands to make changes to files. For example, to replace every instance of one word or phrase with another, you could use a command like the one shown below. The file first looks like this:

$ cat txt
This is the old text.

The sed command below replaces “the old” with “better”.

$ sed -i "s/the old/better/" txt
$ cat txt
This is better text.

If any lines in the file contain multiple instances of the word or phrase, you need to add the “g” (global) option to the sed command or it will only change the first instance in each line.

$ cat txt
This is the old text.
I like the old text. Do you like the old textbooks?
$ sed -i "s/the old/better/g" txt
$ cat txt
This is better text.
I like better text. Do you like better textbooks?
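Since -i overwrites the file, it can be worth keeping a copy of the original. With GNU sed, attaching a suffix to -i does that. The sketch below assumes GNU sed and works in a scratch directory; the file name and content are sample data.

```shell
# Scratch copy of a two-line sample file (hypothetical name and content)
workdir=$(mktemp -d)
printf 'This is the old text.\nI like the old text. Do you like the old textbooks?\n' > "$workdir/txt"

# -i.bak edits txt in place and keeps the original as txt.bak
sed -i.bak 's/the old/better/g' "$workdir/txt"

cat "$workdir/txt"
```

If the result isn't what you wanted, the untouched original is still available as txt.bak.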

Wrap-up

The awk, gawk and sed commands can help when you need to manipulate text in some way, especially if you need to make a lot of changes to a lot of text. To look further into how you can use these commands, check out the links below that will take you to several of my earlier articles.

Copyright © 2023 IDG Communications, Inc.
