Predictive Hacks

Unix Sed Command Tutorial with Examples

Sed (stream editor) is a powerful tool for parsing and transforming text, developed back in 1973 at Bell Labs. It comes pre-installed on most UNIX-like systems, but on some minimal installations you may need to install it yourself.

In this tutorial, we will show you the basic use of sed commands with examples. For the examples, we will work with the following two files.

eg.csv

ID,Name,Dept,Gender
1,George,DS,M
2,Billy,DS,M
3,Nick,IT,M
4,George,IT,M
5,Nikki,HR,F
6,Claudia,HR,F
7,Maria,Sales,F
8,Jimmy,Sales,M
9,Jane,Marketing,F
10,George,DS,M

myfile.txt

This is the first line bla bla bla
This is the second line bla bla bla
This is the third line bla bla bla
some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text

Before we start with the examples, let’s check whether sed is installed on our system.

sed --version

If this prints version information, sed is installed and we are good to go!
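
The --version option is a GNU sed feature, so on BSD-derived systems such as macOS it may print an error even though sed is installed. A more portable check, sketched below, uses the POSIX “command -v” builtin, which prints how the shell would run a command and fails if it cannot find it:

```shell
# Report whether sed can be found on the PATH (POSIX "command -v")
if command -v sed >/dev/null 2>&1; then
  echo "sed is available: $(command -v sed)"
else
  echo "sed is not installed" >&2
fi
```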

How to Replace Values by Line

The most common sed command is 's/pattern/value/', which searches each line for a match of “pattern” and replaces it with “value”. The “s” stands for “substitute”. Let’s dive into an example where we replace “bla” with “BLA” in myfile.txt.

sed 's/bla/BLA/' myfile.txt

And we get:

This is the first line BLA bla bla
This is the second line BLA bla bla
This is the third line BLA bla bla
some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text

Note: By default, only the first occurrence of “bla” on each line was replaced with “BLA”.
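
Keep in mind that the pattern is a regular expression rather than a plain string, so characters such as “.” have a special meaning. The sketch below uses a made-up input string instead of myfile.txt:

```shell
# "." matches any single character, so "a.c" also matches "abc"
echo 'abc a.c' | sed 's/a.c/X/'     # prints: X a.c
# Escaping the dot with a backslash matches a literal "."
echo 'abc a.c' | sed 's/a\.c/X/'    # prints: abc X
```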

To replace every occurrence, we add the “g” flag at the end of the expression (‘s/pattern/value/g’). The “g” stands for “global”.

sed 's/bla/BLA/g' myfile.txt

And we get:

This is the first line BLA BLA BLA
This is the second line BLA BLA BLA
This is the third line BLA BLA BLA
some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text

As we can see, we replaced every occurrence of “bla” with “BLA”.
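
One detail about the “g” flag: matches are found from left to right and never overlap, because after each replacement sed continues right after the text it just matched. A quick sketch on made-up strings:

```shell
# "aa" matches characters 1-2 and 3-4; the overlapping match at 2-3 is skipped
echo 'aaaa' | sed 's/aa/X/g'    # prints: XX
# Only one full "aa" fits, so the last "a" is left as is
echo 'aaa' | sed 's/aa/X/g'     # prints: Xa
```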

How to Replace the n-th Occurrence with a Value

Let’s say that we want to replace the third occurrence of the word “bla” with “BLA” within each line. We can easily do it as follows:

sed 's/bla/BLA/3' myfile.txt

And we get:

This is the first line bla bla BLA
This is the second line bla bla BLA
This is the third line bla bla BLA
some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text

As we can see, only the third occurrence of “bla” on each line was replaced with “BLA”.
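
GNU sed additionally accepts an occurrence number combined with “g”, which replaces the n-th match and every match after it. POSIX leaves this combination unspecified, so the sketch below assumes GNU sed:

```shell
# Replace the 2nd and all following occurrences of "bla" (GNU sed extension)
echo 'line bla bla bla' | sed 's/bla/BLA/2g'   # prints: line bla BLA BLA
```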

The use of “&”

In the replacement, the “&” symbol stands for the whole text matched by the pattern. Let’s say that we want to enclose the leading number (the ID) of each row of eg.csv in parentheses. Let’s see how we can do it.

sed 's/^[0-9]\{1,2\}/(&)/g' eg.csv

And we get:

ID,Name,Dept,Gender
(1),George,DS,M
(2),Billy,DS,M
(3),Nick,IT,M
(4),George,IT,M
(5),Nikki,HR,F
(6),Claudia,HR,F
(7),Maria,Sales,F
(8),Jimmy,Sales,M
(9),Jane,Marketing,F
(10),George,DS,M

Explanation: “^[0-9]\{1,2\}” matches one or two digits at the start of a line, so the header row is left unchanged. In the replacement, “&” stands for the matched digits, so “(&)” encloses them in parentheses.
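
The “&” can appear anywhere in the replacement, even more than once, and a literal ampersand is written as “\&”. A sketch on made-up strings (not taken from eg.csv):

```shell
# Wrap every run of digits in angle brackets
echo 'id 7 and 42' | sed 's/[0-9][0-9]*/<&>/g'   # prints: id <7> and <42>
# Use & twice to repeat the match, and \& for a literal ampersand
echo 'cat' | sed 's/cat/& \& &/'                  # prints: cat & cat
```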

Multiple Commands

We can run multiple commands by passing each one with its own “-e” option; they are applied in order. Let’s say that we want to replace “This” with “That” and “is” with “was” in the same command.

sed -e 's/This/That/g' -e 's/is/was/g' myfile.txt

And we get:

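That was the first line bla bla bla
That was the second line bla bla bla
That was the third line bla bla bla
some other text was here
That was Predictive Hacks a Data Science blog
and thwas was the last line of thwas non sense text

Notice the last line: since “is” is matched as a plain substring, it is also replaced inside “this”, which becomes “thwas”.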
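
Two related options, sketched below on a made-up line: the commands can also be separated with “;” inside a single script, and GNU sed’s “\<” and “\>” word-boundary anchors keep “is” from matching inside “this”. The anchors are a GNU extension, so the second command assumes GNU sed:

```shell
# Same two substitutions in one script, separated by ";"
echo 'and this is it' | sed 's/This/That/g; s/is/was/g'        # prints: and thwas was it
# \< and \> (GNU) require "is" to be a whole word, so "this" is left intact
echo 'and this is it' | sed 's/This/That/g; s/\<is\>/was/g'    # prints: and this was it
```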

Specifying a Range of Lines

We can specify a range of lines for the sed command. More particularly, we can specify:

  • A single line, with a line number.
  • The last line, with the “$” sign.
  • Lines matching a regular expression, with the “/…/” notation.
  • A range of lines, with the “,” symbol between two addresses.
  • The negation of an address, with the “!” symbol.
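
The address forms above can be tried side by side on a numbered input. This sketch generates the lines 1 to 10 with seq (available on Linux and macOS, though not part of POSIX) instead of using one of the sample files:

```shell
# seq 10 prints the numbers 1 to 10, one per line
seq 10 | sed -n '3p'       # a single line number: 3
seq 10 | sed -n '$p'       # the last line: 10
seq 10 | sed -n '/^1/p'    # lines matching a regex: 1 and 10
seq 10 | sed -n '2,4p'     # a range: 2, 3 and 4
seq 10 | sed -n '2,9!p'    # every line outside 2-9: 1 and 10
```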

How to Replace Text in a Particular Line

Let’s say that we want to replace the “bla” with “BLA” in the third line only.

sed '3 s/bla/BLA/g' myfile.txt

And we get:

This is the first line bla bla bla
This is the second line bla bla bla
This is the third line BLA BLA BLA
some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text

Notice that the changes occurred in the third line only.
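
The address does not have to be a line number: with a regular expression, the substitution runs only on the lines that match it. A sketch on two made-up lines:

```shell
# Only lines containing "second" are edited
printf '%s\n' 'first bla' 'second bla' | sed '/second/ s/bla/BLA/g'
# prints:
# first bla
# second BLA
```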

Printing Specific Lines

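We can also print specific lines. The -n option turns off sed’s default behavior of printing every line, and the p command then prints only the lines selected by the address. For example, let’s say that we want to print the first 5 lines.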

sed -n '1,5 p' eg.csv

And we get:

ID,Name,Dept,Gender
1,George,DS,M
2,Billy,DS,M
3,Nick,IT,M
4,George,IT,M
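
For the first few lines, the q (quit) command is an alternative: sed 5q prints lines 1 to 5 and then stops reading, whereas the p version keeps reading to the end of the input, which can matter for a large file. A sketch using seq as stand-in input:

```shell
# Both print lines 1-5; "5q" stops reading after line 5,
# while "-n '1,5p'" keeps reading until the end of the input
seq 100 | sed 5q
seq 100 | sed -n '1,5p'
```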

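The $ address stands for the last line, so we can print everything from a given line to the end of the file. For example, let’s say that we want to print from the 5th line (the row with ID 4, since line 1 is the header) up to the end: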

sed -n '5,$ p' eg.csv

And we get:

4,George,IT,M
5,Nikki,HR,F
6,Claudia,HR,F
7,Maria,Sales,F
8,Jimmy,Sales,M
9,Jane,Marketing,F
10,George,DS,M

Finally, placing “!” after an address negates it, so “!p” prints every line outside the range. For example, let’s say that we want to print all the lines apart from the first five.

sed -n '1,5 !p' eg.csv

And we get:

5,Nikki,HR,F
6,Claudia,HR,F
7,Maria,Sales,F
8,Jimmy,Sales,M
9,Jane,Marketing,F
10,George,DS,M
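
A common special case with CSV files is skipping the header row. Negating the single address 1 does this, and so does deleting line 1 with the d command. A sketch with a hypothetical three-line CSV:

```shell
# Print everything except the header (line 1)
printf '%s\n' 'ID,Name' '1,George' '2,Billy' | sed -n '1!p'
# Equivalent: delete line 1 and let sed print the rest
printf '%s\n' 'ID,Name' '1,George' '2,Billy' | sed '1d'
```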

Printing Lines Based on a Regular Expression

Let’s say that we want to print every line from the first line that contains “bla” up to and including the next line that contains “here”.

sed -n '/bla/,/here/p' myfile.txt

And we get:

This is the first line bla bla bla
This is the second line bla bla bla
This is the third line bla bla bla
some other text is here
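
A regex range can match more than once: after the closing pattern matches, sed starts looking for the opening pattern again, and if the closing pattern never matches, the range runs to the end of the input. A sketch on made-up input where the same range matches twice:

```shell
# Two separate start...end blocks are printed; "skip" lies outside both
printf '%s\n' start x end skip start y end | sed -n '/start/,/end/p'
# prints:
# start
# x
# end
# start
# y
# end
```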

Printing the Replaced Lines Only

We can print only the replaced lines by combining the -n option with the p flag of the s command. For example, we will replace “DS” with “Data Science” and return only the lines where a replacement took place.

sed -n 's/DS/Data Science/p' eg.csv

And we get:

1,George,Data Science,M
2,Billy,Data Science,M
10,George,Data Science,M
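
The -n option is what makes this work. Without it, sed prints every line automatically, so the p flag prints each replaced line a second time, as this sketch on made-up input shows:

```shell
# Without -n: the replaced line appears twice, the untouched line once
printf '%s\n' 'DS' 'IT' | sed 's/DS/Data Science/p'
# prints:
# Data Science
# Data Science
# IT
```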

Deleting Lines

We can “delete” lines with the d command. Here, deleting means that the lines are not printed: sed writes its result to standard output, so your original file is not modified unless you ask for in-place editing with the -i option. Let’s say that we want to “delete”, i.e. filter out, all lines that contain “bla”.

sed '/bla/d' myfile.txt

And we get:

some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text
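
Negating the address turns d into a filter that keeps only the matching lines, similar to grep. A sketch on made-up input:

```shell
# Delete every line that does NOT contain "bla"
printf '%s\n' 'a bla' 'no match' 'b bla' | sed '/bla/!d'
# prints:
# a bla
# b bla
```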

Replacing Single Characters with Single Characters

Similar to the UNIX “tr” command, the y command replaces single characters with single characters. We can map many characters at once by listing the source characters and their replacements in the same order. For example, let’s say that we want to change the lowercase vowels to uppercase.

  • a → A
  • e → E
  • i → I
  • o → O
  • u → U

sed 'y/aeiou/AEIOU/' myfile.txt

And we get:

ThIs Is thE fIrst lInE blA blA blA
ThIs Is thE sEcOnd lInE blA blA blA
ThIs Is thE thIrd lInE blA blA blA
sOmE OthEr tExt Is hErE
ThIs Is PrEdIctIvE HAcks A DAtA ScIEncE blOg
And thIs Is thE lAst lInE Of thIs nOn sEnsE tExt

As we can see, all the lowercase vowels are now uppercase.
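
For a simple one-to-one mapping like this, y and tr should produce the same result; note that y requires both character lists to have the same length. A sketch comparing the two on a made-up string:

```shell
# Both commands upper-case the lowercase vowels
echo 'predictive hacks' | sed 'y/aeiou/AEIOU/'   # prints: prEdIctIvE hAcks
echo 'predictive hacks' | tr 'aeiou' 'AEIOU'     # prints: prEdIctIvE hAcks
```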

Adding a Blank Line After Each Line

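We can insert a blank line after each line with the G command. G appends a newline followed by the contents of the hold space (sed’s auxiliary buffer) to the current line; since the hold space is empty by default, the result is an empty line after each line: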

sed G myfile.txt

And we get:

This is the first line bla bla bla

This is the second line bla bla bla

This is the third line bla bla bla

some other text is here

This is Predictive Hacks a Data Science blog

and this is the last line of this non sense text
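
Note that G also adds a blank line after the last line. Restricting it with the $ and ! addresses leaves the last line alone. A sketch on made-up input:

```shell
# Double-space the input without adding a blank line after the last line:
# "$!" means "every line except the last one"
printf '%s\n' one two three | sed '$!G'
# prints:
# one
#
# two
#
# three
```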

How to Delete Blank Lines

In the previous example, we added blank lines. To remove them, we delete every line that matches the regular expression ^$, i.e. a line with nothing between its start and its end:

sed G myfile.txt | sed '/^$/d'

And we get:

This is the first line bla bla bla
This is the second line bla bla bla
This is the third line bla bla bla
some other text is here
This is Predictive Hacks a Data Science blog
and this is the last line of this non sense text
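
The expression ^$ only matches completely empty lines, so a line that contains nothing but spaces or tabs survives. The POSIX [[:space:]] character class covers those as well; a sketch on made-up input:

```shell
# Input: line 2 holds only spaces, line 4 is completely empty
# '/^$/d' removes only line 4, so the spaces-only line survives
printf '%s\n' 'a' '   ' 'b' '' 'c' | sed '/^$/d'
# '[[:space:]]*' also matches spaces and tabs, so both lines are removed
printf '%s\n' 'a' '   ' 'b' '' 'c' | sed '/^[[:space:]]*$/d'
# the second command prints:
# a
# b
# c
```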

The Takeaway

sed is a powerful tool for processing text files and a valuable skill for Data Scientists and Data Engineers. It lets us do basic data cleansing and checks without having to load the files into memory with R or Python. If you found this sed tutorial interesting, you may like the awk tutorial too. Stay tuned.
