Basic operations with SED

In this article we will look at some of the main things sed is used for: replacing, deleting and printing text.

As a running example, we will remove the comments starting with a '#' sign and the blank lines from the following file:

## Header of input.csv
#this file contains information I want to parse with a simple program. 
#The header, the footer or any comment starting with a "#" will be removed
#The blank lines will also be removed

#img,processed,defaut
#bloc 1
0,a0000.tif,,
1,a0001.tif,True,"(139, 63)(145, 91)"
2,a0002.tif,True,"(93, 72)(24, 162)(31, 64)"
3,a0003.tif,,
4,a0004.tif,,
5,a0005.tif,,
6,a0006.tif,,
7,a0007.tif,,
8,a0008.tif,,
9,a0009.tif,True,"(127, 80)(104, 60)(87, 63)(53, 78)(17, 126)"
10,a0010.tif,,
11,a0011.tif,True,"(39, 78)(84, 110)" # a random comment passing by
#end of bloc 1


#bloc 2
12,a0012.tif,,
13,a0013.tif,,
14,a0014.tif,,
15,a0015.tif,True,"(146, 65)(146, 89)(139, 146)(16, 68)"
16,a0016.tif,True,"(51, 59)(77, 69)(145, 78)(139, 112)(97, 123)(17, 148)"
17,a0017.tif,,
#end of bloc 2

#bloc 3
18,a0018.tif,,
19,a0019.tif,,
20,a0020.tif,True,"(57, 99)(12, 113)(27, 139)(16, 158)"
21,a0021.tif,,
22,a0022.tif,,
23,a0023.tif,,
24,a0024.tif,,
25,a0025.tif,,
26,a0026.tif,,
#end of bloc 3

27,a0027.tif,True,"(11, 86)(29, 74)(92, 68)(109, 129)(132, 104)"
28,a0028.tif,,
29,a0029.tif,True,"(128, 58)"


30,a0030.tif,True,"(133, 59)(99, 77)(111, 100)(115, 153)"
31,a0031.tif,True,"(43, 154)(27, 177)"


## footer : end of file 

Anatomy of a SED command

If we run the command:

sed "" input.csv

Everything inside the quotation marks is interpreted as a sed script. In our case the script is empty, so the file is printed to the console without any modification.

Inside the quotation marks you can put any of sed's many commands, for instance s, which stands for substitute and is one of the most commonly used.

Our CSV uses the comma as a separator. Let's say that we want to change it to a semicolon. We would run:
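As a quick check of that claim, here is a minimal sketch. It assumes a three-line stand-in for input.csv written to a temporary file (the name `sample` is only for this illustration). It also shows the printing use mentioned in the introduction: the -n option suppresses the automatic output and the p command prints the selected line.

```shell
# Build a tiny stand-in for input.csv (hypothetical three-line sample).
sample=$(mktemp)
printf '%s\n' '#bloc 1' '0,a0000.tif,,' '1,a0001.tif,True,"(139, 63)(145, 91)"' > "$sample"

# An empty script changes nothing: the output is identical to the input.
unchanged=$(sed "" "$sample")
printf '%s\n' "$unchanged"

# -n turns off the automatic printing; 2p then prints only line 2.
second=$(sed -n '2p' "$sample")
printf '%s\n' "$second"

rm -f "$sample"
```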

sed "s/,/;/" input.csv

The s means that we want to use the substitute command. It is followed by a /, the character(s) we want to replace, another /, the character(s) we want to replace them with, and a final /.

Applied to input.csv, the command prints the following:
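To see the three parts in action on a single line, here is a small sketch using one row taken from input.csv. The variable names are only for illustration. It also shows that the delimiter does not have to be a /: sed accepts another character, such as |, as long as it is used consistently.

```shell
# One data row from input.csv.
line='1,a0001.tif,True,"(139, 63)(145, 91)"'

# s/old/new/ : replace the first comma of the line with a semicolon.
first_only=$(printf '%s\n' "$line" | sed 's/,/;/')
printf '%s\n' "$first_only"
# prints: 1;a0001.tif,True,"(139, 63)(145, 91)"

# The same command written with | as the delimiter instead of /.
with_pipe=$(printf '%s\n' "$line" | sed 's|,|;|')
printf '%s\n' "$with_pipe"
```

Single quotes are used around the script here so that the shell passes it to sed untouched.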

## Header of input.csv
#this file contains information I want to parse with a simple program.
#The header; the footer or any comment starting with a "#" will be removed
#The blank lines will also be removed

#img;processed,defaut
#bloc 1
0;a0000.tif,,
1;a0001.tif,True,"(139, 63)(145, 91)"
2;a0002.tif,True,"(93, 72)(24, 162)(31, 64)"
3;a0003.tif,,
4;a0004.tif,,
5;a0005.tif,,
6;a0006.tif,,
7;a0007.tif,,
8;a0008.tif,,
9;a0009.tif,True,"(127, 80)(104, 60)(87, 63)(53, 78)(17, 126)"
10;a0010.tif,,
11;a0011.tif,True,"(39, 78)(84, 110)" # a random comment passing by
#end of bloc 1


#bloc 2
12;a0012.tif,,
13;a0013.tif,,
14;a0014.tif,,
15;a0015.tif,True,"(146, 65)(146, 89)(139, 146)(16, 68)"
16;a0016.tif,True,"(51, 59)(77, 69)(145, 78)(139, 112)(97, 123)(17, 148)"
17;a0017.tif,,
#end of bloc 2

#bloc 3
18;a0018.tif,,
19;a0019.tif,,
20;a0020.tif,True,"(57, 99)(12, 113)(27, 139)(16, 158)"
21;a0021.tif,,
22;a0022.tif,,
23;a0023.tif,,
24;a0024.tif,,
25;a0025.tif,,
26;a0026.tif,,
#end of bloc 3

27;a0027.tif,True,"(11, 86)(29, 74)(92, 68)(109, 129)(132, 104)"
28;a0028.tif,,
29;a0029.tif,True,"(128, 58)"


30;a0030.tif,True,"(133, 59)(99, 77)(111, 100)(115, 153)"
31;a0031.tif,True,"(43, 154)(27, 177)"


## footer : end of file

As you can see, only the first comma of each line has been replaced. To replace every occurrence on a line, we need to add the g (global) flag:

sed "s/,/;/g" input.csv
## Header of input.csv
#this file contains information I want to parse with a simple program.
#The header; the footer or any comment starting with a "#" will be removed
#The blank lines will also be removed

#img;processed;defaut
#bloc 1
0;a0000.tif;;
1;a0001.tif;True;"(139; 63)(145; 91)"
2;a0002.tif;True;"(93; 72)(24; 162)(31; 64)"
3;a0003.tif;;
4;a0004.tif;;
5;a0005.tif;;
6;a0006.tif;;
7;a0007.tif;;
8;a0008.tif;;
9;a0009.tif;True;"(127; 80)(104; 60)(87; 63)(53; 78)(17; 126)"
10;a0010.tif;;
11;a0011.tif;True;"(39; 78)(84; 110)" # a random comment passing by
#end of bloc 1


#bloc 2
12;a0012.tif;;
13;a0013.tif;;
14;a0014.tif;;
15;a0015.tif;True;"(146; 65)(146; 89)(139; 146)(16; 68)"
16;a0016.tif;True;"(51; 59)(77; 69)(145; 78)(139; 112)(97; 123)(17; 148)"
17;a0017.tif;;
#end of bloc 2

#bloc 3
18;a0018.tif;;
19;a0019.tif;;
20;a0020.tif;True;"(57; 99)(12; 113)(27; 139)(16; 158)"
21;a0021.tif;;
22;a0022.tif;;
23;a0023.tif;;
24;a0024.tif;;
25;a0025.tif;;
26;a0026.tif;;
#end of bloc 3

27;a0027.tif;True;"(11; 86)(29; 74)(92; 68)(109; 129)(132; 104)"
28;a0028.tif;;
29;a0029.tif;True;"(128; 58)"


30;a0030.tif;True;"(133; 59)(99; 77)(111; 100)(115; 153)"
31;a0031.tif;True;"(43; 154)(27; 177)"


## footer : end of file
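Note that in the output above the commas inside the quoted coordinates were replaced too, because sed knows nothing about CSV quoting. The sketch below contrasts the g flag with a numeric flag, which replaces only the Nth match, on one row from input.csv. It is meant as an illustration of the flags, not as a CSV-aware conversion.

```shell
# One data row from input.csv.
line='2,a0002.tif,True,"(93, 72)(24, 162)(31, 64)"'

# g flag: every comma on the line is replaced, quoted ones included.
all=$(printf '%s\n' "$line" | sed 's/,/;/g')
printf '%s\n' "$all"
# prints: 2;a0002.tif;True;"(93; 72)(24; 162)(31; 64)"

# Numeric flag: only the second comma on the line is replaced.
second=$(printf '%s\n' "$line" | sed 's/,/;/2')
printf '%s\n' "$second"
# prints: 2,a0002.tif;True,"(93, 72)(24, 162)(31, 64)"
```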

Removing comments

To remove comments, we can replace anything that looks like a comment with nothing. A comment starts with a # sign and is followed by an arbitrarily long string of characters. To match this pattern, we will use a regular expression.

sed "s/#.*//g" input.csv

#.* means: match a #, followed by any character (the .), repeated any number of times (the *). In other words, sed matches from the first # up to the end of the line. Since such a match always runs to the end of the line, there can be only one per line, so the g flag is harmless but not needed here.

To make the result a bit cleaner, we can also remove any whitespace that precedes a comment. To do so, we use \s, which matches a single whitespace character. \s is not part of POSIX regular expressions; GNU sed supports it, and [[:space:]] is the portable equivalent. To match any run of whitespace before the comment, we write \s*.

The final regular expression is then \s*#.*, and every match is replaced by nothing.

The command becomes:
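Here is a small sketch of what this pattern does on two lines from input.csv: a comment-only line and a data row with a trailing comment. The brackets in the output are only there to make leftover whitespace visible.

```shell
# A line that is only a comment becomes an empty line.
full=$(printf '%s\n' '#bloc 1' | sed 's/#.*//')
printf '[%s]\n' "$full"
# prints: []

# A trailing comment is removed, but the space in front of it stays.
row='11,a0011.tif,True,"(39, 78)(84, 110)" # a random comment passing by'
trailing=$(printf '%s\n' "$row" | sed 's/#.*//')
printf '[%s]\n' "$trailing"
# prints: [11,a0011.tif,True,"(39, 78)(84, 110)" ]
```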

sed "s/\s*#.*//g" input.csv

If we run that command, all our comments have disappeared. Lines that contained only a comment are left behind as blank lines:

0,a0000.tif,,
1,a0001.tif,True,"(139, 63)(145, 91)"
2,a0002.tif,True,"(93, 72)(24, 162)(31, 64)"
3,a0003.tif,,
4,a0004.tif,,
5,a0005.tif,,
6,a0006.tif,,
7,a0007.tif,,
8,a0008.tif,,
9,a0009.tif,True,"(127, 80)(104, 60)(87, 63)(53, 78)(17, 126)"
10,a0010.tif,,
11,a0011.tif,True,"(39, 78)(84, 110)"




12,a0012.tif,,
13,a0013.tif,,
14,a0014.tif,,
15,a0015.tif,True,"(146, 65)(146, 89)(139, 146)(16, 68)"
16,a0016.tif,True,"(51, 59)(77, 69)(145, 78)(139, 112)(97, 123)(17, 148)"
17,a0017.tif,,



18,a0018.tif,,
19,a0019.tif,,
20,a0020.tif,True,"(57, 99)(12, 113)(27, 139)(16, 158)"
21,a0021.tif,,
22,a0022.tif,,
23,a0023.tif,,
24,a0024.tif,,
25,a0025.tif,,
26,a0026.tif,,


27,a0027.tif,True,"(11, 86)(29, 74)(92, 68)(109, 129)(132, 104)"
28,a0028.tif,,
29,a0029.tif,True,"(128, 58)"


30,a0030.tif,True,"(133, 59)(99, 77)(111, 100)(115, 153)"
31,a0031.tif,True,"(43, 154)(27, 177)"
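For a sed that does not understand \s, a possible portable spelling uses the POSIX class [[:space:]]. The sketch below applies it to one row from input.csv. It also shows a limitation worth keeping in mind: the pattern cuts at the first #, even inside a quoted field. The row `5,"tag#1"` is a made-up example and does not appear in input.csv.

```shell
# Portable variant: [[:space:]]* instead of \s*.
row='11,a0011.tif,True,"(39, 78)(84, 110)" # a random comment passing by'
cleaned=$(printf '%s\n' "$row" | sed 's/[[:space:]]*#.*//')
printf '[%s]\n' "$cleaned"
# prints: [11,a0011.tif,True,"(39, 78)(84, 110)"]

# Caveat (hypothetical row): a # inside quotes is treated as a comment too.
truncated=$(printf '%s\n' '5,"tag#1"' | sed 's/[[:space:]]*#.*//')
printf '[%s]\n' "$truncated"
# prints: [5,"tag]
```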

Removing blank lines

To remove the blank lines, we give sed a pattern that matches a blank line, followed by the d command, where d stands for delete. The pattern has to be written between two / characters.

Now let's define the pattern for a blank line. There is no symbol for blankness, but we can write ^$: ^ matches the beginning of a line and $ matches the end of a line, so ^$ matches a line with nothing in between.

So whenever we find a blank line, we delete it:

sed "/^$/ d" input.csv
## Header of input.csv
#this file contains information I want to parse with a simple program.
#The header, the footer or any comment starting with a "#" will be removed
#The blank lines will also be removed
#img,processed,defaut
#bloc 1
0,a0000.tif,,
1,a0001.tif,True,"(139, 63)(145, 91)"
2,a0002.tif,True,"(93, 72)(24, 162)(31, 64)"
3,a0003.tif,,
4,a0004.tif,,
5,a0005.tif,,
6,a0006.tif,,
7,a0007.tif,,
8,a0008.tif,,
9,a0009.tif,True,"(127, 80)(104, 60)(87, 63)(53, 78)(17, 126)"
10,a0010.tif,,
11,a0011.tif,True,"(39, 78)(84, 110)" # a random comment passing by
#end of bloc 1
#bloc 2
12,a0012.tif,,
13,a0013.tif,,
14,a0014.tif,,
15,a0015.tif,True,"(146, 65)(146, 89)(139, 146)(16, 68)"
16,a0016.tif,True,"(51, 59)(77, 69)(145, 78)(139, 112)(97, 123)(17, 148)"
17,a0017.tif,,
#end of bloc 2
#bloc 3
18,a0018.tif,,
19,a0019.tif,,
20,a0020.tif,True,"(57, 99)(12, 113)(27, 139)(16, 158)"
21,a0021.tif,,
22,a0022.tif,,
23,a0023.tif,,
24,a0024.tif,,
25,a0025.tif,,
26,a0026.tif,,
#end of bloc 3
27,a0027.tif,True,"(11, 86)(29, 74)(92, 68)(109, 129)(132, 104)"
28,a0028.tif,,
29,a0029.tif,True,"(128, 58)"
30,a0030.tif,True,"(133, 59)(99, 77)(111, 100)(115, 153)"
31,a0031.tif,True,"(43, 154)(27, 177)"
## footer : end of file
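The pattern ^$ only matches lines that are completely empty. A line that contains nothing but spaces or tabs looks blank but is not matched. As a sketch, assuming a made-up four-line sample, the variant below also deletes whitespace-only lines by allowing any run of whitespace between ^ and $.

```shell
# Hypothetical sample: an empty line and a line holding three spaces.
sample=$(mktemp)
printf 'first\n\n   \nsecond\n' > "$sample"

# ^$ deletes the empty line but keeps the whitespace-only one.
strict=$(sed '/^$/d' "$sample")
printf '%s\n' "$strict"

# ^[[:space:]]*$ deletes both.
loose=$(sed '/^[[:space:]]*$/d' "$sample")
printf '%s\n' "$loose"

rm -f "$sample"
```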

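Combining the sed commands

We can chain sed commands by separating them with a semicolon. The commands are applied to each line in order, so the substitution runs first and the lines it empties are then deleted. The final sed command is: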

sed "s/\s*#.*//g;/^$/ d" input.csv

The final output is:

0,a0000.tif,,
1,a0001.tif,True,"(139, 63)(145, 91)"
2,a0002.tif,True,"(93, 72)(24, 162)(31, 64)"
3,a0003.tif,,
4,a0004.tif,,
5,a0005.tif,,
6,a0006.tif,,
7,a0007.tif,,
8,a0008.tif,,
9,a0009.tif,True,"(127, 80)(104, 60)(87, 63)(53, 78)(17, 126)"
10,a0010.tif,,
11,a0011.tif,True,"(39, 78)(84, 110)"
12,a0012.tif,,
13,a0013.tif,,
14,a0014.tif,,
15,a0015.tif,True,"(146, 65)(146, 89)(139, 146)(16, 68)"
16,a0016.tif,True,"(51, 59)(77, 69)(145, 78)(139, 112)(97, 123)(17, 148)"
17,a0017.tif,,
18,a0018.tif,,
19,a0019.tif,,
20,a0020.tif,True,"(57, 99)(12, 113)(27, 139)(16, 158)"
21,a0021.tif,,
22,a0022.tif,,
23,a0023.tif,,
24,a0024.tif,,
25,a0025.tif,,
26,a0026.tif,,
27,a0027.tif,True,"(11, 86)(29, 74)(92, 68)(109, 129)(132, 104)"
28,a0028.tif,,
29,a0029.tif,True,"(128, 58)"
30,a0030.tif,True,"(133, 59)(99, 77)(111, 100)(115, 153)"
31,a0031.tif,True,"(43, 154)(27, 177)"
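An equivalent way to chain commands is to pass each one with its own -e option. The sketch below does this on a small made-up extract (the trailing `# note` comment is hypothetical) and also runs the two commands in the opposite order, to show why the substitution has to come first.

```shell
# Hypothetical five-line extract in the style of input.csv.
sample=$(mktemp)
printf '%s\n' '#bloc 1' '0,a0000.tif,,' '' \
  '1,a0001.tif,True,"(139, 63)(145, 91)" # note' '#end of bloc 1' > "$sample"

# Strip comments first, then delete the lines that are now empty.
combined=$(sed -e 's/[[:space:]]*#.*//' -e '/^$/d' "$sample")
printf '%s\n' "$combined"
# prints:
# 0,a0000.tif,,
# 1,a0001.tif,True,"(139, 63)(145, 91)"

# Reversed order: comment-only lines are emptied after the delete
# has already run, so they survive as blank lines.
reversed=$(sed -e '/^$/d' -e 's/[[:space:]]*#.*//' "$sample")
printf '%s\n' "$reversed"

rm -f "$sample"
```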

Overwriting the file

If we want sed to write its result back to the input file instead of printing it, we need to add the -i option. Without -i, sed never alters your files, so it is safe to experiment.

sed -i "s/\s*#.*//g;/^$/ d" input.csv
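Since -i replaces the original content, it can be worth keeping a backup. A minimal sketch, assuming a throwaway copy in a temporary directory: attaching a suffix directly to -i (here .bak) makes sed save the original under that name before editing. This form is accepted by both GNU sed and the BSD sed shipped with macOS, whereas a bare -i is not: BSD sed expects a suffix argument, which may be empty, as in -i ''.

```shell
# Work on a throwaway copy (hypothetical three-line extract).
work=$(mktemp -d)
printf '%s\n' '#bloc 1' '0,a0000.tif,,' '' > "$work/input.csv"

# Edit in place; the untouched original is saved as input.csv.bak.
sed -i.bak -e 's/[[:space:]]*#.*//' -e '/^$/d' "$work/input.csv"

edited=$(cat "$work/input.csv")
backup=$(cat "$work/input.csv.bak")
printf '%s\n' "$edited"
# prints: 0,a0000.tif,,

rm -rf "$work"
```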