Tag Archives: sed

How to delete a chunk of text spanning multiple lines: use a sed range (/a/,/b/)

A page I downloaded with the following wget command contained some JavaScript Google Ads calls:

wget -O toCleanItUp.html www.someSite.com/somePage.html

After selecting the whole buffer with C-x h, I ran M-x shell-command-on-region with the following command:

sed "/<script .*>/,/<\/script>/d" > nowItIsClean.html
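A minimal sketch of the range at work, run on a made-up sample page (the file name sample.html and its contents are hypothetical). It also shows a pitfall worth knowing. sed only tests the end address on the lines after the one that opened the range, so a one-line `<script src=...></script>` keeps the range open until the next line containing `</script>`. Also note that `<script .*>` needs a space after `script`, so a bare `<script>` tag never starts a range.

```shell
#!/bin/sh
# Hypothetical sample page: an ad script spread over several lines.
printf '%s\n' \
  '<p>before</p>' \
  '<script type="text/javascript">' \
  '  google_ad_client = "hypothetical";' \
  '</script>' \
  '<p>after</p>' > sample.html

# Delete from the line matching <script ...> through the next line matching </script>.
sed "/<script .*>/,/<\/script>/d" sample.html
# Output:
# <p>before</p>
# <p>after</p>

# Pitfall: the first line opens AND closes a script, but sed only checks the
# end address from the NEXT line on, so <p>lost</p> is deleted as well.
printf '%s\n' \
  '<script src="ads.js"></script>' \
  '<p>lost</p>' \
  '<script type="text/javascript">' \
  'show_ads();' \
  '</script>' \
  '<p>kept</p>' | sed "/<script .*>/,/<\/script>/d"
# Output:
# <p>kept</p>
```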

If I’m feeling lazy (or don’t already have the file open in an Emacs buffer), there’s the straightforward way, using cat:

cat toCleanItUp.html | sed "/<script .*>/,/<\/script>/d" > nowItIsClean.html
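sed reads any files named on its command line, so the cat is optional. A small sketch showing that both forms give the same result. The file toCleanItUp.html is created here with hypothetical contents so the snippet runs on its own, and viaCat.html is just a scratch name.

```shell
#!/bin/sh
# Hypothetical stand-in for the downloaded page.
printf '%s\n' '<p>a</p>' '<script type="text/javascript">' 'ads();' '</script>' '<p>b</p>' > toCleanItUp.html

# Piping through cat, as above...
cat toCleanItUp.html | sed "/<script .*>/,/<\/script>/d" > viaCat.html
# ...and giving sed the file name directly: same output, one process fewer.
sed "/<script .*>/,/<\/script>/d" toCleanItUp.html > nowItIsClean.html

cmp viaCat.html nowItIsClean.html && echo "identical"
```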

Ah, the beauty of Unix tools!

I found a well-written “sed by example” article with practical examples (especially in part 3).
There are plenty of good resources out there; today I partly went through a thorough introduction and tutorial written by Bruce Barnett.
This collection of sed one-liners also has plenty of useful material.

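Today I had to experiment a little with sed ranges.
The following does not work as intended, because /7/ matches every line that contains a 7 (lines 7, 17, 27, 37 and 47 all do). The range opens at line 7 and closes at line 28. It then opens again at line 37 and, since no later line contains 28, runs to the end of the input.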

yes 'nope this sed regex range does not work' | head -50 | cat -n | sed -n -e '/7/,/28/p'
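A short sketch that keeps only the number column added by cat -n, so the lines selected by the ambiguous range are easy to read:

```shell
#!/bin/sh
# Print just the line numbers that the /7/,/28/ range selects, on one line.
yes 'nope this sed regex range does not work' | head -50 | cat -n |
  sed -n -e '/7/,/28/p' |
  awk '{ print $1 }' | paste -s -d ' ' -
# prints 7 through 28, then 37 through 50 (29-36 are skipped), space-separated
```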

Either of these two variants works:

# (notice the space before 7: cat -n right-aligns the numbers, so only line 7 has a space right before its 7)
yes 'this sed regex range works' | head -50 | cat -n | sed -n -e '/ 7/,/28/p'

Or the following, which uses the POSIX character class [[:space:]] to pin the 7 between the leading padding and the tab that cat -n inserts:

yes 'this sed regex range works ' | head -50 | cat -n | sed -n -e '/^[[:space:]]*7[[:space:]].*$/,/28/p'
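Since cat -n is only there to make the line numbers visible, sed's numeric line addresses are another option when the goal is simply lines 7 through 28 by position. A sketch that checks that the two regex variants and a plain numeric range select the same lines (the scratch file names are arbitrary):

```shell
#!/bin/sh
input=$(yes 'this sed regex range works' | head -50 | cat -n)

# The two regex variants from above...
printf '%s\n' "$input" | sed -n -e '/ 7/,/28/p' > space.txt
printf '%s\n' "$input" | sed -n -e '/^[[:space:]]*7[[:space:]].*$/,/28/p' > class.txt
# ...and a numeric address range: print input lines 7 to 28.
printf '%s\n' "$input" | sed -n -e '7,28p' > numeric.txt

cmp space.txt class.txt && cmp space.txt numeric.txt && echo "all three agree"
```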

Today I needed to bulk-remove lines in a group of CSS files where an x had been put in front of a declaration. (Context: prefixing a property with x is a trick I often use to disable a property instead of deleting it while live-editing CSS with the indispensable Web Developer extension for Firefox. Practical as the trick is, if you forget to remove the x right away, it causes errors in the W3C validator later on.)

I could easily select the desired files and blank out the offending lines with something like this:

ls | awk '/\.css$/' | xargs sed -i.bk -e 's/^ *x.*$//'

The problem was that I didn’t want to empty the line while leaving its line break behind, which is what happens here. By default sed works on one line at a time: it strips the trailing newline when reading each line and adds it back when printing the result, so a line emptied by the substitution still comes out as a blank line.

The simple workaround I found was to delete every line matching the regex (writing the rest back to the file) rather than running the substitution on each line, which leaves an empty line behind.

Like this:

ls | awk '/\.css$/' | xargs sed -i.bk -e '/^ *x.*$/d'
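A quick way to see the difference, using a hypothetical stylesheet with one x-prefixed declaration. The substitution empties the line but keeps its newline, while d drops the line entirely.

```shell
#!/bin/sh
# Hypothetical stylesheet; "xbackground" is a declaration disabled with the x trick.
printf '%s\n' 'a {' '  color: red;' '  xbackground: blue;' '}' > hypothetical.css

printf '%s\n' '--- substitution (an empty line is left behind):'
sed -e 's/^ *x.*$//' hypothetical.css
printf '%s\n' '--- d command (the line and its newline are gone):'
sed -e '/^ *x.*$/d' hypothetical.css
```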

I figured out how to discard lines from this excellent collection of sed one-liners, which shows two alternatives:

# print only lines which do NOT match regexp (emulates "grep -v")
sed -n '/regexp/!p' # method 1, corresponds to above
sed '/regexp/d' # method 2, simpler syntax
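Both one-liners behave like grep -v. A small check on made-up input, using the ^ *x pattern from the CSS clean-up above (the file names are arbitrary):

```shell
#!/bin/sh
printf '%s\n' 'keep me' '  xdrop me' 'keep me too' > lines.txt

sed -n '/^ *x/!p' lines.txt > method1.txt   # print only lines that do NOT match
sed '/^ *x/d' lines.txt > method2.txt       # delete lines that DO match
grep -v '^ *x' lines.txt > grepv.txt        # the grep equivalent

cmp method1.txt method2.txt && cmp method1.txt grepv.txt && echo "same output"
```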

Here’s the complete one-liner I used at the shell prompt (which I ran from within Emacs, by the way):

find . -type f -exec grep -i '^ *x' /dev/null {} + | awk '!/svn|htdocs/' | cut -c 3- | awk '!/^#/' | awk -F ':' '{print $1}' | awk '/\.css$/' | sort -u | xargs sed -i.bk -e '/^ *x.*$/d'
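A shorter variant of the same clean-up, sketched under the assumption that the only files worth touching are *.css files outside any svn or htdocs path. Instead of parsing grep's file:line output, find runs grep -q on each stylesheet and, only when it finds an x-prefixed line, runs sed on that file. Each file is therefore edited, and backed up, exactly once. The directory layout below is made up so the snippet can run on its own.

```shell
#!/bin/sh
# Hypothetical project tree in a throwaway directory.
dir=$(mktemp -d)
mkdir -p "$dir/css" "$dir/htdocs"
printf '%s\n' 'p {' '  xcolor: red;' '  margin: 0;' '}' > "$dir/css/site.css"
printf '%s\n' 'h1 {' '  xfont-size: 2em;' '}' > "$dir/htdocs/skip.css"
printf '%s\n' 'em {' '  font-style: italic;' '}' > "$dir/css/clean.css"

cd "$dir" || exit 1
# For each .css file outside svn/htdocs paths: the second -exec only runs when
# grep -q succeeds, and it deletes the x-prefixed lines (original kept as *.bk).
find . -name '*.css' ! -path '*svn*' ! -path '*htdocs*' \
  -exec grep -q '^ *x' {} \; \
  -exec sed -i.bk -e '/^ *x.*$/d' {} \;

cat css/site.css   # the xcolor line is gone; css/site.css.bk holds the original
```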

To replace text across a group of files in bulk:
This searches all HTML files (outside htdocs) that contain tab = “DATs” and changes that to tab = “Datasets” in one pass.
Note that with BSD sed (the one on macOS) this leaves a backup copy of each file with -e appended to its name: -i takes the next argument, here -e, as the backup suffix, so bla.html is modified and an extra bla.html-e holds the old contents. With GNU sed, -i takes no separate argument, so no backup is made.

find .  -name '*.html' | awk '!/htdocs/' | xargs grep -l 'tab = "DATs"' | xargs sed -i -e 's/tab = "DATs"/tab = "Datasets"/g'

Note 1: the -i option takes whatever follows it as the backup suffix. For no backup at all, use -i '' with BSD sed or a bare -i with GNU sed. A suffix attached directly to the option, as in -i.bak, gives a more standard backup name and works with both.
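Because GNU sed and BSD sed parse -i differently, a suffix glued to the option is the form both accept. A sketch on a throwaway file (bla.html is just the example name used above; its contents are made up):

```shell
#!/bin/sh
printf '%s\n' '<a href="#" tab = "DATs">data</a>' > bla.html

# Gluing the suffix to -i means BSD sed won't take the next argument (-e) as
# the suffix, and GNU sed won't take a separate suffix argument as the script.
sed -i.bak -e 's/tab = "DATs"/tab = "Datasets"/g' bla.html

cat bla.html       # <a href="#" tab = "Datasets">data</a>
cat bla.html.bak   # <a href="#" tab = "DATs">data</a>
```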
Note 2: this worked better later on:

find . -type f | xargs grep "url *=> *'/DAT_introduction.html'" | awk '!/svn|htdocs|blib|README|\.bk/' | awk '{print $1}' | awk -F ':' '{print $1}' | xargs perl -pi -e "s#url *=> *'/DAT_introduction.html'#url => '/datasets.html'#g"

Several things are different here:
awk cleans up the fields so that only the path and file name are printed.
awk also filters out the paths I don’t want listed; the alternatives are combined with “|” (or) in the regex.

Instead of sed I used Perl this time (I’m somewhat more familiar with it). Notice that I used “#” as the substitution delimiter, so I don’t have to escape the “/” in the regex.
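sed’s s command accepts an alternative delimiter the same way, so the substitution could also have stayed in sed. A sketch on a hypothetical one-line sample modeled on the pattern above (menu.pl and its contents are made up):

```shell
#!/bin/sh
# Hypothetical Perl-style config line containing the URL to rewrite.
printf '%s\n' "{ label => 'Intro', url => '/DAT_introduction.html' }," > menu.pl

# "#" as the s delimiter, so the "/" in the URLs needs no escaping.
sed -e "s#url *=> *'/DAT_introduction.html'#url => '/datasets.html'#g" menu.pl
# prints: { label => 'Intro', url => '/datasets.html' },
```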