Skip navigation

Tag Archives: perl

This is just to write down a little trick for when working with html files.
I was troubleshooting a layout with and extra div and using this to identify all the the instances containing end divs (</div>) so I could pin point the one that gave me a hard time tracking down:

perl -pe ‘$i++;s#</div>#</div><!–$i–>/g’

(I ran this in Emacs with “M-x shell-command-on-region”)

To massively replace text from a group of files:
// this searches all html files (not into htdocs) which contain tab = “DATs” and modifies that to tab = “Datasets” in one pass
// it’s important to notice that in the process they leave a backup copy of each one with the -e appended to the extension so bla.html will be modified but an extra bla.htm-e will exist holding the content of the old file

find .  -name '*.html' | awk '!/htdocs/' | xargs grep -l 'tab = "DATs"' | xargs sed -i -e 's/tab = "DATs"/tab = "Datasets"/g'

Note1: Actually the -i option takes whatever it’s after to create the extension of the backup to create. Use -ei if you want nothing or -e -i.bak to make it more standard.
this worked better later on

find . -type f | xargs grep "url *=> *'/DAT_introduction.html'" | awk '!/svn|htdocs|blib|README|.bk/' | awk '{print $1}' | awk -F ':' '{print $1}' | xargs perl -pi -e "s#url *=> *'/DAT_introduction.html'#url => '/datasets.html'#g"

There are many things different:
awk cleans clean up the fields so to have only the path and file name printed
awk also helps here filtering out many option I don’t want to list and they are put together with the simple clause “|” (or) in the regex

instead of sed I used perl this time (sort of more familiar), notice that I used a different character “#” for the substitution delimiter, since I don’t want to escape “/” in the regex