
Tag Archives: sed

I needed to inspect a relatively small portion of a large log file (~1 GB), a size that can choke even powerful text editors like vi(m) or emacs.

I proceeded in two steps:
1) found the match in the file and pulled its line number

awk '/ May 10 /{a=$0; b = NR;}END{print a," :: ",b}' log.txt
which yielded:
Thu May 10 02:17:05 ART 2012 :: 29199076

2) then I dumped the content from that line onwards and filtered it with head
tail -n +29199076 log.txt | head -n 100
That is possible with the trick of using "tail -n +N", which prints lines from the Nth line onwards.

As an alternative to that last step, as explained here, sed could have been used in the following manner:
sed -n -e '29199076,29199176p' -e '29199177q' log.txt
(the last expression, for efficiency, tells sed to quit at the limit line + 1)
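The two steps can also be chained so the line number never has to be copied by hand. A minimal sketch on a tiny stand-in log (file name and contents invented here):

```shell
# Tiny stand-in for the ~1 GB log file.
printf 'line one\nThu May 10 02:17:05 ART 2012\nline three\nline four\n' > /tmp/log.txt

# Pull the line number of the last match, then dump a couple of lines from there.
start=$(awk '/ May 10 /{n=NR} END{print n}' /tmp/log.txt)
tail -n "+$start" /tmp/log.txt | head -n 2
# Thu May 10 02:17:05 ART 2012
# line three
```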

(This is mainly a reminder post for myself.)
For various reasons I sometimes have to edit text pasted from an emacs buffer that I was editing with longlines-mode enabled. As that mode does, the paragraphs get hard-wrapped beyond a certain number of characters (when they extend over the 'fill-column' length).

Although "the soft newlines used for line wrapping will not show up when the text is yanked or saved to disk", they will remain if, say, I had carelessly pasted the text directly into a gmail form to save for later reuse there.

My way to remove those artificially inserted line breaks is to run this one-liner on the text region.

sed -ne '1h;1!H;${;g;s#\n\([^\n]\)# \1#g;p}' | sed -e 's#^[ \t]*\(.*\)$#\1#g'

(The first sed command replaces each line break with a space, using the multiline search-and-replace method.
The second just gets rid of leading whitespace at the beginning of each line.)
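To see the pair of sed commands in action outside emacs, here is the same pipeline applied to a small hard-wrapped sample in a shell:

```shell
# A hard-wrapped paragraph, as longlines-mode would paste it.
printf 'these lines were\nhard wrapped by\nthe editor\n' |
  sed -ne '1h;1!H;${;g;s#\n\([^\n]\)# \1#g;p}' |
  sed -e 's#^[ \t]*\(.*\)$#\1#g'
# these lines were hard wrapped by the editor
```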

In a bash shell, I knew of a way to dump the output of a query into a tab-delimited file by simply running:

mysql -uUSER -pPASSW < fileWithSQLquery.sql > outputFile.txt

Now in this case what I needed was a CSV file, so I piped the output to sed, replacing tabs with commas:

mysql -uUSER -pPASSW < fileWithSQLquery.sql | sed -e 's#\t#, #g' > fileToSave.csv
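As a toy illustration of the tab-to-comma step (GNU sed interprets \t; with BSD sed you would type a literal tab instead), and keeping in mind that fields containing commas would need real CSV quoting:

```shell
# Simulate tab-separated mysql output and convert it to comma-separated values.
printf 'id\tname\n1\tAda\n' | sed -e 's#\t#, #g'
# id, name
# 1, Ada
```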

I struggled the other day doing some sysadmin work to recover data from a single table of our database. Editing big files (of several gigabytes) is no picnic even for vi(m) or emacs, so it wasn't trivial to find a quick way to isolate the parts needed. For what it's worth, here's the method I followed with success, resorting to simple cat and sed commands at the command line:

  1. Get the creation statement for the table to be recovered

    cat your_entire_backup_file.sql | sed -ne '/Table structure for table `your_table`/,/-- Dump/p' > table_creation.sql
  2. Get the data

    cat your_entire_backup_file.sql | sed -ne '/INSERT INTO `your_table`/,/Table structure/p' > data_dump.sql
  3. Join the two into a single file

    cat table_creation.sql data_dump.sql > data_for_single_table_to_copy.sql
  4. Optionally, in case you need to extract only some rows from that previous instance of the table, as was my case with records deleted by mistake, you might want to create a temporary table from which to later perform the selection of the desired rows. In order to do that, the table name should be altered in the creation and insertion statements:

    sed -i 's#your_table#your_temp_table#g' data_for_single_table_to_copy.sql
  5. Now we are ready to create that temporary table with its data inside our database:

    mysql -u your_username -p your_database_name < your_path_to_the_file/data_for_single_table_to_copy.sql
  6. Voila! The table is there containing the information you needed. Now it's up to you to extract and reinsert whatever you want into the original table.

Note: different parameters could be used to isolate and put together the table-creation and data parts in a single pass. The awk command might also be used, since, like sed, it can collect a portion of text by matching from its beginning to its ending block. Just make sure you know which table comes after the one you are picking.

awk '/Table structure for table `your_table`/,/Table structure for table `your_next_table`/{print}' your_entire_backup_file.sql > data_for_single_table_to_copy.sql

In case the table to extract happens to be the last one (which, again, you can find out with a mysql "show tables" command), modify the last part of the regexp to match accordingly.
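The range trick is easy to rehearse on a miniature dump; the file contents below are invented, but the sed expression is the one from step 1:

```shell
# Toy stand-in for the multi-gigabyte backup file.
cat > /tmp/backup.sql <<'EOF'
-- Table structure for table `users`
CREATE TABLE `users` (id INT);
-- Dump completed
INSERT INTO `users` VALUES (1);
-- Table structure for table `posts`
CREATE TABLE `posts` (id INT);
EOF

# Print everything from the `users` header down to the next "-- Dump" marker.
sed -ne '/Table structure for table `users`/,/-- Dump/p' /tmp/backup.sql
```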

Hand-editing web stuff (outside of wysiwyg editors, that is) often requires dealing with messy html markup. Even though tags are meant to be parsed by browsers, and there are even performance benefits to serving ugly obfuscated code without white space, human readability demands some tidiness in the formatting of web pages: the indentation of tags is what lets us understand what goes on inside them.

Emacs has a large set of commands to handle code indentation internally, but it never occurred to me until recently that they could help to simply re-align the clutter of html tags you see in the output of a source view.

I went on to define something basic, like this little elisp function, to quickly help with the general re-alignment of code.

(defun my-tidy ()
  "Automatically re-indent code."
  (interactive)
  (execute-kbd-macro
   (read-kbd-macro "C-u -999 M-x indent-region RET C-x C-x M-x indent-region")))

(global-set-key (kbd "C-M-*") 'my-tidy)

Now that was fine but still insufficient. What about the embedded css or javascript that normally goes along inside a web page?
I didn't like the way emacs (in its default mode, of course) re-aligns css code, and I actually have specific requirements for reformatting the css style-sheet declarations and rules when editing a web document. As I use firefox with its web-developer plug-in, I leave the narrowest possible window at my left to save screen real estate, so it is inconvenient to have css stuff deeply indented to the right (I wanted just one space from the left margin). Additionally, rules needed to be broken up nicely as well (avoiding long chains like #div a.rule1, #div b.rule2, #div c.rule3, for example).

This screen-shot might show better why the formatting of css code matters so much in my work setup.

my css editing style

So, playing a bit more with the idea, I went on to craft some good old sed and awk one-liners for reformatting whatever could be found embedded inside <style> tags. The formidable shell-command-on-region in emacs allows such things. You will note that the regexp doesn't look pretty: it's rather long (I might write another post explaining how it breaks up), and it also has the dreaded leaning toothpick syndrome, because emacs lisp needs characters to be double-escaped.

In short, everything I wanted is wrapped up below, exactly as it now sits in my emacs init file:

(defun select-css-code ()
  "Select region contained by <style></style> tags.
   Simply highlights what is between those tags for embedded css content"
  (let(p1 p2)
    (goto-char (point-min))
    (search-forward "<style ")
    (backward-char 7)
    (setq p1 (point))
    (search-forward "</style>")
    (setq p2 (point))
    (goto-char p1)
    (push-mark p2)
    (setq mark-active t)))

(defun select-javascript-code ()
  "Select region contained by <script></script> tags.
   Simply highlights what is between those tags for embedded javascript content"
  (let(p1 p2)
    (goto-char (point-min))
    (search-forward "<script ")
    (backward-char 8)
    (setq p1 (point))
    (search-forward "</script>")
    (setq p2 (point))
    (goto-char p1)
    (push-mark p2)
    (setq mark-active t)))

(defun re-indent-web-page-code ()
  "Re-indent html code, including its embedded javascript and css.
The css code gets indented differently (through some awk and sed one-liners)
to ease the editing of styles with Firefox using a window of its Web-Developer plug-in."
  (interactive)
  (indent-rigidly (region-beginning) (region-end) -999)
  (indent-region (region-beginning) (region-end))
  (indent-rigidly (region-beginning) (region-end) -999)
  (indent-region (region-beginning) (region-end))
       (setq command  "awk  '/{/ {gsub(/,/,\",\\n\")} {print }' | sed -ne '1h;1!H;${;g;s#{\\([^\\n]\\)#{\\n\\1#g;p}' | sed -ne '1h;1!H;${;g;s#;}#;\\n}#g;p}' | sed -ne '1h;1!H;${;g;s#;\\([^\\n]\\)#;\\n\\1#g;p}' |  sed -e 's#^[ \\t]*\\(.*\\)$#\\1#g' |  awk  '!/{|}|^ *#/&&!/^\\// {$0 = \" \"$0} {print }'  | awk NF |  sed -e 's#\}$#}\\n#g' | sed -ne '1h;1!H;${;g;s#,\\n#,#g;p}' | awk  '/{/ {gsub(/,/,\",\\n\")} {print }' | sed -e 's#^[ \\t]*\\(.*\\),$#\\1,#g'" )
  (shell-command-on-region (mark) (point) command t t))
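To get a feel for what the long pipeline does without reading all of it, here is a trimmed-down subset of the same sed/awk steps (comma splitting, newline after "{", newline after each ";") run on a one-line rule in a plain shell:

```shell
# Break a compact css rule into one selector/property per line.
printf 'h1,h2{color:red;margin:0;}\n' |
  awk '/{/ {gsub(/,/,",\n")} {print}' |
  sed -ne '1h;1!H;${;g;s#{\([^\n]\)#{\n\1#g;p}' |
  sed -ne '1h;1!H;${;g;s#;}#;\n}#g;p}' |
  sed -ne '1h;1!H;${;g;s#;\([^\n]\)#;\n\1#g;p}'
# h1,
# h2{
# color:red;
# margin:0;
# }
```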

UPDATE: note that instead of a macro call

     (read-kbd-macro "M-x indent-region")

I'm using this straightforward lisp expression

    (indent-region (region-beginning) (region-end))

The function is almost there; I still need to address a couple of things:
What if we have many [style or javascript] sections intermingled in our html?
One way to address that would be to successively grab the content and send it elsewhere using an accumulating-text function of emacs like 'append-to-buffer; then we can switch to that second buffer, treat the code there, and simply bring it back into the original document.
I also noted that the css code doesn't get indented if the javascript tags aren't found, so I'll revise the logic to allow it regardless of whether javascript code exists or not.

On how to delete a chunk of text spanning multiple lines: use sed to catch the range (/a/,/b/).

There were some javascript google ads calls in a page I had downloaded via wget, doing:

 wget -O toCleanItUp.html <url>

After highlighting the whole page with C-x h, I did M-x shell-command-on-region and used the following:

sed "/<script .*>/,/<\/script>/d" > nowItIsClean.html

If feeling lazy (or if I don't have the file open already in an emacs buffer), there's the straightforward way of using cat:

cat toCleanItUp.html | sed "/<script .*>/,/<\/script>/d" > nowItIsClean.html
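A quick sanity check on a made-up page shows the range delete removing a multi-line script block (note that an opening and closing tag on the same line would make the range run on to the next </script> or to the end of the file):

```shell
# Invented sample page with one script block to strip.
cat > /tmp/toCleanItUp.html <<'EOF'
<html><body>
<p>keep me</p>
<script type="text/javascript">
var ad = "remove me";
</script>
<p>keep me too</p>
</body></html>
EOF

sed "/<script .*>/,/<\/script>/d" /tmp/toCleanItUp.html
# <html><body>
# <p>keep me</p>
# <p>keep me too</p>
# </body></html>
```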

Ah, the beauty of unix tools!

Found a well-written "sed by example" article with practical examples (especially in part 3).
There are plenty of good resources out there; today I partially checked a thorough intro and tutorial written by Bruce Barnett.
And this collection of sed one-liners has useful stuff as well to get you covered.

Today I had to experiment a little with sed ranges.
The following obviously does not work, because of the ambiguity of matching every line containing a 7 (as 7, 17, 27, 37, and 47 all do):

yes 'nope this sed regex range does not work' | head -50 | cat -n | sed -n -e '/7/,/28/p'

These two variants could be used:

# (notice the space before 7 in this case)
yes 'this sed regex range filters correctly from seventh to twenty-eighth' | head -50 | cat -n | sed -n -e '/ 7/,/28/p'

or the following, which uses the POSIX character class for space (check here):

yes 'printing from seven to twenty-eighth using a POSIX character in the regex range ' | head -50 | cat -n | sed -n -e '/^[[:space:]]*7[[:space:]].*$/,/28/p'

I'm adding these other one-liners too for my reference, even though not all of them use ranges.

To exclude last line:
yes 'the last gets swallowed ' | head -10 | cat -n | sed '$d'

To print last line:
yes 'version 1, print last line' | head -10 | cat -n | sed '$!d'

yes 'version 2, printing last line' | head -10 | cat -n | sed -n '$p'

To print only the first line (like head -1):
sed q

To print only the first five lines (like head -5):
yes 'version 1, prints up to fifth line' | head -10 | cat -n | sed -n '1,5p'

yes 'version 2, prints from first to fifth line' | head -10 | cat -n | sed '6,$d'

yes 'version 3, simplest ' | head -10 | cat -n | sed 5q

whereas this other one does the opposite, printing from the sixth line to the last:
yes 'this filters up to the fifth line' | head -10 | cat -n | sed '6,$!d'

Remember to check that fantastic compilation of sed one-liners.

Today I needed to massively clean lines containing x's in front of some css declarations in a group of files. (Context: this is a sort of trick I frequently use to cancel, instead of delete, a property in a declaration while live-editing css with the indispensable web-developer plugin for Firefox. Practical as the trick is, if you forget to erase the x immediately, it yields an error in the W3C validation later on.)

I could easily filter the desired files and even remove the line by doing something like this:

ls | awk '/\.css$/' | xargs sed -i.bk -e 's/^ *x.*$//g'

The problem was that I did not want to erase the line's content while leaving its corresponding line break behind, which would be the result here (the explanation is that sed, by default, operates on single lines, stripping the line break from stdin and appending it back after doing the substitution).

The simple way I found around it was to filter out all lines matching the regex (outputting the rest to the file) instead of performing the substitution on every single line, which would leave an empty line.

Like this:

ls | awk '/\.css$/' | xargs sed -i.bk -e '/^ *x.*$/d'

I figured out how to discard lines from this excellent collection of sed one-liners. It shows two alternatives.

# print only lines which do NOT match regexp (emulates “grep -v”)
sed -n '/regexp/!p' # method 1, corresponds to above
sed '/regexp/d' # method 2, simpler syntax
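The difference between substituting and deleting is easy to demonstrate on a three-line stylesheet (file name invented):

```shell
# One 'x'-cancelled property inside a tiny stylesheet.
printf 'a {\n  xcolor: red;\n}\n' > /tmp/style.css

# Substitution empties the line but keeps its newline: still 3 lines.
sed -e 's/^ *x.*$//g' /tmp/style.css | wc -l
# -> 3

# Deletion drops the whole line: 2 lines.
sed -e '/^ *x.*$/d' /tmp/style.css | wc -l
# -> 2
```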

Here's the complete one-liner used at the shell command line (which I ran in emacs, by the way):

find . -type f -exec grep -i '^ *x' /dev/null {} + | awk '!/svn|htdocs/' | cut -c 3- | awk '!/^#/' | awk -F ':' '{print $1}' | awk '/\.css$/' | xargs sed -i.bk -e '/^ *x.*$/d'

To massively replace text in a group of files:
# this searches all html files (not under htdocs) which contain tab = "DATs" and changes that to tab = "Datasets" in one pass
# it's important to notice that in the process a backup copy of each file is left with -e appended to the extension, so bla.html will be modified but an extra bla.html-e will exist holding the content of the old file

find .  -name '*.html' | awk '!/htdocs/' | xargs grep -l 'tab = "DATs"' | xargs sed -i -e 's/tab = "DATs"/tab = "Datasets"/g'

Note 1: actually, the -i option takes whatever comes right after it as the extension of the backup to create. Use -i '' if you want no backup, or -i .bak to make it more standard.
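The whole find | grep -l | sed chain can be rehearsed in a throwaway directory, here using GNU sed's -i.bk form so the backup suffix is explicit (paths and contents are made up):

```shell
# Sandbox with one file containing the old string.
mkdir -p /tmp/massdemo
printf 'tab = "DATs"\n' > /tmp/massdemo/page.html

find /tmp/massdemo -name '*.html' \
  | xargs grep -l 'tab = "DATs"' \
  | xargs sed -i.bk -e 's/tab = "DATs"/tab = "Datasets"/g'

cat /tmp/massdemo/page.html       # tab = "Datasets"
cat /tmp/massdemo/page.html.bk    # tab = "DATs" (the backup keeps the old content)
```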
This worked better later on:

find . -type f | xargs grep "url *=> *'/DAT_introduction.html'" | awk '!/svn|htdocs|blib|README|.bk/' | awk '{print $1}' | awk -F ':' '{print $1}' | xargs perl -pi -e "s#url *=> *'/DAT_introduction.html'#url => '/datasets.html'#g"

There are several differences:
awk cleans up the fields so that only the path and file name get printed
awk also helps here by filtering out the many entries I don't want to list; they are put together with the simple "|" (or) clause in the regex

Instead of sed I used perl this time (somewhat more familiar to me); notice that I used a different character, "#", as the substitution delimiter, since I don't want to escape "/" in the regex.