Skip navigation

Tag Archives: awk

Escaping a forwardslash in awk like

awk  '/javascript\//'

doesn’t work
The way to do so is using the hexadecimal value of “/” which you can figure with the “od” unix command:

echo "/" | od -x

which returns:
0000000 0a2f

then doing:

awk '/javascripta2f/'

works fine!

I needed to inspect a relatively small portion of a large log file (~1Gb), which will make chock even powerfull text-editors like vi(m) or emacs

I proceded in two steps:
1) found the match in the file and pulled the line number

awk ‘/ May 10 /{a=$0; b = NR;}END{print a,” :: “,b}’ log.txt
which yielded:
Thu May 10 02:17:05 ART 2012 :: 29199076

2) then I dumped the content from the that line and filtered it with head
tail -n +29199076 log.txt | head -n 100
That is possible with the trick of using “tail -n +(N)” which brings lines from the N line onwards

As and alternative to the last one, as explained here, sed could’ve been used in the following manner:
sed -n -e 29199076,29199176 -e 29199077q log.txt
(the last parameter, for efficiency, tells to quit at the limit line + 1 )

I struggled the other day doing some sys admin work for recovering data from a single table of our database. Editing big files (of about several gigabites) is no-picnic even for vi(m) or emacs, so it wasn’t trivial to find a quick way to isolate the parts needed. For what is worth here’s the method I’ve followed with success resourcing to simple cat and sed commands in my command line:

  1. Get the creation statement for the table to be recovered

    cat your_entire_backup_file.sql | sed -ne '/Table structure for table `your_table`/,/-- Dump/p' > table_creation.sql
  2. Get the data

    cat your_entire_backup_file.sql | sed -ne '/INSERT INTO `your_table`/,/Table structure/p' > data_dump.sql
  3. Join the two into a single file

    cat table_creation.sql data_dump.sql > data_for_single_table_to_copy
  4. Optionally, in case you need to extract some rows only from that previous instance of the table, as it was my case with records deleted by mistake, you might want to create a temporary table from where to later perfom the selection of the desired rows. In order to do that, the table name should be altered from the creation and insertion statements:

    sed -i 's#your_table#your_temp_table#g' data_for_single_table_to_copy.sql
  5. Now we are ready to create that temporary table with its data inside our database:

    mysql -u 'your_username -p your_database_name < 'your_path_to_the_file/data_for_single_table_to_copy.sql
  6. Voila!, the table is there containing the information you needed. Now is up to you to extract and reinsert whatever you wanted inside the original table

Note: See that different parameters could be used to isolate and put together table creation and data parts in only one pass. Also the awk command might be used instead since, like sed, it permits collecting portions by matching from the beginning to the ending block of text. Just make sure you know the order of the table after the one you are picking.

awk '/Table structure for table `your_table`/,/Table structure for table `your_next_table`/{print}' your_entire_backup_file.sql > data_for_single_table_to_copy.sql

In case the table to extract happens to be the last one (which again, you could just know with a mysq “show tables” command, modify the last part of the regexp to match it accordingly.

After updating to the newest 1.7.7 version of cygwin (you have to do uname -a to know which one you are running) I found that awk was replaced by gawk, which not only would affect many of my scripts and shell functions but also to me feels harder to type in the command line.

I found that you could simply copy one to another so to have awk as an alias of gawk.
In my setup, all it took was:

cd c:/cygwin/bin
cp gawk awk

Hand editing web stuff (outside of wysiwyg editors, that is) a lot of times requires dealing with messy html markup. Even though tags are meant to be parsed by browsers, and there are even performance benefits for serving ugly obfuscated code without white spaces, the human readability of the markup needs some tidiness in the formatting of web pages for which the indentation of tags makes us able to understand what goes on inside it.

Emacs has a large set of commands to internally handle code indentation, but never occurred to me until recently that they could be helpful to simply re-align the clutter of html tags that you can see from the output of a source view.

I went on defining something basic like this little elisp function to quickly help with the general re-alignment of code.

(defun my-tidy ()
  "Automatically re-indents code"
   (read-kbd-macro "C-u -999 M-x indent-region RET C-x C-x M-x indent-region")))

(global-set-key (kbd "C-M-*") 'my-tidy)

Now that was fine but still insufficient. Say, what about the embedded css or javascript that normally goes along inside a web page?
I didn’t like the way emacs (in its default mode, of course) gets css code re-aligned, plus I actually do have specific requirements which demand the ability to reformat the css style sheets declaration and rules when editing a web document. As I use firefox with its web-developer plug-in, in order to save screen real state I leave the narrowest possible window at my left, therefore it is inconvenient to have css stuff deeply indented to the right (wanted just one space from left margin), additionally, rules needed to be broken up nicely as well (avoiding long chains like #div a.rule1, #div b.rule2, #div c.rule 3 for example).

This screen-shot might show better why the formatting of css code matters so much in my work setup.

my css editing style

So playing a bit more with the idea, I went on crafting some good old sed and awk one-liners for reformatting all what could be found embedded inside <style> tags. The formidable shell-command-on-region in emacs allows such things. You will note that the regexp ain’t look pretty,  it’s sort of long (I might make another post explaining how it breaks up), and also has the dreaded leaning toothpick syndrome! cause emacs lisp needs characters to be double escaped.

In short, all what I wanted is wrapped up below exactly as it now goes inside my emacs init :

(defun select-css-code ()
  "Select region contained by <style></style> tags.
   Simply highliths what is between those tags for embedded css content"
  (let(p1 p2)
    (goto-char (point-min))
    (search-forward "<style ")
    (backward-char 7)
    (setq p1 (point))
    (search-forward "</style>")
    (setq p2(point))
    (goto-char p1)
    (push-mark p2)
    (setq mark-active t)))

(defun select-javascript-code ()
  "Select region contained by <script></script> tags.
   Simply highliths what is between those tags for embedded javascript content"
  (let(p1 p2)
    (goto-char (point-min))
    (search-forward "<script ")
    (backward-char 8)
    (setq p1 (point))
    (search-forward "</script>")
    (setq p2(point))
    (goto-char p1)
    (push-mark p2)
    (setq mark-active t)))

(defun re-indent-web-page-code ()
  "Re-indents html code including its embedded javascript and css.
The css code gets indented diferently (through some awk and sed one-liners)
to ease the editing of styles with Firefox using a window of its Web-Developer plug-in."
    (indent-rigidly (region-beginning)(region-end) -999)
    (indent-region (region-beginning) (region-end))
    (indent-rigidly (region-beginning)(region-end) -999)
    (indent-region (region-beginning) (region-end))
       (setq command  "awk  '/{/ {gsub(/,/,\",\\n\")} {print }' | sed -ne '1h;1!H;${;g;s#{\\([^\\n]\\)#{\\n\\1#g;p}' | sed -ne '1h;1!H;${;g;s#;}#;\\n}#g;p}' | sed -ne '1h;1!H;${;g;s#;\\([^\\n]\\)#;\\n\\1#g;p}' |  sed -e 's#^[ \\t]*\\(.*\\)$#\\1#g' |  awk  '!/{|}|^ *#/&&!/^\\// {$0 = \" \"$0} {print }'  | awk NF |  sed -e 's#\}$#}\\n#g' | sed -ne '1h;1!H;${;g;s#,\\n#,#g;p}' | awk  '/{/ {gsub(/,/,\",\\n\")} {print }' | sed -e 's#^[ \\t]*\\(.*\\),$#\\1,#g'" )
      (shell-command-on-region (mark)(point) command t t)))

UPDATE: note that instead of a macro call

     (read-kbd-macro "M-x indent-region"))

I’m using this straight forward lisp expression

    (indent-region (region-beginning) (region-end))

The function is almost there, I still need to address a couple of things:
What if we have many [style or javascript] sections intermingled in our html?
One way to address that would be to successively grab the content, send it to other place using the acummulating-text function of emacs like ‘append-to-buffer (, then, we can switch to that second buffer, treat the code there and simply get it back to the original document.
Also noted that the css code doesn’t get indented if the javascript tags aren’t found, so I’ll revise the logic to allow it regardless of whether javascript code exist or not.

This is more of quick reminder for myself on the difficult task of escaping quotes and running an awk or sed command inside the bash command line. I solved the issue in my case using the octal representation of a single quote (\47), taken from here, where more options are shown.

Having previously accommodated the list of stale links with their replacements in a two columns format that looked like:

I used the following:

awk '{print "UPDATE library SET url=\47"$2"\47 WHERE url=\47"$1"\47;"}' 

to correctly generated what I wanted sql-postgres statements to update the links table:

UPDATE links SET url='' WHERE url='';

Knowing how to do loops in awk is helpful to address a range of fields. If for example, we want to output only from second, to prior to last field. Say we’ve got :


Fields are here defined by “:” separator, so using:

awk -F ':' '{for(i=2;i<NF;i++) printf "%s:",$i}'

We will get:


Using grep, use the before and after switches -B Num -A Num. For example:
svn -v log | grep -B 2 -A 4 'string'

Also found here this awk oneliner where “b” and “a” are the number of lines to print before and after string “s”

awk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=2 a=4 s="string" file1

A quick note so I can remember the way to skip some fields at the beginning of a line printing the rest without further specifications (as it’d be tedious to have to write say: awk '{print $2,$3,$4,$5,$6,$7,$8,$9...$n}')
The solution found here explains a simple way: just to print the substring starting from the nth field.
In my example where I pull the history of commands filtering for “find” I skip the first field in order to output without the command number:
history | grep find | awk '{print substr($0, index($0,$2))}'
(for what it matters, turns to be convenient for me inside the emacs shell, as I can select the line I want doing fewer keystrokes:C-a SPC, activates the selection mark at the beginning, C-e extends selection to the end of the line and C-w copies it to customize and reuse it)

Later observation:
Actually, I found a simpler (hence, better) way using the cut command:

history | grep find | cut -d ' ' -f3-

where “-d ‘ ‘” is the separation field and -f3- means from the third field on. With cut you don’t need to know which is the last field, leaving the second part of the range empty implies “up to the end”.