Skip navigation

Tag Archives: text-editing

This is a way to select whatever text we had just inserted inside a buffer. I forgotten this trick until now that used it again. A small tip worth annotating that might come in handy for somebody else as well:
1) yank(paste) the text “C-y”
2) mark the point where the cursor landed, (using point-to-register), do “C-x r SPC” and pick any letter ( a-z)
3) go back to the point where you initially were, by doing “C-u SPC”
4) set the mark there,“C-SPC”
5) use “C-x r j” (+ the letter picked) to go up the point where the yanked text ends inside your buffer.
6) Voila, the text yanked is highlighted ready for whatever manipulation you need to do onto it.

ps: Of course in case you pasted text at the end of the buffer is simpler,
just a matter of:
“C-u SPC”
“C-SPC” to mark
“M – >” to go to the end

UPDATE: Actually, thanks to Peter (see comment below) I learned that this is possible just doing “C-x C-x after the yank if you have transient mark mode enabled” .

I just wanted to free a shortcut I had for “C-1” to let it behave the way it was meant with the command “digit-arguments”. This answer came in handy.

(global-unset-key (kbd “C-1”))
(define-key global-map (kbd “C-1”) ‘digit-argument)

By the way (a little tip) you can go: C-<digit ARG) instead of the C-u method; say to move 11 characters back, “C-11 C-b” is quicker than “C-u 11 C-b”, one less hit!

I needed to inspect a relatively small portion of a large log file (~1Gb), which will make chock even powerfull text-editors like vi(m) or emacs

I proceded in two steps:
1) found the match in the file and pulled the line number

awk ‘/ May 10 /{a=$0; b = NR;}END{print a,” :: “,b}’ log.txt
which yielded:
Thu May 10 02:17:05 ART 2012 :: 29199076

2) then I dumped the content from the that line and filtered it with head
tail -n +29199076 log.txt | head -n 100
That is possible with the trick of using “tail -n +(N)” which brings lines from the N line onwards

As and alternative to the last one, as explained here, sed could’ve been used in the following manner:
sed -n -e 29199076,29199176 -e 29199077q log.txt
(the last parameter, for efficiency, tells to quit at the limit line + 1 )

When in need to enter special character inside emacs you can quote it with “C-q” and enter the octal character code, so for instance to type “ñ” you will do: “C-q 361

I found a thorough description of iso-latin-1 on the web, plus a great deal of general info on character and encodings available there as well

Within emacs it’s easy to pull any charset-table by doing “M-x list-charset-chars“, say if we type latin-iso8859-1 at the prompt the program shows:

character set

Then, for specifics about any character, simply highlight it and do “M-x describe-char” to get a full detailed view like:

character: ñ (241, #o361, #xf1)
preferred charset: latin-iso8859-1
(Right-Hand Part of ISO/IEC 8859/1 (Latin-1): ISO-IR-100)
code point: 0x71
syntax: w which means: word
category: .:Base, j:Japanese, l:Latin
buffer code: #xC3 #xB1
file code: #xF1 (encoded by coding system iso-latin-1-unix)
display: by this font (glyph code)
uniscribe:-outline-Bitstream Vera Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x78)

Character code properties: customize what to show
general-category: Ll (Letter, Lowercase)
decomposition: (110 771) (‘n’ ‘̃’)

There are text properties here:
charset latin-iso8859-1

Awesome power!

I had an encoding issue that was bugging me inside the remember-data-file. I don’t know exactly how some latin-1 characters copy-pasted there ended up being saved as raw-text and were shown like non-ASCII characters (so a multibyte characters like “é” will appear with it’s escaped octal code “303\251” )

I tried at first setting the file’s encoding system with the tag “-*-coding: utf-8 -*-“, though it seemed not sufficient. The raw characters remained there and I soon grew tired of having to type in: “utf-8” at the prompt to select the encoding every time I needed to save the file.

Today searching the manual found here one easy cure on the command “recode-region” which allows to convert the text that was decoded with the wrong coding system.

Really all it took after marking the whole buffer (C-x h) was doing: “M-x recode-region RET” “Text was was really in: utf-8” “But was interpreted as: raw-text”

That was it!, the drag is over, I’m back to storing notes quickly doing just C-c r and C-c C-x with the worthy remember mode.

(This is mainly a remainder post for myself)
For certain reasons I sometimes have to edit text pasted from an emacs buffer that I was editing with the longlines-mode enabled. Hence as this mode does, the paragraphs are hard wrapped beyond a certain amount of characters (when they extend over ‘fill-column’ lenght).

Although “the soft newlines used for line wrapping will not show up when the text is yanked or saved to disk”, they will remain if, say, I had carelessly pasted it directly into a gmail form to save for later reuse there.

My way to remove those artificially-inserted line breaks, is running this oneliner on the text region.

sed -ne '1h;1!H;${;g;s#\n\([^\n]\)# \1#g;p}' | sed -e 's#^[ \t]*\(.*\)$#\1#g'

(The first sed command tells to put a space and remove the line break. using the multiline search and replace method
The second just gets rid of the leading white space at the beginning of line)

This seems a bit idiosincratic, in emacs the way to paste (yank) text in the search minibuffer is doing M-y not C-y (that is Meta-y not Control-y, as is the default way for all other matters)
It’s funny that I didn’t realize something so basic until now. I don’t know what the reason might be, but this is still the way at least in emacs 23.1.91
(R. Stallman answered the same question in 2005 here at

UPDATE: maybe worth pointing that when editing a search (M-E) you need to use the canonical C-y to yank instead of M-y as it is the case in the search

Hand editing web stuff (outside of wysiwyg editors, that is) a lot of times requires dealing with messy html markup. Even though tags are meant to be parsed by browsers, and there are even performance benefits for serving ugly obfuscated code without white spaces, the human readability of the markup needs some tidiness in the formatting of web pages for which the indentation of tags makes us able to understand what goes on inside it.

Emacs has a large set of commands to internally handle code indentation, but never occurred to me until recently that they could be helpful to simply re-align the clutter of html tags that you can see from the output of a source view.

I went on defining something basic like this little elisp function to quickly help with the general re-alignment of code.

(defun my-tidy ()
  "Automatically re-indents code"
   (read-kbd-macro "C-u -999 M-x indent-region RET C-x C-x M-x indent-region")))

(global-set-key (kbd "C-M-*") 'my-tidy)

Now that was fine but still insufficient. Say, what about the embedded css or javascript that normally goes along inside a web page?
I didn’t like the way emacs (in its default mode, of course) gets css code re-aligned, plus I actually do have specific requirements which demand the ability to reformat the css style sheets declaration and rules when editing a web document. As I use firefox with its web-developer plug-in, in order to save screen real state I leave the narrowest possible window at my left, therefore it is inconvenient to have css stuff deeply indented to the right (wanted just one space from left margin), additionally, rules needed to be broken up nicely as well (avoiding long chains like #div a.rule1, #div b.rule2, #div c.rule 3 for example).

This screen-shot might show better why the formatting of css code matters so much in my work setup.

my css editing style

So playing a bit more with the idea, I went on crafting some good old sed and awk one-liners for reformatting all what could be found embedded inside <style> tags. The formidable shell-command-on-region in emacs allows such things. You will note that the regexp ain’t look pretty,  it’s sort of long (I might make another post explaining how it breaks up), and also has the dreaded leaning toothpick syndrome! cause emacs lisp needs characters to be double escaped.

In short, all what I wanted is wrapped up below exactly as it now goes inside my emacs init :

(defun select-css-code ()
  "Select region contained by <style></style> tags.
   Simply highliths what is between those tags for embedded css content"
  (let(p1 p2)
    (goto-char (point-min))
    (search-forward "<style ")
    (backward-char 7)
    (setq p1 (point))
    (search-forward "</style>")
    (setq p2(point))
    (goto-char p1)
    (push-mark p2)
    (setq mark-active t)))

(defun select-javascript-code ()
  "Select region contained by <script></script> tags.
   Simply highliths what is between those tags for embedded javascript content"
  (let(p1 p2)
    (goto-char (point-min))
    (search-forward "<script ")
    (backward-char 8)
    (setq p1 (point))
    (search-forward "</script>")
    (setq p2(point))
    (goto-char p1)
    (push-mark p2)
    (setq mark-active t)))

(defun re-indent-web-page-code ()
  "Re-indents html code including its embedded javascript and css.
The css code gets indented diferently (through some awk and sed one-liners)
to ease the editing of styles with Firefox using a window of its Web-Developer plug-in."
    (indent-rigidly (region-beginning)(region-end) -999)
    (indent-region (region-beginning) (region-end))
    (indent-rigidly (region-beginning)(region-end) -999)
    (indent-region (region-beginning) (region-end))
       (setq command  "awk  '/{/ {gsub(/,/,\",\\n\")} {print }' | sed -ne '1h;1!H;${;g;s#{\\([^\\n]\\)#{\\n\\1#g;p}' | sed -ne '1h;1!H;${;g;s#;}#;\\n}#g;p}' | sed -ne '1h;1!H;${;g;s#;\\([^\\n]\\)#;\\n\\1#g;p}' |  sed -e 's#^[ \\t]*\\(.*\\)$#\\1#g' |  awk  '!/{|}|^ *#/&&!/^\\// {$0 = \" \"$0} {print }'  | awk NF |  sed -e 's#\}$#}\\n#g' | sed -ne '1h;1!H;${;g;s#,\\n#,#g;p}' | awk  '/{/ {gsub(/,/,\",\\n\")} {print }' | sed -e 's#^[ \\t]*\\(.*\\),$#\\1,#g'" )
      (shell-command-on-region (mark)(point) command t t)))

UPDATE: note that instead of a macro call

     (read-kbd-macro "M-x indent-region"))

I’m using this straight forward lisp expression

    (indent-region (region-beginning) (region-end))

The function is almost there, I still need to address a couple of things:
What if we have many [style or javascript] sections intermingled in our html?
One way to address that would be to successively grab the content, send it to other place using the acummulating-text function of emacs like ‘append-to-buffer (, then, we can switch to that second buffer, treat the code there and simply get it back to the original document.
Also noted that the css code doesn’t get indented if the javascript tags aren’t found, so I’ll revise the logic to allow it regardless of whether javascript code exist or not.

My quick guide for checkboxes
I like them but can easily forget about the simplest detail of their creation and use:

  • to create checkboxes use “[0/0]” or “[0%]” ( both can also be together in any order)
  • doing “C-#” recalculates the counts (updates the checkbox statistics)
  • add a “- [ ]” (dash space left-braket space right-bracket) at the beginning to create a checkbox item
  • new items can be created from there doing “S-M RET” (before, or after depending on whether cursor was at the beginning or end of item)
  • doing “C-c C-c” you can toggle check marck on/off
  • with “C-x-b” toggle checkbox at point, which according to the online manual works like the following:
    • If there is an active region, toggle the first checkbox in the region and set all remaining boxes to the same status as the first. If you want to toggle all boxes in the region independently, use a prefix argument.
    • If the cursor is in a headline, toggle checkboxes in the region between this headline and the next (so not the entire subtree).
    • If there is no active region, just toggle the checkbox at point.
    • ( this is found at )
** My quick guide for checkboxes [2/3] [66%]
This is a small sample
- [ ] item 1
- [X] item 2
- [X] item 3 which containes 4 sub items
    * sub item 3.1
    * sub item 3.2
    * sub item 3.3
    * ( see more at )

This is simply to remember how to change all the DOS “^M” line breaks, from files created in Windows when in need to edit them in a Unix environment.

  • Option 1)
    with Emacs, whith the file open, type:
    M-% C-q C-m RET RET !
  • Option 2)
    use the dos2unix command:
    dos2unix somefile.txt
  • Option 3)
    to apply option 2 inside an emacs buffer:
    C-x h (to select-all)
    C-u M-x shell-command-on-region
    dos2unix RET