Saturday, December 18, 2010

Inserting Unicode Characters
and UTF-8 Characters Into Vim

 
I was wondering how to insert a
Unicode character into Vim the
other day. Here's what I found.

First off, you have to know how
to enter a regular ASCII character
into Vim using its number on the
ASCII table. Here's how you do
that:

  1. Enter insert mode
  2. Type ctrl-v
  3. Type the decimal code for
    the ASCII character (up to
    three digits)
  4. If you typed fewer than
    three digits, type the ESC
    key to finish the number.
    ESC also ends text insertion.

Say, for example, you wish to
enter a capital-A. Here's how
you would do it following the
above steps:

  1. Type i for insert
  2. Type ctrl-v to enter
    the ASCII code for a capital-A
  3. Type 65 (or 065) to represent
    the letter A
  4. Hit the ESC key to end
    the insertion of text
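
If you want to double-check a decimal
code before typing it, a few lines of
Python will do. This is only a sketch
for checking the numbers. The helper
name decimal_keys is my own invention,
not something that comes with Vim, and
I'm assuming a plain ctrl-v accepts
decimal codes from 0 to 255.

```python
def decimal_keys(ch):
    """Return the three digits to type after ctrl-v.

    Hypothetical helper: it only handles codes 0-255,
    the assumed range for ctrl-v followed by decimal digits.
    """
    code = ord(ch)
    if code > 255:
        raise ValueError("code %d is too big for plain ctrl-v" % code)
    # Zero-pad to three digits so no terminating key is needed.
    return "%03d" % code


print(decimal_keys("A"))  # 065
print(chr(65))            # A
```

Typing all three digits (065) means
Vim inserts the character right away,
without waiting for another key.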

Of course it is easier to type a
capital-A simply by hitting the
A-key while in insert mode. I'm
taking the long way around the barn
here in order to explain things.

What does this have to do with Unicode?
For Unicode characters, you take an extra
step. After typing ctrl-v, you type a
lowercase u before the number.

Here are the steps again, slightly adjusted
for Unicode:

  1. Enter insert mode
  2. Type ctrl-v u
  3. Type the hexadecimal code
    for the Unicode character
    (up to four digits)
  4. To see what effect the
    insertion of a Unicode character
    has had on your document, type
    the ESC key. Hitting
    ESC ends text insertion.

Here's how to enter a capital-A
character in Unicode.

  1. Enter insert mode
  2. Type ctrl-v u
    to insert a Unicode character
  3. Type 0041 which
    is the four-digit hexadecimal
    code point for a capital-A
  4. The capital-A appears as soon
    as the fourth digit is typed.
    Hitting ESC ends text insertion.
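
To find the hexadecimal digits for some
other character, Python can do the
lookup. Again, this is just a sketch.
The name unicode_keys is hypothetical,
and it assumes ctrl-v u takes four hex
digits while ctrl-v U (capital) takes
eight for code points above FFFF.

```python
def unicode_keys(ch):
    """Return what to type after ctrl-v for a Unicode character.

    Hypothetical helper: 'u' plus four hex digits for code points
    up to U+FFFF, otherwise 'U' plus eight hex digits.
    """
    code = ord(ch)
    if code <= 0xFFFF:
        return "u%04x" % code
    return "U%08x" % code


print(unicode_keys("A"))       # u0041
print(unicode_keys("\u20ac"))  # u20ac (the euro sign)
print(chr(0x0041))             # A
```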

What does this have to do with
UTF-8? So far, I've only mentioned
Unicode. UTF-8 is a specific
encoding of Unicode: it stores each
Unicode code point as one to four
bytes. Wikipedia describes the
relationship:

Unicode

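Remember these distinctions when entering
ASCII versus entering Unicode in Vim:

  1. An ASCII code fits in one byte
  2. A Unicode code point can need
    more: ctrl-v u takes up to four
    hexadecimal digits, and ctrl-v U
    (capital U) takes up to eight
  3. After a plain ctrl-v, the code
    is typed in decimal notation
  4. After ctrl-v u, the code is
    typed in hexadecimal notation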

If you are entering an ASCII
character in Vim, you type up to
three decimal digits after ctrl-v.
If you are entering a Unicode
character in Vim, you type up to
four hexadecimal digits after
ctrl-v u. The notation depends on
the key that follows ctrl-v: a
digit means decimal, o means octal,
and x, u or U mean hexadecimal.
Vim's own table is under
:help i_CTRL-V_digit. Please post if
I'm wrong and I'll correct myself.
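
The number of digits you type in Vim
is a separate matter from the number of
bytes a character takes up in a UTF-8
file. Here is a small Python sketch that
shows the difference. The function name
utf8_bytes and the sample characters
are my own choices for illustration.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as a hex string."""
    return ch.encode("utf-8").hex()


# A, e-acute, the euro sign, and an emoji
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = utf8_bytes(ch)
    # Two hex digits per byte, so halve the length to count bytes.
    print("U+%04X -> %d byte(s): %s" % (ord(ch), len(encoded) // 2, encoded))
```

The same capital-A that takes four hex
digits to type after ctrl-v u is stored
as a single byte in UTF-8, while the
euro sign takes three.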

One more thing I find it helpful
to know is what kind of encoding
Vim is using. Here's how I find
out:

:set encoding

The short form also works, since
enc is Vim's abbreviation for
the encoding option:

:set enc

The answer that comes back in
my current session of Vim is:

encoding=utf-8

Seeing UTF-8 on my screen
makes me feel secure in the knowledge
that I'll be able to enter Unicode
into Vim.

There are many uses for a flexible
tool. That's the lesson I learn
over and over again when using Vim.

Ed Abbott