Re: Trying to input Unicode via GNU Emacs 21.3.1
From: Peter Dyballa
Subject: Re: Trying to input Unicode via GNU Emacs 21.3.1
Date: Sat, 12 Feb 2005 14:29:48 +0100
On 11 Feb 2005, at 22:00, List account wrote:
For instance, I need to be able to display the typical accented
Spanish, Italian and French characters. As an example, I can input
"Alarcón" in Emacs and it looks fine, but it displays in my browser
(Camino 0.82 on Mac OS X) as "AlarcÃ³n". The odd thing is that I
basically copied and modified this text from a page that actually
works just fine.
Camino is not good at guessing an HTML file's encoding: I can set the
right encoding ten times or more, and when I return to that page it is
back to the default encoding from the preferences. So do not rely on
the browser's guess; declare the encoding at the start of your HTML file:
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<!-- ... other things ... -->
</head>
Here all charset names are defined:
http://www.iana.org/assignments/character-sets.
The two characters Ã³ show that what you typed in GNU Emacs was
correctly encoded as UTF-8. Character Palette (in Mac OS X) tells me
that ó is "C3 B3" in UTF-8, i.e. Ã followed by ³ when the two bytes are
read as Latin-1. Camino should be able to display these two bytes as
one ó if you VIEW the page in UTF-8. Declaring the charset in the HTML
source's header should make Camino, and other browsers, switch
automatically to the correct character set -- and maybe you should also
have selected a font that covers Unicode!
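The byte-level story can be checked outside the browser. The following is only a small sketch, assuming Python 3; the variable names are made up for illustration. It encodes the name as UTF-8 and then decodes the same bytes as Latin-1, which is what a browser does when it applies the wrong default encoding.

```python
# "\u00f3" is the letter o with acute accent (U+00F3).
text = "Alarc\u00f3n"

# What the editor writes to disk when the file is saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# What a reader sees if it interprets those bytes as Latin-1:
# byte C3 becomes A-tilde (U+00C3), byte B3 becomes superscript three (U+00B3).
misread = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the original text, so nothing was lost.
recovered = misread.encode("latin-1").decode("utf-8")
```

Here `utf8_bytes` ends in the bytes C3 B3 followed by "n", and `recovered` equals `text`, so the file itself is fine and only the declared (or guessed) charset is wrong.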
I have the following lines in my .emacs:
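(setq locale-coding-system 'utf-8)      ; system messages and locale strings
(set-terminal-coding-system 'utf-8)     ; output sent to a text terminal
(set-keyboard-coding-system 'utf-8)     ; input read from the keyboard
(set-selection-coding-system 'utf-8)    ; clipboard/selection exchange
(prefer-coding-system 'utf-8)           ; first choice when detecting encodings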
It has been said a few times that this is too much; at least
set-keyboard-coding-system is incorrect. Usually your keyboard will
work in some Latin mode, i.e. produce only *one* byte on hitting
or releasing a key, whereas UTF-8 uses one, two, three, or even four
bytes per character, depending on the character's code point (not on
the direction of the script it belongs to). It might be more helpful
to set LANG to some (Spanish? French?) UTF-8 locale (see man locale).
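How many bytes UTF-8 needs per character is easy to verify. This is a sketch assuming Python 3; the sample characters and the dictionary names are my own choice, not something from the original question. The length depends only on the code point, so a Hebrew letter from a right-to-left script takes two bytes, the same as ó.

```python
# Sample characters, written as escapes so the file stays pure ASCII.
samples = {
    "a": "\u0061",           # ASCII letter
    "o-acute": "\u00f3",     # accented Latin letter
    "alef": "\u05d0",        # Hebrew letter (right-to-left script)
    "euro": "\u20ac",        # euro sign
    "g-clef": "\U0001d11e",  # musical symbol outside the 16-bit range
}

# Number of bytes each character occupies once encoded as UTF-8.
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
```

The resulting counts are 1, 2, 2, 3 and 4 bytes, in the order listed above.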
I have also tried the technique of hitting [C-q] and entering the
Unicode string, but it chokes on the codes for accented characters and
instead of inserting the accented "a" character (0x00E1) by typing C-q
0 0 E 1 it produces "^@e1".
As far as I know the C-q syntax supports only *octal* values. So the
input ends as soon as you type something outside the octal range 0...7;
e is one such terminator, RET is another. So you get ASCII NUL (the
octal value 00), which Emacs displays as ^@, followed by e and 1, which
are inserted unchanged.
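The behaviour can be modelled in a few lines. This is a simplified sketch in Python 3, not Emacs' actual code; the function name read_quoted and its radix parameter are hypothetical. It accumulates digits that are valid in the given radix, stops at the first other key, and swallows a terminating RET.

```python
from typing import Tuple

def read_quoted(keys: str, radix: int = 8) -> Tuple[int, str]:
    """Return (character code, keys left over) for a C-q style digit entry."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:radix]
    code = 0
    for i, key in enumerate(keys):
        if key == "\r":
            # RET ends the number and is not inserted itself.
            return code, keys[i + 1:]
        value = digits.find(key.lower())
        if value < 0:
            # Any other non-digit ends the number and is typed normally.
            return code, keys[i:]
        code = code * radix + value
    return code, ""

octal_attempt = read_quoted("00e1")     # the keys typed in the question
octal_correct = read_quoted("341")      # U+00E1 written in octal
hex_attempt = read_quoted("00e1", 16)   # the same keys read as hexadecimal
```

With radix 8 the keys 0 0 e 1 give code 0 (NUL) and leave "e1" to be inserted as ordinary text, while 3 4 1 gives 225, i.e. 0xE1. If I remember correctly, Emacs also has a variable read-quoted-char-radix that can be set to 16 so that C-q accepts hexadecimal; please check its documentation in your version before relying on it.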
--
Greetings
Pete
Basic, n.:
A programming language. Related to certain social diseases in
that those who have it will not admit it in polite company.