help-gnu-emacs

Re: Trying to input Unicode via GNU Emacs 21.3.1


From: Peter Dyballa
Subject: Re: Trying to input Unicode via GNU Emacs 21.3.1
Date: Sat, 12 Feb 2005 14:29:48 +0100


On 11.02.2005 at 22:00, List account wrote:

For instance, I need to be able to display the typical accented Spanish, Italian and French characters. As an example, I can input "Alarcón" in Emacs and it looks fine, but it displays in my browser (Camino 0.82 on Mac OS X) as "AlarcÃ³n". The odd thing is that I basically copied and modified this text from a page that actually works just fine.

Camino is not good at guessing an HTML file's encoding: I can tell it the right encoding ten times or more, and when I return to that page it is back to the default encoding from the preferences. So do not rely on guessing; declare the encoding at the start of your HTML file, like this:

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <!-- ... other things ... -->
</head>

All charset names are defined here: http://www.iana.org/assignments/character-sets.

The two characters Ã³ show that what you typed in GNU Emacs was correctly encoded as UTF-8. Character Palette (in Mac OS X) tells me that ó is "C3 B3" in UTF-8, i.e. Ã followed by ³ when each byte is read as a separate Latin-1 character. Camino should display these two bytes as a single ó if you VIEW the page as UTF-8. Declaring the charset in the HTML source's header should make Camino, and other browsers, switch to the correct character set automatically. You may also need to select a font that actually covers Unicode!
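The symptom can be reproduced outside the browser. The following is only a sketch in Python, and it assumes that the browser's fallback encoding is ISO-8859-1 (Latin-1); the variable names are made up for illustration.

```python
# Sketch: reproduce the "Ã³" symptom with Python's built-in codecs.
original = "\u00f3"                      # ó, LATIN SMALL LETTER O WITH ACUTE
utf8_bytes = original.encode("utf-8")    # the two bytes C3 B3
hex_bytes = " ".join(f"{b:02X}" for b in utf8_bytes)

# Assumption: the browser falls back to ISO-8859-1 (Latin-1),
# which maps every byte to exactly one character.
misread = utf8_bytes.decode("latin-1")
reread = utf8_bytes.decode("utf-8")      # decoded as the author intended

print(hex_bytes)   # C3 B3
print(misread)     # Ã³
print(reread)      # ó
```

The bytes in the file are the same in both cases; only the encoding the viewer assumes differs, which is why declaring the charset fixes the display without touching the text.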


I have the following lines in my .emacs:
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)

It has been said a few times that this is too much; at the very least, set-keyboard-coding-system is incorrect. Usually your keyboard works in some Latin mode, i.e. it produces only *one* byte when you hit or release a key, whereas a character in UTF-8 takes one, two, three, or four bytes. It might be more helpful to set LANG to some (Spanish? French?) UTF-8 locale (see man locale).
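To make the byte counts concrete, here is a small Python sketch. The four sample characters are my own choice, not taken from the original question.

```python
# Sketch: how many bytes UTF-8 needs for a few sample characters.
samples = ["a", "\u00f3", "\u20ac", "\U0001d11e"]  # a, ó, €, musical G clef

byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, byte_lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

A keyboard coding system that delivers one byte per key press can therefore cover only the first of these directly.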


I have also tried the technique of hitting [C-q] and entering the Unicode string, but it chokes on the codes for accented characters and instead of inserting the accented "a" character (0x00E1) by typing C-q 0 0 E 1 it produces "^@e1".

As far as I know, the C-q syntax supports only *octal* values. So the input ends as soon as you type something outside the octal range 0...7: e is one such terminating key, RET is another. The digits 0 0 therefore give you ASCII NUL, which Emacs represents as ^@, followed by e and 1, which are inserted unchanged.
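The behaviour can be imitated with a toy parser. This is a simplified Python simulation, not Emacs's actual implementation; the function name quoted_insert_sim and its radix parameter are hypothetical.

```python
# Sketch: a toy model of reading a character code digit by digit.
# NOT Emacs code; quoted_insert_sim is a hypothetical helper.
def quoted_insert_sim(keys, radix=8):
    """Consume leading digits valid in `radix`; return (char, leftover keys)."""
    valid = "0123456789abcdef"[:radix]
    i = 0
    while i < len(keys) and keys[i].lower() in valid:
        i += 1
    if i == 0:
        # No digit at all: the first key is taken literally.
        return keys[:1], keys[1:]
    return chr(int(keys[:i], radix)), keys[i:]

print(repr(quoted_insert_sim("00e1")))   # ('\x00', 'e1')  NUL, then e and 1
print(format(0xE1, "o"))                 # 341, the octal form of 0x00E1
print(quoted_insert_sim("341"))          # ('á', '')
```

Under this model, typing C-q 3 4 1 would be the octal route to the accented "a". Emacs also documents a variable read-quoted-char-radix for changing the radix; whether setting it to 16 helps in this particular 21.3.1 setup is not something I have tested.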

--
Greetings

  Pete

  Basic, n.:
        A programming language.  Related to certain social diseases in
that those who have it will not admit it in polite company.



