[emacs-bidi] Arabic implementation
From: TAKAHASHI Naoto
Subject: [emacs-bidi] Arabic implementation
Date: Fri, 9 Nov 2001 16:06:45 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI)
Eli Zaretskii writes:
>> > No, I meant the code that supports Arabic presentation forms.
>>
>> Shall we discuss it here, in address@hidden
> That's the closest forum I could imagine to support of Arabic in Emacs.
OK, here it goes. Our strategy is based on the following two
features of Emacs 21.
1. Font-lock checks the content of the buffer and if it finds a
certain pattern, it changes the appearance of the string on the
fly. This is very close to what we need for Arabic.
2. We can display a composed sequence as an arbitrary character.
For example, (compose-region (point) (+ (point) 2) ?*) composes the
following two characters and displays an asterisk instead of
overstriking the original two characters. Note that if we do
(compose-region (point) (1+ (point)) ?*), only the immediately
following character is displayed as an asterisk.
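To make the second feature concrete, here is a toy model in Python. It is not Emacs code and not how compose-region is implemented; the function name render and the (start, end, display_char) tuples are made up for illustration, and positions are 0-based half-open ranges rather than Emacs buffer positions. The point it shows is that the stored text stays untouched while the display substitutes one character for the composed range.

```python
def render(text, compositions):
    """Return what would be shown on screen; `text` itself is never modified.

    `compositions` is a list of (start, end, display_char) tuples with
    half-open, non-overlapping ranges (a hypothetical representation).
    """
    out = []
    i = 0
    for start, end, display_char in sorted(compositions):
        out.append(text[i:start])   # uncomposed text is shown as-is
        out.append(display_char)    # the whole composed range shows as one char
        i = end
    out.append(text[i:])
    return "".join(out)


buffer_text = "abcd"
two_chars = render(buffer_text, [(1, 3, "*")])  # compose "bc"
one_char = render(buffer_text, [(1, 2, "*")])   # compose only "b"
```

With a two-character range both characters collapse into a single asterisk; with a one-character range only that character is replaced.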
Now the procedure.
First, we advised font-lock-fontify-region so that it looks for
characters that have the new special category `?' (composition). All
Arabic characters that require glyph selection have this category.
When such a character is found, the function predefined for that
character is called to compose it. In our case, a function named
arabic-compose-region is invoked. This function does roughly the
following.
case 1: preceding-char is Arabic && following-char is Arabic
-> Compose the found character to use its word-medial form
for display
case 2: preceding-char isn't Arabic && following-char is Arabic
-> ... word initial form ...
case 3: preceding-char is Arabic && following-char isn't Arabic
-> ... word final form ...
case 4: preceding-char isn't Arabic && following-char isn't Arabic
-> ... isolated form ...
(Of course you have to do more in real life. Some Arabic characters
are never connected to the following character, even in the middle
of a word; the sequence laam-alef needs to be displayed with a
special ligature; you have to handle diacritical marks, etc.)
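The four cases above can be sketched in Python as follows. This is only an illustration of the case analysis, not the code of arabic-compose-region: the names is_arabic, select_form and forms_for are made up, "Arabic" is approximated as any code point in U+0600..U+06FF (which also includes punctuation and digits), and the complications just mentioned (non-connecting letters, laam-alef, diacritical marks) are deliberately ignored.

```python
def is_arabic(ch):
    """Rough test (an assumption): is ch in the basic Arabic block?"""
    return ch != "" and 0x0600 <= ord(ch) <= 0x06FF


def select_form(prev_ch, next_ch):
    """Pick a form name from the neighbours, following cases 1 to 4."""
    before = is_arabic(prev_ch)
    after = is_arabic(next_ch)
    if before and after:
        return "medial"      # case 1
    if after:
        return "initial"     # case 2
    if before:
        return "final"       # case 3
    return "isolated"        # case 4


def forms_for(text):
    """Return (character, form) pairs for every Arabic character in text."""
    result = []
    for i, ch in enumerate(text):
        if not is_arabic(ch):
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        result.append((ch, select_form(prev_ch, next_ch)))
    return result


BEH = "\u0628"
word = forms_for(BEH * 3)          # three letters in a row
alone = forms_for("x" + BEH + " ")  # one letter between non-Arabic characters
```

For a run of three letters this yields initial, medial, final; a letter with no Arabic neighbour gets the isolated form.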
All the necessary Arabic glyph variants (called presentation forms)
fall into the range of mule-unicode-e000-ffff, so we created the
necessary fonts. Thus the buffer contains Arabic characters that
belong to the mule-unicode-0100-24ff charset, but they are displayed
with glyphs from the mule-unicode-e000-ffff charset.
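As a rough illustration of that split, the Python sketch below looks up a letter's presentation form by its Unicode character name and reports which of the two ranges each code point falls in. The helper names presentation_form and mule_unicode_range are hypothetical, the range boundaries are simply read off the charset names, and the name-based lookup is a convenience of Python's unicodedata module, not something our Emacs code does.

```python
import unicodedata


def presentation_form(ch, form):
    """Return the presentation-form character for ch, or None if there is none.

    `form` is one of "isolated", "final", "initial", "medial". The lookup
    relies on Unicode names such as "ARABIC LETTER BEH INITIAL FORM".
    """
    name = unicodedata.name(ch, "") + " " + form.upper() + " FORM"
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None  # e.g. alef has no initial or medial form


def mule_unicode_range(ch):
    """Name the range containing ch (boundaries taken from the charset names)."""
    cp = ord(ch)
    if 0x0100 <= cp <= 0x24FF:
        return "mule-unicode-0100-24ff"
    if 0xE000 <= cp <= 0xFFFF:
        return "mule-unicode-e000-ffff"
    return None


BEH = "\u0628"                                # what the buffer stores
beh_initial = presentation_form(BEH, "initial")  # what gets displayed
stored_in = mule_unicode_range(BEH)
shown_from = mule_unicode_range(beh_initial)
```

The stored letter and its displayed form land in different ranges, which is why separate fonts were needed for the second one.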
And we wrote a quail package for Arabic.
You can see a screendump at http://www.m17n.org/ntakahas/arabic.png
--
TAKAHASHI Naoto
address@hidden
http://www.m17n.org/ntakahas/