emacs-devel

Re: chinese word mode


From: Eric Abrahamsen
Subject: Re: chinese word mode
Date: Thu, 07 Nov 2013 15:13:55 +0800
User-agent: Gnus/5.130008 (Ma Gnus v0.8) Emacs/24.3 (gnu/linux)

William Xu <address@hidden> writes:

> Eric Abrahamsen <address@hidden> writes:
>
>> Eric Abrahamsen <address@hidden> writes:
>>
>> [...]
>>
>>> (define-minor-mode thai-word-mode
>>>   :global t :group 'mule
>>>   (cond (thai-word-mode
>>>      ;; This enables linebreak between Thai characters.
>>>      (modify-category-entry (make-char 'thai-tis620) ?|)
>>>      ;; This enables linebreak at a Thai word boundary.
>>>      (put-charset-property 'thai-tis620 'fill-find-break-point-function
>>>                            'thai-fill-find-break-point))
>>>     (t
>>>      (modify-category-entry (make-char 'thai-tis620) ?| nil t)
>>>      (put-charset-property 'thai-tis620 'fill-find-break-point-function
>>>                            nil))))
>>>
>>
>> [...]
>>
>>> My buffers are utf-8 encoded, and describe-char on a Chinese character
>>> shows "preferred charset: unicode-bmp". So what do I put for the charset
>>> in order to make these functions target the right characters? Chinese
>>> characters all seem to have the "|" line-breakable category by default,
>>> but (I think) I can only add the custom fill break point function one
>>> charset at a time.
>>
>> I've tried slapping the 'fill-find-break-point-function onto the
>> 'unicode charset for now, and it works fine because the function only
>> does anything if point is in the midst of Chinese. It presumably gets
>> applied to all characters, though, and that can't be a real solution.
>
> modify-category-entry also accepts a range cons, where you can select
> Chinese characters by range.  For example,
>
>      (#x3400 . #x4DBF)                    ; CJK Unified Ideographs Extension A
>      (#x4E00 . #x9FFF)                    ; CJK Unified Ideographs
>      (#xF900 . #xFAFF)                    ; CJK Compatibility Ideographs
>
> put-charset-property seems to accept only a charset, though.
>
>> I'm guessing I'll need to separate simplified and traditional word sets
>> and make two versions of the mode. Both modes will loop through their
>> applicable charsets and apply/remove the custom break point function.
>>
>> Assuming I fix this problem and other inevitable bugs, would this
>> library be of general interest to Emacs?
>
> It can make those word movement functions useful.  :)

That's certainly the idea! I'll admit I was motivated to do this by
using LibreOffice, which I usually can't stand, and noticing it DTRT
with Chinese words. A bit of Emacs chauvinism kicked in...

Thanks for the tips on categories and all. I don't think I need the
modify-category-entry section at all, since Chinese characters have the
"|" category by default. So it's just looping on applicable charsets.
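
A minimal sketch of that loop, modelled on the thai-word-mode definition
quoted above. Every name starting with `my-` is hypothetical, the charset
list is only an assumed starting point, and the break-point function is a
placeholder for the real word-boundary search. Whether fill consults any
of these charsets for a character in a utf-8 buffer depends on charset
priority, which is the open question from earlier in the thread.

```elisp
;; Sketch only: every `my-' name below is hypothetical.

(defvar my-chinese-charsets
  '(chinese-gb2312 chinese-gbk big5)
  "Charsets to receive the custom break-point function (an assumed list).")

(defun my-chinese-fill-find-break-point (_limit)
  "Placeholder for the real word-boundary search; does nothing here."
  nil)

(defun my-chinese-set-break-function (function)
  "Set FUNCTION as `fill-find-break-point-function' on each listed charset.
Pass nil to remove it again."
  (dolist (charset my-chinese-charsets)
    ;; Skip any name that this Emacs does not define as a charset.
    (when (charsetp charset)
      (put-charset-property charset 'fill-find-break-point-function
                            function))))

(define-minor-mode my-chinese-word-mode
  "Toggle word-aware line breaking for Chinese text (sketch)."
  :global t :group 'mule
  ;; Install the function when the mode is on, clear it when off.
  (my-chinese-set-break-function
   (and my-chinese-word-mode #'my-chinese-fill-find-break-point)))
```

Calling (my-chinese-word-mode 1) installs the property on each charset and
(my-chinese-word-mode -1) clears it. To confirm that a given character
already has the "|" category, (aref (char-category-set CHAR) ?|) can be
evaluated for it.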

E



