Re: Automatic recognition of some specific coding systems
From: Eli Zaretskii
Subject: Re: Automatic recognition of some specific coding systems
Date: Thu, 26 Feb 2015 18:36:04 +0200
> From: Jürgen Hartmann <juergen_hartmann_@hotmail.com>
> Date: Thu, 26 Feb 2015 00:23:50 +0100
>
> > Try this:
> >
> > (set-coding-system-priority 'utf-8 'cp850)
>
> After doing this, the coding systems
>
> utf-8
> cp850
>
> get correctly recognized, but
>
> latin-9-unix
>
> gets wrongly recognized as cp850-unix encoded.
>
> If I modify the lisp expression to
>
> (set-coding-system-priority 'utf-8 'latin-9)
>
> it is utf-8 and latin-9 that are properly recognized while the test
> file
>
> cp850-dos
>
> gets detected as iso-latin-9-dos encoded.
I feared that might be the result.
> If I pass all three coding systems to set-coding-system-priority,
>
> (set-coding-system-priority 'utf-8 'latin-9 'cp850) or
> (set-coding-system-priority 'utf-8 'cp850 'latin-9)
>
> it turns out that the function set-coding-system-priority ignores the third
> coding system in these cases, because it belongs to the same coding
> category as the coding system named in the second place. The source
> code src/coding.c comments this in the lines 9972 and 9973 like this:
>
> /* Ignore this coding system because a coding system of the
> same category already had a higher priority. */
Yes, I know. That's why I only mentioned 2 of them.
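One way to see which coding systems share a category, and what priority order results, is to evaluate something like the following in *scratch*. This is only a sketch: it assumes an Emacs that provides coding-system-category and coding-system-priority-list, and the list of coding systems is just the three discussed in this thread.

```elisp
;; Pair each candidate with its detection category.  Coding systems that
;; report the same category compete for one slot in the priority order.
(mapcar (lambda (cs) (cons cs (coding-system-category cs)))
        '(utf-8 latin-9 cp850))

;; Set the priority, then inspect the list that actually results.
(set-coding-system-priority 'utf-8 'cp850)
(coding-system-priority-list)
```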
It looks like what you want is beyond the current capabilities of
Emacs's auto-detection of encoding. See below for some alternatives.
Having said that...
> By the way, could you verify, that this is possible with Emacs 22.3
> with the customization described in my previous post?
...no, it doesn't work for me. The latin-9 file is decoded using my
locale's encoding (which isn't latin-9), and the cp850 file is still
decoded as raw-text.
So I think some other factor(s) is/are at work on your system. Your
locale's encoding is certainly one of them, but I think there should
be something else, either in your customizations or somewhere else.
In general, even if Emacs 22.3 was capable of doing the job, I think it
was by sheer luck, and it is fragile anyway, since the same
customizations don't work for me (and AFAIU, aren't supposed to work).
So I would suggest exploring alternative ways of doing this reliably in
Emacs 24. Some possibilities you may wish to explore:
. Put a 'coding: cp850' cookie in the cp850 files
. If the names of the cp850 files all match some common pattern, you
  can use modify-coding-system-alist to tell Emacs to decode them as
  cp850
. Similarly, if the cp850 files' contents match some common regexp,
  you can customize auto-coding-regexp-alist to force their decoding
  as cp850
Of course, you can always turn the tables and do the above for
latin-9, while keeping cp850 in the set-coding-system-priority call. It
all depends on which of these 2 lends itself better to one of these
methods.
I believe that if one of these alternatives can do the job for you,
the result will be much more reliable.
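For illustration, the three alternatives might look roughly like this. The file-name pattern and the content regexp are hypothetical placeholders, not taken from the actual files; only the function and variable names come from the suggestions above.

```elisp
;; 1. Coding cookie: not Lisp, but a line placed at the top of each cp850
;;    file (inside a comment, if the file format has comments):
;;
;;      -*- coding: cp850 -*-

;; 2. Decode by file name.  The pattern is a hypothetical example that
;;    matches files whose names end in ".cp850.txt".
(modify-coding-system-alist 'file "\\.cp850\\.txt\\'" 'cp850)

;; 3. Decode by file contents.  The regexp is matched against the start
;;    of the file; "REM OEM850" is a hypothetical marker, adjust as needed.
(add-to-list 'auto-coding-regexp-alist
             '("\\`REM OEM850" . cp850))
```

The same forms should work with latin-9 in place of cp850 if the roles of the two encodings are swapped.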