RE: Automatic recognition of some specific coding systems
From: Jürgen Hartmann
Subject: RE: Automatic recognition of some specific coding systems
Date: Thu, 26 Feb 2015 23:34:05 +0100
@Eli Zaretskii: Thank you very much for your profound assessment:
> It looks like what you want is beyond the current capabilities of
> Emacs's auto-detection of encoding. See below for some alternatives.
>
> Having said that...
>
>> By the way, could you verify, that this is possible with Emacs 22.3
>> with the customization described in my previous post?
>
> ...no, it doesn't work for me. The latin-9 file is decoded using my
> locale's encoding (which isn't latin-9), and cp850 file is still
> raw-text.
Oops, this is an important finding indeed.
> So I think some other factor(s) is/are at work on your system. Your
> locale's encoding is certainly one of them, but I think there should
> be something else, either in your customizations or somewhere else.
I just repeated the tests with Emacs 22.3 using the POSIX locale,

    LC_ALL=C ./emacs -q

and you are right: the cp850 file was recognized as raw-text now. The
locale I used before was

    de_DE.UTF-8
The more I get involved in this topic, the more I see that it is much
more complex than I thought at first glance.
> In general, even if Emacs 22.3 was capable to do the job, I think it
> was by sheer luck, and is anyway fragile, since the same
> customizations don't work for me (and AFAIU, aren't supposed to work).
> So I would suggest to explore alternative ways of doing this in Emacs
> 24 reliably.
This sounds reasonable to me. Besides the aspect of reliability, which
is of course the most important one, doing so might also yield a
solution that is likely to survive future updates.
> Some possibilities you may wish to explore:
>
> . Put a 'coding: cp850' cookie in the cp850 files
I would rather avoid altering the files' content for a purely technical reason.
> . If the names of the cp850 files all match some common pattern, you
> can use modify-coding-system-alist to tell Emacs to decode them by
> cp850
Unfortunately, in my case there is no such pattern in the file names
that would indicate which coding system a given file uses.
> . Similarly, if the cp850 files' contents match some common regexp,
> you can customize auto-coding-regexp-alist to force their decoding
> by cp850
That one might do the trick: In my case the only files (at least in
the big picture) that use the DOS EOL variant are those encoded with
cp850 and vice versa. So one could think about a regular expression
that matches this unique EOL pattern.
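
A first, untested sketch of this idea might look as follows. It assumes
that a CR LF pair in the part of the file that Emacs examines is a
reliable sign of cp850, which holds only if no file in another encoding
uses DOS line endings. The plain "\r\n" regexp is chosen for
illustration and is not taken from this thread.

```elisp
;; Sketch under an assumption: only cp850 files use DOS (CR LF) line endings.
;; Each element of auto-coding-regexp-alist has the form (REGEXP . CODING-SYSTEM).
(add-to-list 'auto-coding-regexp-alist
             '("\r\n" . cp850))
```

With such an entry in the init file, a file that matches would be
decoded as cp850, while all other files still go through the normal
detection. A file in another encoding that happens to use DOS line
endings would be mis-decoded, so the pattern may need tightening.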
> Of course, you can always turn the table, and do the above for
> latin-9, while keeping cp850 in set-coding-system-priority call. It
> all depends which one of these 2 lends itself better to one of these
> methods.
>
> I believe that if one of these alternatives can do the job for you,
> the result will be much more reliable.
I also think so.
So I have to play around a little bit to get acquainted with the
construction of regular expressions for Emacs. I will be back when I
have gained a deeper insight or, ideally, a concrete solution.
Meanwhile I would like to thank you, Eli Zaretskii, very much for the
time and effort you spent providing me with this thorough analysis and
your valuable suggestions.
Juergen