emacs-devel

Re: Detecting the coding system of a file programmatically


From: Eli Zaretskii
Subject: Re: Detecting the coding system of a file programmatically
Date: Fri, 10 Aug 2018 10:28:07 +0300

> From: Andrea Cardaci <address@hidden>
> Date: Fri, 10 Aug 2018 03:02:55 +0200
> 
> (with-temp-buffer
>   (insert-file-contents-literally path)
>   (decode-coding-region (point-min) (point-max) 'utf-8)
>   (... do stuff with the buffer ...))
> 
> I use `insert-file-contents-literally' because the non-literally
> counterpart is too slow (about twice as much apparently) as it does a
> bunch of stuff in addition to simply populate the buffer.
> Unfortunately, one of these things is to decode the buffer.
> 
> Now instead of hardcoding 'utf-8 I'd like to detect the correct
> encoding where possible, so I tried experimenting with
> `find-operation-coding-system'.

That's the wrong function to use in this case; you want
decode-coding-inserted-region instead.  Alternatively, you could use
detect-coding-region and then decode-coding-region with the value it
returns.  I suggest a good read of the "Explicit Encoding" and "Lisp
and Coding Systems" nodes of the ELisp manual.
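
A minimal sketch of the second alternative (detect, then decode), assuming
the file is read into a fresh temporary buffer. The helper name
`my-file-string-detecting-coding' is hypothetical and not part of Emacs or
of the original message; the result of the detection also depends on the
language environment and coding-system priorities of the session.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-file-string-detecting-coding (file)
  "Return the contents of FILE as a string, decoded by content detection."
  (with-temp-buffer
    ;; Insert the raw bytes without any decoding.
    (insert-file-contents-literally file)
    ;; A non-nil third argument makes `detect-coding-region' return only
    ;; the highest-priority candidate instead of a list of candidates.
    (let ((coding (detect-coding-region (point-min) (point-max) t)))
      (decode-coding-region (point-min) (point-max) coding)
      (buffer-string))))
```

Usage would be something like (my-file-string-detecting-coding "~/tmp/latin-1").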

> I created a latin-1 file (which gets
> recognised properly when I visit it) and tried the following:
> 
> (with-temp-buffer
>   (setq path "~/tmp/latin-1")
>   (insert-file-contents-literally path)
>   (find-operation-coding-system
>    'insert-file-contents
>    (cons path (current-buffer))))
> 
> But all I get is (undecided).

That's expected: find-operation-coding-system returns the _default_ to
use for the named operation.  It doesn't consider the contents of the
buffer.
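
To illustrate the point, here is a small sketch showing that the lookup is
driven by the operation and the file name. The return value is a cons of
the form (DECODING . ENCODING), which is why the result above prints as
(undecided). The helper name `my-default-decoding-for' and the ".l1"
extension are made up for this example, and the actual result in a real
session depends on the contents of `file-coding-system-alist'.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-default-decoding-for (file-name)
  "Return the default decoding system Emacs would use for FILE-NAME.
This looks only at the configured defaults for `insert-file-contents'."
  ;; The car of the returned cons is the coding system for decoding.
  (car (find-operation-coding-system 'insert-file-contents file-name)))

;; Example: with a temporary rule for a made-up ".l1" extension, the
;; answer follows from the file name alone; the file need not exist.
(let ((file-coding-system-alist '(("\\.l1\\'" . latin-1))))
  (my-default-decoding-for "notes.l1"))
```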

> Now my question is twofold: is this the best approach for what I'm
> trying to achieve? And in any case, why does the latter example not
> work as expected? (And hence how can I detect the coding system
> programmatically?)

I hope that answers all of those questions; if not, please ask more.

In any case, it is definitely OK to call decode-coding-region with the
value 'undecided' returned by find-operation-coding-system, because
'undecided' is a special value which signals to decode-coding-region
that detection of the actual encoding is necessary.  Thus, I expect
this to work for you:

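  (with-temp-buffer
    (insert-file-contents-literally path)
    ;; find-operation-coding-system returns a cons (DECODING . ENCODING);
    ;; decode-coding-region needs the coding-system symbol, so take the car.
    (decode-coding-region (point-min) (point-max)
                          (car (find-operation-coding-system
                                'insert-file-contents
                                (cons path (current-buffer))))))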

But I still recommend using decode-coding-inserted-region, because it
will do all of the above (and slightly more) for you internally.
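
A minimal sketch of that recommendation, assuming a fresh temporary buffer
so that the inserted text spans the whole buffer. The helper name
`my-file-string-via-inserted-region' is hypothetical; the call passes only
the three required arguments (FROM, TO and FILENAME) of
decode-coding-inserted-region.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-file-string-via-inserted-region (file)
  "Return the contents of FILE as a string, decoded as if FILE were visited."
  (with-temp-buffer
    ;; Insert the raw bytes without any decoding.
    (insert-file-contents-literally file)
    ;; Decode the inserted text as if it had just been read from FILE;
    ;; Emacs chooses the coding system itself.
    (decode-coding-inserted-region (point-min) (point-max) file)
    (buffer-string)))
```

Usage would be something like (my-file-string-via-inserted-region "~/tmp/latin-1").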


