Conversion to unibyte, magic latin-1?

help-gnu-emacs

From:	Julian Scheid
Subject:	Conversion to unibyte, magic latin-1?
Date:	Sun, 5 May 2019 03:48:24 +1200

I'm trying to work out how to calculate the SHA-256 for a binary
string reliably (and efficiently) in Elisp.

Consider this four-byte binary string and its SHA-256 digest as computed by OpenSSL:

    $ printf '\x52\xbc\xdd\x9e' | openssl dgst -sha256
    cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888

`secure-hash' doesn't produce the same result (all examples tested in Emacs 26.2):

    (secure-hash 'sha256 (concat [#x52 #xbc #xdd #x9e]))
    "cfdc1612961dc873079178b92bf0aafaa6bd33731cbaa60841eef163f85074e8"

After studying the C source code, I've figured out that this is because
it performs a multibyte conversion behind the scenes (by the way,
`C-h f secure-hash RET' doesn't tell you this).
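The conversion can be observed directly. The sketch below (the variable name `my-example-data' is hypothetical) shows that the string built by `concat' is multibyte, so its four characters occupy seven bytes internally, and that encoding it as UTF-8 also yields seven bytes rather than the original four. Assuming `secure-hash' encodes multibyte strings with the preferred coding system, and that this is UTF-8 here, those seven bytes are what actually get hashed.

```elisp
;; Hypothetical variable holding the example data.  Because #xbc, #xdd
;; and #x9e are non-ASCII character codes, `concat' returns a multibyte
;; string of the characters U+0052, U+00BC, U+00DD and U+009E.
(defvar my-example-data (concat [#x52 #xbc #xdd #x9e]))

(multibyte-string-p my-example-data)   ; => t
(length my-example-data)               ; => 4 characters
(string-bytes my-example-data)         ; => 7 bytes in the internal form

;; The bytes that UTF-8 encoding produces for the same characters
;; (each of the three non-ASCII characters becomes two bytes):
(append (encode-coding-string my-example-data 'utf-8) nil)
;; => (82 194 188 195 157 194 158)
```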

Armed with this knowledge, and seeing in the code that no conversion
is done for unibyte strings, I've got it to work with
`string-make-unibyte':

    (secure-hash 'sha256
                 (string-make-unibyte (concat [#x52 #xbc #xdd #x9e])))
    "cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888"
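If the data is available as byte values rather than as an already-built string, one way to avoid both `string-make-unibyte' and coding systems is the built-in `unibyte-string', which makes a unibyte string from its byte arguments. A minimal sketch, with a hypothetical helper name and assuming every element is an integer from 0 to 255:

```elisp
;; Hypothetical helper: hash a sequence of byte values (integers 0-255)
;; without going through any coding system.  `unibyte-string' builds a
;; unibyte string directly, and unibyte strings are hashed as-is.
(defun my-sha256-of-bytes (bytes)
  "Return the SHA-256 hex digest of BYTES, a sequence of integers 0-255."
  (secure-hash 'sha256 (apply #'unibyte-string (append bytes nil))))

(my-sha256-of-bytes [#x52 #xbc #xdd #x9e])
;; => "cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888"
```

Because the unibyte path involves no conversion, the digest should match the openssl output above.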

Alas, `string-make-unibyte' is declared obsolete.  The help page tells
me that I should use `encode-coding-string' instead, so I tried that
with a few obvious encodings, but no luck:

    (secure-hash 'sha256
                 (encode-coding-string (concat [#x52 #xbc #xdd #x9e]) 'raw-text))
    "cfdc1612961dc873079178b92bf0aafaa6bd33731cbaa60841eef163f85074e8"

    (secure-hash 'sha256
                 (encode-coding-string (concat [#x52 #xbc #xdd #x9e]) 'binary))
    "cfdc1612961dc873079178b92bf0aafaa6bd33731cbaa60841eef163f85074e8"
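Comparing the bytes each coding system produces makes the difference visible. According to the Emacs Lisp manual, encoding multibyte text with `raw-text' converts only eight-bit (raw byte) characters; all other characters are written in Emacs's internal representation, which for characters such as U+00BC coincides with UTF-8. `binary' (an alias of `no-conversion') behaves like `raw-text' here. A sketch, with a hypothetical helper `my-bytes-of':

```elisp
;; Hypothetical helper: the bytes STRING encodes to under CODING-SYSTEM.
(defun my-bytes-of (string coding-system)
  "Return the bytes of STRING encoded with CODING-SYSTEM, as a list."
  (append (encode-coding-string string coding-system) nil))

;; Compare four coding systems on the multibyte example string.
(let ((data (concat [#x52 #xbc #xdd #x9e])))
  (mapcar (lambda (cs) (cons cs (my-bytes-of data cs)))
          '(utf-8 raw-text binary latin-1)))
;; => ((utf-8 82 194 188 195 157 194 158)
;;     (raw-text 82 194 188 195 157 194 158)
;;     (binary 82 194 188 195 157 194 158)
;;     (latin-1 82 188 221 158))
```

Only `latin-1' reproduces the original four bytes; the other three emit the same seven bytes, which is consistent with their identical digests.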

In the end I searched for a coding system that works:

    (require 'seq)
    (let* ((data (concat [#x52 #xbc #xdd #x9e]))
           (ref (secure-hash 'sha256 (string-make-unibyte data))))
      (seq-filter
       (lambda (coding-system)
         (string= (secure-hash 'sha256
                               (encode-coding-string data coding-system))
                  ref))
       (coding-system-list)))
    (latin-1 iso-8859-1 iso-latin-1)

    (secure-hash 'sha256
                 (encode-coding-string (concat [#x52 #xbc #xdd #x9e]) 'latin-1))
    "cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888"

This works, but I'm confused: why does latin-1 work when raw-text and
binary don't?  More importantly, how do I know that it works
everywhere and will continue to work in the future?  Is latin-1 a
"magic" encoding, or does it only happen to work because it matches
some default coding system set somewhere in my config?
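ISO 8859-1 assigns the bytes 0x00 to 0xFF to exactly the characters U+0000 to U+00FF, which is why `latin-1' turns each of these characters back into the byte with the same value. The sketch below (hypothetical function name) checks that for all 256 values in the running Emacs. It only shows what this Emacs does, not what future versions guarantee, and it assumes a Unix-like system, where encoding with `latin-1' leaves newline characters unchanged.

```elisp
;; Hypothetical check: does `latin-1' map every character code 0-255
;; to the single byte with the same value?
;; Assumption: run on a Unix-like system (no end-of-line conversion).
(defun my-latin-1-is-identity-p ()
  "Return t if encoding each character 0-255 with `latin-1' yields that byte."
  (let ((ok t))
    (dotimes (c 256)
      (unless (equal (append (encode-coding-string (string c) 'latin-1) nil)
                     (list c))
        (setq ok nil)))
    ok))

(my-latin-1-is-identity-p)
```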

For what it's worth, I can't see a mention of latin-1 anywhere in my
coding system settings (which are all defaults, afaik):

    (list
     default-file-name-coding-system
     default-process-coding-system
     default-keyboard-coding-system
     default-process-coding-system
     default-terminal-coding-system
     coding-system-for-write
     (car coding-category-list))
    (utf-8-unix (utf-8-unix . utf-8-unix) utf-8-unix
     (utf-8-unix . utf-8-unix) utf-8-unix nil coding-category-raw-text)
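The variables listed above may not be the ones that matter. As far as one can tell from reading fns.c, `secure-hash' encodes a multibyte string with the preferred (highest-priority) coding system, which Lisp code can query with `(coding-system-priority-list t)'. That is an interpretation of the source rather than documented behavior; the sketch below (hypothetical function name) tests it on the example string.

```elisp
;; Assumption (from reading fns.c, not from the documentation): when no
;; coding system is given, `secure-hash' encodes a multibyte string
;; with the highest-priority coding system before hashing it.
;; Hypothetical helper that tests this assumption for STRING.
(defun my-default-hash-matches-preferred-p (string)
  "Return non-nil if STRING hashes the same as its preferred-coding encoding."
  (let ((preferred (coding-system-priority-list t)))
    (equal (secure-hash 'sha256 string)
           (secure-hash 'sha256 (encode-coding-string string preferred)))))

;; The highest-priority coding system, e.g. `utf-8' in a UTF-8 locale.
(coding-system-priority-list t)

(my-default-hash-matches-preferred-p (concat [#x52 #xbc #xdd #x9e]))
```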

Could someone shed light on this?

Current Thread

Conversion to unibyte, magic latin-1?, Julian Scheid <=
- Re: Conversion to unibyte, magic latin-1?, Stefan Monnier, 2019/05/04
