Conversion to unibyte, magic latin-1?
From: Julian Scheid
Subject: Conversion to unibyte, magic latin-1?
Date: Sun, 5 May 2019 03:48:24 +1200

I'm trying to work out how to calculate the SHA-256 for a binary
string reliably (and efficiently) in Elisp.
Consider this binary string:
$ printf '\x52\xbc\xdd\x9e' | openssl dgst -sha256
cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888
`secure-hash' doesn't produce the same result (all tested in 26.2):
(secure-hash 'sha256 (concat [#x52 #xbc #xdd #x9e]))
"cfdc1612961dc873079178b92bf0aafaa6bd33731cbaa60841eef163f85074e8"
After studying the C source code I've figured out that this is because
it does multi-byte conversion behind the scenes (by the way, C-h f
secure-hash RET doesn't tell you this.)
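
To make the multibyte issue concrete, here is a minimal sketch that only inspects the string itself. The variable name `my/sample` is made up for this illustration. It does not show which coding system `secure-hash' picks for the conversion, only that the string it receives is multibyte and occupies more bytes internally than it has characters:

```elisp
;; Hypothetical variable name, used only for this illustration.
(defvar my/sample (concat [#x52 #xbc #xdd #x9e])
  "The four-element test string from this message, built with `concat'.")

;; `concat' treats the integers as characters, so the result is multibyte.
(multibyte-string-p my/sample)   ; => t

;; Four characters ...
(length my/sample)               ; => 4

;; ... but seven bytes in Emacs's internal representation, because each
;; of the three non-ASCII characters takes two bytes.
(string-bytes my/sample)         ; => 7
```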
Armed with this knowledge, and seeing in the code that no conversion
is done for unibyte strings, I've got it to work with
`string-make-unibyte':
(secure-hash 'sha256
             (string-make-unibyte (concat [#x52 #xbc #xdd #x9e])))
"cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888"
Alas, `string-make-unibyte' is declared obsolete. The help page tells
me that I should use `encode-coding-string' instead, so I tried that
with a few obvious encodings, but no luck:
(secure-hash 'sha256
             (encode-coding-string (concat [#x52 #xbc #xdd #x9e])
                                   'raw-text))
"cfdc1612961dc873079178b92bf0aafaa6bd33731cbaa60841eef163f85074e8"
(secure-hash 'sha256
             (encode-coding-string (concat [#x52 #xbc #xdd #x9e])
                                   'binary))
"cfdc1612961dc873079178b92bf0aafaa6bd33731cbaa60841eef163f85074e8"
In the end I searched for a coding system that works:
(let* ((data (concat [#x52 #xbc #xdd #x9e]))
       (ref (secure-hash 'sha256 (string-make-unibyte data))))
  (seq-filter
   (lambda (coding-system)
     (string= (secure-hash 'sha256
                           (encode-coding-string data coding-system))
              ref))
   (coding-system-list)))
(latin-1 iso-8859-1 iso-latin-1)
(secure-hash 'sha256
             (encode-coding-string (concat [#x52 #xbc #xdd #x9e])
                                   'latin-1))
"cb0b03042399237f7fac31d47f98ac0899533d298db3a697af29621b49f86888"
This works, but I'm confused... why does latin-1 work when raw-text and
binary don't? More importantly, how do I know that it works
everywhere and will continue to work in the future? Is latin-1 a
"magic" encoding or does it only happen to work because it matches
with some default coding system set somewhere in my config?
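
One way to narrow this down is to compare the bytes each coding system actually produces, rather than only the digests. The sketch below does that; the helper name `my/bytes-under` is made up for the example. For this input, latin-1 yields the original four bytes (each character's code point becomes the byte of the same value), while utf-8 yields seven. I would expect raw-text and binary to give the same seven bytes as utf-8 here, which would be consistent with the digests above, but that is a guess and the expected-output comments below cover only latin-1 and utf-8:

```elisp
(defun my/bytes-under (string coding-system)
  "Return STRING encoded with CODING-SYSTEM, as a list of byte values.
Hypothetical helper for illustration only."
  ;; `encode-coding-string' returns a unibyte string; `append' with nil
  ;; turns it into a list of its bytes.
  (append (encode-coding-string string coding-system) nil))

(let ((data (concat [#x52 #xbc #xdd #x9e])))
  (list (my/bytes-under data 'latin-1)   ; => (82 188 221 158)
        (my/bytes-under data 'utf-8)     ; => (82 194 188 195 157 194 158)
        (my/bytes-under data 'raw-text)
        (my/bytes-under data 'binary)))
```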
For what it's worth, I can't see a mention of latin-1 anywhere in my
coding system settings (which are all defaults, afaik):
(list
default-file-name-coding-system
default-process-coding-system
default-keyboard-coding-system
default-process-coding-system
default-terminal-coding-system
coding-system-for-write
(car coding-category-list))
(utf-8-unix (utf-8-unix . utf-8-unix) utf-8-unix
 (utf-8-unix . utf-8-unix) utf-8-unix nil coding-category-raw-text)
Could someone shed light on this?