Re: The “binary-friendly” Latin-1
From: Ludovic Courtès
Subject: Re: The “binary-friendly” Latin-1
Date: Tue, 25 Jan 2011 14:21:50 +0100
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.2 (gnu/linux)
Hello!
>> 1. The notion of a “binary-friendly” ISO-8859-1 encoding? It’s
>> actually mostly gone with the iconv change, since every textual
>> access goes through iconv. For binary accesses, the right API is
>> (rnrs io ports) or similar.
>
> An equivalent question is if you care about backward compatibility of
> legacy ports. Legacy ports returned strings and were once the only option.
You mean if there’s legacy code using a port of unspecified encoding to
read binary data, right?
The iconv change doesn’t break it on GNU/Linux:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define p (open-bytevector-input-port #vu8(0 1 2 3 255 128)))
scheme@(guile-user)> (set-port-encoding! p "ISO-8859-1")
scheme@(guile-user)> (read-char p)
$14 = #\nul
scheme@(guile-user)> (read-char p)
$15 = #\soh
scheme@(guile-user)> (read-char p)
$16 = #\stx
scheme@(guile-user)> (read-char p)
$17 = #\etx
scheme@(guile-user)> (read-char p)
$18 = #\ÿ
scheme@(guile-user)> (read-char p)
$19 = #\200
scheme@(guile-user)> (read-char p)
$20 = #<eof>
--8<---------------cut here---------------end--------------->8---
However, an iconv implementation is in principle free to choke on
anything that’s not strictly Latin-1 per
<https://secure.wikimedia.org/wikipedia/en/wiki/ISO-8859-1#Codepage_layout>,
i.e., the control-code ranges 0x00-0x1F and 0x7F-0x9F, which covers
everything but “ÿ” in the example above. That seems highly unlikely,
though.
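
For anyone who wants to check their own platform, here is a minimal
sketch that extends the session above to all 256 byte values. The names
‘all-bytes’, ‘latin-1-chars’ and ‘binary-friendly?’ are made up for the
example; it only uses the procedures already shown plus (rnrs
bytevectors). I would expect it to return #t on a setup like the one
above, but that is exactly the assumption being tested.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

;; Hypothetical helper: a bytevector holding every byte value 0..255.
(define all-bytes
  (let ((bv (make-bytevector 256)))
    (let loop ((i 0))
      (when (< i 256)
        (bytevector-u8-set! bv i i)
        (loop (+ i 1))))
    bv))

;; Hypothetical helper: read BV back as characters through a port whose
;; encoding is set to ISO-8859-1, as in the session above.
(define (latin-1-chars bv)
  (let ((p (open-bytevector-input-port bv)))
    (set-port-encoding! p "ISO-8859-1")
    (let loop ((acc '()))
      (let ((c (read-char p)))
        (if (eof-object? c)
            (reverse acc)
            (loop (cons c acc)))))))

;; #t when byte N comes back as the character with code point N,
;; for every N; #f if any byte was dropped or substituted.
(define (binary-friendly?)
  (equal? (map char->integer (latin-1-chars all-bytes))
          (iota 256)))
```

Evaluating (binary-friendly?) then tells you whether the Latin-1 path
is byte-transparent on that system.
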
Anyway, as soon as you use a non-Latin-1 locale, ports get opened under
that locale’s encoding, which makes it practically impossible to do
binary I/O through the textual procedures on those ports.
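
To illustrate the (rnrs io ports) route mentioned in point 1, here is a
small sketch that reads the same six bytes as octets with ‘get-u8’, so
the port encoding never comes into play. ‘read-octets’ is a
hypothetical name used only for this example.

```scheme
(use-modules (rnrs io ports))

;; Hypothetical helper: return the contents of BV as a list of octets,
;; read one at a time with the binary procedure `get-u8'.
(define (read-octets bv)
  (let ((p (open-bytevector-input-port bv)))
    (let loop ((acc '()))
      (let ((b (get-u8 p)))
        (if (eof-object? b)
            (reverse acc)
            (loop (cons b acc)))))))

(read-octets #vu8(0 1 2 3 255 128))
;; => (0 1 2 3 255 128)
```

Since no decoding happens, the result should not depend on the current
locale.
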
>> 2. The #f <=> "ISO-8859-1" equivalence for ‘port-encoding’ and
>> ‘set-port-encoding!’. Likewise, commit
>> d9544bf012b6e343c80b76bd5761b1583cc106a3 makes ‘port-encoding’
>> always return a string and pt->encoding always be non-NULL.
>
> Is the cost of doing the various string comparisons of port-encoding
> strings negligible? It was put in as a (premature) optimization.
The new code keeps open iconv conversion descriptors (CDs) for each port
and re-uses them; the only use of pt->encoding is when opening those CDs,
so the encoding string is not consulted on each read or write.
Thanks,
Ludo’.