Re: [Nmh-workers] bug in decode_rfc2047()
From: David Levine
Subject: Re: [Nmh-workers] bug in decode_rfc2047()
Date: Thu, 03 Jan 2013 22:50:38 -0500
Ken wrote:
> So, I see a couple of options. We could go completely portable
> and put in a "?" (or whatever) for every byte that's invalid.
> That would have us generate multiple "?" for multibyte
> character sets like UTF8.
I'm not fond of multiple "?". So I think what we have is OK
as far as that goes.
> Unless we have a LOT of multibyte character sets to deal with,
> perhaps the special-case here for UTF8 is the best alternative?
I'm OK with that.
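
For illustration, here is a minimal sketch of what a UTF-8 special case might look like, assuming the goal is to emit a single "?" per unconvertible character by skipping the whole sequence rather than one byte. The name utf8_skip_len() is hypothetical and not part of nmh. The sketch checks only the lead byte and the continuation bytes, so it does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not nmh code: return how many bytes to skip for
 * the UTF-8 character starting at s, with n bytes available.  A
 * malformed or truncated sequence yields 1, so the caller still falls
 * back to replacing a single byte.
 */
size_t
utf8_skip_len(const unsigned char *s, size_t n)
{
    size_t len, i;

    if (n == 0)
        return 0;

    if (s[0] < 0x80)
        len = 1;                        /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;                        /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;                        /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;                        /* 11110xxx */
    else
        return 1;                       /* stray continuation or bad lead */

    if (len > n)
        return 1;                       /* truncated sequence */

    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 1;                   /* not a continuation byte */

    return len;
}
```

A caller that hits an unconvertible character would write one "?" and advance the input by the returned length.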
It's unfortunate that there isn't a more general version of
mbtowc() that takes a codeset as a parameter. Then we could use
it the way that fmt_scan() does to find out how long the next
character is. I don't think it's worth changing the locale just
to call mbtowc().
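
To make the limitation concrete, here is a rough sketch of asking mbtowc() for the length of the next character. It is not copied from fmt_scan(), and next_char_len() is a hypothetical name. Note that mbtowc() has no codeset argument: it always interprets the bytes according to the current LC_CTYPE locale.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: length in bytes of the next character in s
 * (at most n bytes are examined), as decoded in the current locale.
 * Returns 0 at a NUL byte or when n is 0, and 1 for an invalid
 * sequence so that the caller can step over the bad byte.
 */
size_t
next_char_len(const char *s, size_t n)
{
    int len;

    if (n == 0)
        return 0;

    len = mbtowc(NULL, s, n);
    if (len < 0) {
        mbtowc(NULL, NULL, 0);  /* reset any shift state after the error */
        return 1;
    }
    return (size_t) len;
}
```

For text in a codeset other than the locale's, this gives the wrong answer unless setlocale() is called first, which is the cost mentioned above.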
On a different but sort of related topic:
I (finally) set up my xterms to handle UTF-8. I noticed that
scan lines get shortened when there are multibyte characters.
fmt_scan()'s cpstripped() doesn't count them. It doesn't look
like it'd be hard to fix using the info from mbtowc() that it
already has, but I think an additional parameter will be needed
to prevent overflow of the dest buffer. And cptrimmed() could
use the same fix.
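
Roughly what I have in mind, as a sketch only: count characters against the field width and bytes against a separate buffer limit. copy_columns() and its parameters are made up for illustration and are not the existing cpstripped() or cptrimmed() interfaces. It also assumes every character occupies one column, which is not true for wide characters.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: copy str into dest, truncated or space-padded
 * to wid characters, writing at most n bytes including the NUL.
 * Characters are decoded with mbtowc() in the current locale; an
 * invalid byte is replaced by '?'.  Returns the bytes written, not
 * counting the NUL.
 */
size_t
copy_columns(char *dest, size_t n, const char *str, int wid)
{
    size_t used = 0;
    size_t left = strlen(str);

    if (n == 0)
        return 0;

    mbtowc(NULL, NULL, 0);              /* start in the initial shift state */

    while (wid > 0 && left > 0) {
        const char *src = str;
        int len = mbtowc(NULL, str, left);

        if (len <= 0) {                 /* invalid sequence */
            mbtowc(NULL, NULL, 0);
            src = "?";
            len = 1;
        }
        if (used + (size_t) len >= n) { /* no room for char plus NUL */
            wid = 0;
            break;
        }
        memcpy(dest + used, src, (size_t) len);
        used += (size_t) len;
        str += len;
        left -= (size_t) len;
        wid--;                          /* one column per character */
    }

    while (wid > 0 && used + 1 < n) {   /* pad out to the field width */
        dest[used++] = ' ';
        wid--;
    }

    dest[used] = '\0';
    return used;
}
```

The point is that wid and n are independent: a field of wid characters can need several times wid bytes in UTF-8, so the width alone cannot bound the copy.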
David