
lynx-dev Re: msg00798.html (was: 0x2276 handling)


From: Foteos Macrides
Subject: lynx-dev Re: msg00798.html (was: 0x2276 handling)
Date: Wed, 29 Apr 1998 12:04:37 -0400

"Leonid Pauzner" <address@hidden> wrote:
>>                            empirical testing reveals that v2.8
>> indeed has a serious bug in its handling of the "lg" character
>> reference.  It's defined in entities.h to the Unicode value 0x2276
>> but there is no translation entry for that value in the chartrans
>> tables, and instead of the standard error handling of using the raw
>> ampersand and "lg", v2.8 is substituting garbage characters which
>> vary as a function of which Display Character Set is selected.
>> (That's very "wrong". :)
>>
>>                                 Fote
> 
>I cannot recreate the above behaviour:
>browsing test/sgml.html and test/unicode.html with 2.8(1dev.6)
>(searching with "/" for 2276) shows the standard error handling
>(no substitution garbage). In what circumstances did you get garbage?
>(Maybe there was some kind of header outside of the SGML text, like a URL?)

        I used the W32 binary from fdisk, which declares itself as dev4
but is a dev5 build according to Wayne.  The problem is in the handling
of attribute values via the (excessively hairy and unmaintainable :)
functions in v2.8's LYCharUtils.c and its UCfoo.c mods, which I did
not use (with lengthy explanations to lynx-dev of why) in the code set
that I had released as v2.7.2.  The homologous functions in SGML.c and
HTPlain.c handle other conversions.  In the v2.8 code they are not
coordinated with each other or with the attribute handlers in
LYCharUtils.c.  (Although I had coordinated them in the v2.7.2 release,
the v2.8 release "superseded" v2.7.2 without dealing with these and
other problems in the devel code set.)  You see different problems in
v2.8 depending on the markup, and in turn on whether SGML.c, HTPlain.c
or LYCharUtils.c functions are used to set up the chartrans conversions.
To see the problem we've been discussing, you should have used Alex
Matulich's test page (the URL was posted by Doug) and looked at what his
script returned before he modified it to treat ';' instead of just '&'
as the name=value separator (as in the HTML 4.0 recommendations, which
he obviously has now read and understood :).

 
>Yes, 0x2276 is currently unknown to def7_uni.tbl; we could easily add
>U+2276:<>
>or something like this, if necessary.
> 
>On the other hand, there are still a few strange characters like 0x200A
>which are _known_ by def7_uni.tbl but fall back to error handling
>instead of the promised substitution. This is a bug.
 
        It was inappropriate to define any SGML named character
references as Unicode values without also setting up default chartrans
conversions for them (it looks like there are more than just "lg").
Depending on which of the (uncoordinated in v2.8) functions of SGML.c,
HTPlain.c or LYCharUtils.c is invoked (based on the markup and MIME
type), this has created a situation in v2.8 in which strings/Unicode
values are passed as "known" to functions that in fact don't know them
as SGML character references and, particularly in that mess in v2.8's
LYCharUtils.c, have no rational error recovery associated with them.

        Also, note this problem brought out for v2.8 by Alex's test
page:  Had the "lg" in fact been handled according to SGML principles
as a character in the URL with a value greater than decimal 127, and
had the markup actually intended that (e.g., for an i18n path), then on
submitting it to the HTTP server the v2.8 code would still use Klaus'
obsolete conversion function, instead of converting it to UTF-8 and
then hex-escaping each byte of the resulting multibyte character, as
is done in such cases by the code I had released as v2.7.2.  So even
if the chartrans stuff in v2.8 is fixed up, such URLs would still fail
to retrieve the resource for Lynx users (the server or its script would
have no way to translate back correctly).  I had posted lengthy messages
about this before the v2.8 release, but... (What a "pickle" this is!
I retired just in the nick of time. :)

                                        Fote
-- 
Foteos Macrides (address@hidden during April, '98)
