Re: gnu.java.io.encode.EncoderUTF8.java
From: Per Bothner
Subject: Re: gnu.java.io.encode.EncoderUTF8.java
Date: Mon, 04 Aug 2003 12:30:48 -0700
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030612
David P Grove wrote:
> I've been tracking down a bug using classpath to run JSPs on top of Jikes
> RVM and I think the root of the problem is that EncoderUTF8.java is
> strictly following the UTF8 encoding scheme instead of the "pseudo-UTF8"
> that JVMs actually need. In particular, the character \u0000 is being
> encoded as the one byte 0 instead of the 2 byte sequence that Java uses.
> I'm happy to contribute a bug fix for this. My question is should I
> change EncoderUTF8 to implement the Java treatment of \u0000,
That would be wrong. EncoderUTF8 is used to convert 16-bit Unicode
to the *external* UTF8 encoding used for files etc., not to the Java
pseudo-UTF8.
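To make the difference concrete, here is a small sketch that compares the two encodings of U+0000 using only standard JDK classes. It does not touch Classpath's EncoderUTF8; the class name Utf8Comparison and its helper methods are made up for illustration. DataOutputStream.writeUTF is used as a convenient source of the Java-style bytes, with its two-byte length prefix stripped.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Classpath.
public class Utf8Comparison {
    // External UTF-8, as used for files and network data.
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java's modified ("pseudo") UTF-8, as written by DataOutputStream.writeUTF.
    // writeUTF prepends a 2-byte length, which is dropped here.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String nul = "\0"; // the single character U+0000
        System.out.println(Arrays.toString(standardUtf8(nul))); // prints [0]
        System.out.println(Arrays.toString(modifiedUtf8(nul))); // prints [-64, -128], i.e. 0xC0 0x80
    }
}
```

The one-byte form is what a file reader expecting real UTF-8 wants; the two-byte 0xC0 0x80 form is an overlong encoding that strict UTF-8 decoders are entitled to reject.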
I can only think of one reason why you'd want to create the Java
pseudo-UTF8 format: when writing a Java class file. Implement
that however you wish, but don't change the behavior of EncoderUTF8.
You could add a flag to EncoderUTF8 to enable "Java-style UTF8",
but it can't be the default.
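A rough sketch of what such an opt-in flag might look like follows. This is not the real EncoderUTF8 API: the class FlaggedUtf8Encoder, its constructors, and the javaStyle field are all assumptions made for illustration. The default constructor keeps standard UTF-8 behavior.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoder, not Classpath's EncoderUTF8.
public class FlaggedUtf8Encoder {
    private final boolean javaStyle;

    // Default: standard external UTF-8.
    public FlaggedUtf8Encoder() {
        this(false);
    }

    // javaStyle = true enables the Java class-file variant.
    public FlaggedUtf8Encoder(boolean javaStyle) {
        this.javaStyle = javaStyle;
    }

    public byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 && javaStyle) {
                // Java-style: U+0000 becomes the two bytes 0xC0 0x80.
                out.write(0xC0);
                out.write(0x80);
            } else if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else if (!javaStyle && Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Standard UTF-8: a surrogate pair becomes one 4-byte sequence.
                int cp = Character.toCodePoint(c, s.charAt(++i));
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                // 3-byte form. In Java-style mode each surrogate is encoded
                // separately; in standard mode an unpaired surrogate also
                // lands here, which a stricter encoder would reject.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

With this shape, existing callers that construct the encoder without arguments keep getting real UTF-8, and only code that writes class files would pass true.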
--
--Per Bothner
address@hidden http://per.bothner.com/