RE: Unicode support for CVS
From: Da Costa Martins, Iolanda Maria (Iolanda)
Subject: RE: Unicode support for CVS
Date: Tue, 22 Oct 2002 08:52:52 -0400
UTF-16 LE.
UTF-32 (a real nightmare if one is trying to use standard XML
parsers) would also force my files to be checked into CVS as binary, and I
don't see any use in doing that.
I tested all byte orders and the result is really the same.
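To illustrate the byte-order point, here is a minimal Python sketch (the sample string and helper name are illustrative, not from this thread) that encodes ASCII-only markup in each UTF-16 and UTF-32 byte order and counts the 0x00 bytes. Every variant contains them, which is the zero-byte issue discussed further down in this thread; only UTF-8 avoids it.

```python
# Illustrative sample; any ASCII-only text behaves the same way.
SAMPLE = "<note>Hello, CVS</note>\n"

# Both byte orders of the 16- and 32-bit Unicode encodings.
ENCODINGS = ["utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]


def zero_byte_counts(text: str) -> dict:
    """Map each encoding to the number of 0x00 bytes it produces for `text`."""
    return {enc: text.encode(enc).count(b"\x00") for enc in ENCODINGS}


if __name__ == "__main__":
    for enc, zeros in zero_byte_counts(SAMPLE).items():
        size = len(SAMPLE.encode(enc))
        print(f"{enc:10} {size:3} bytes, {zeros:3} zero bytes")
    # For comparison: the same text in UTF-8 has no zero bytes at all.
    utf8_zeros = SAMPLE.encode("utf-8").count(b"\x00")
    print("utf-8      zero bytes:", utf8_zeros)
```

For ASCII characters, UTF-16 yields one zero byte per character and UTF-32 yields three, whichever byte order is used.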
UTF-8 is the solution: yes, that's what I said. I have no issue with UTF-8;
it's just not what I need in certain files, and some files are actually
required to be UTF-16.
As for UTF-8 being the default encoding: it is the default for the web, but not for databases.
I repeat that we are not using UTF-8 in the back-end of our process (in the
front-end, yes). My UTF-8 encoded files are going into CVS as text just fine
(no complaints here)...
So I repeat the question: does UTF-16 necessarily have to be stored as binary?
If the CVS cognoscenti say yes, then binary it will be. It just doesn't
make me completely happy to lose the merging capability...
-----Original Message-----
From: Thomas Maslen [mailto:maslen@pobox.com]
Sent: Tuesday, October 22, 2002 2:30 PM
To: Da Costa Martins, Iolanda Maria (Iolanda)
Cc: 'bug-cvs@gnu.org'
Subject: Re: Unicode support for CVS
> Our projects contain Unicode-encoded files.
If the "Unicode" is stored in a file, then is it UTF-16 in big-endian order,
UTF-16 in little-endian order, or UTF-8?
(In principle, I suppose, someone could store it as big- or little-endian
32-bit values, but that seems pretty bogus.)
> Encoding these files in UTF-8 could be possible, but it would have an
> impact on our XML data implementation.
The default encoding for XML _is_ UTF-8, no?
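On the XML default: per the XML 1.0 specification, a document with neither a byte order mark nor an encoding declaration must be read as UTF-8, while UTF-16 documents are required to begin with a BOM. A small sketch using Python's standard-library ElementTree (the element name and text are made up for illustration) shows that both forms parse to the same content, so the choice of encoding is a storage decision rather than a limitation of the XML data itself.

```python
import xml.etree.ElementTree as ET

# No BOM and no encoding declaration: an XML parser must read this as UTF-8.
UTF8_DOC = "<greeting>olá</greeting>".encode("utf-8")

# UTF-16 is also valid XML; Python's "utf-16" codec prepends the BOM that the
# parser uses to detect the encoding (in the platform's native byte order).
UTF16_DOC = '<?xml version="1.0" encoding="UTF-16"?><greeting>olá</greeting>'.encode("utf-16")


def parse_text(raw: bytes) -> str:
    """Parse raw XML bytes and return the root element's text."""
    return ET.fromstring(raw).text


if __name__ == "__main__":
    print("UTF-8 :", len(UTF8_DOC), "bytes ->", parse_text(UTF8_DOC))
    print("UTF-16:", len(UTF16_DOC), "bytes ->", parse_text(UTF16_DOC))
```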
> One of the things that is currently not sounding good is storing text-type
> information in CVS as binary just because we need to prevent possible
> truncation.
UTF-8 was carefully designed so that it will never contain a zero byte
unless you actually use Unicode character zero (U+0000), and I bet you
don't do that.
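This claim can be checked exhaustively. It holds by construction: U+0001 to U+007F encode as the single bytes 0x01 to 0x7F, and every code point above U+007F is encoded entirely with bytes that have the high bit set (lead bytes 0xC2 to 0xF4, continuation bytes 0x80 to 0xBF), so a 0x00 byte can only come from U+0000. The Python sketch below (function names are illustrative) encodes every Unicode scalar value and collects those whose UTF-8 form contains a zero byte.

```python
import sys


def utf8_contains_zero(cp: int) -> bool:
    """True if the UTF-8 encoding of code point `cp` contains a 0x00 byte."""
    return b"\x00" in chr(cp).encode("utf-8")


def code_points_with_zero_byte() -> list:
    """Scan every Unicode scalar value and return those producing a zero byte.

    Surrogates (U+D800..U+DFFF) are skipped because they are not scalar
    values and cannot be encoded in UTF-8.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        if utf8_contains_zero(cp):
            found.append(cp)
    return found


if __name__ == "__main__":
    print([hex(cp) for cp in code_points_with_zero_byte()])
```

The scan covers about 1.1 million code points and finds only U+0000.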
> Is there any solution on how to have Unicode text, HTML and XML files
> checked in as text and not as binary, so that we might be able to use
> revision and merge options as for text,
The CVS cognoscenti should correct me if I'm wrong, but I assume that CVS
handles text with 8-bit character sets just fine (as long as you don't
use a zero byte) -- no?
If this is true, then you should be able to store all your Unicode text
using UTF-8 (which is the default for XML anyway) and CVS should handle
it perfectly well as text, not binary.
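If the back-end really does need UTF-16 LE, one workflow consistent with this suggestion is to keep the UTF-8 form under CVS and convert at the boundary. The Python sketch below assumes the back-end wants UTF-16 LE with a leading BOM; the function names and the choice to report whether a BOM was present are hypothetical, and how the conversion is hooked into check-in and checkout is left open.

```python
BOM = "\ufeff"  # U+FEFF, encoded as the bytes FF FE in UTF-16 LE


def utf16le_to_utf8(data: bytes) -> tuple:
    """Convert UTF-16 LE bytes (optionally BOM-prefixed) to UTF-8 without a BOM.

    Returns (utf8_bytes, had_bom) so the BOM can be restored later.
    """
    text = data.decode("utf-16-le")
    had_bom = text.startswith(BOM)
    if had_bom:
        text = text[1:]
    return text.encode("utf-8"), had_bom


def utf8_to_utf16le(data: bytes, add_bom: bool = True) -> bytes:
    """Convert UTF-8 bytes back to UTF-16 LE, prepending a BOM if requested."""
    text = data.decode("utf-8")
    if add_bom:
        text = BOM + text
    return text.encode("utf-16-le")


if __name__ == "__main__":
    original = (BOM + "<msg>olá</msg>\n").encode("utf-16-le")
    stored, had_bom = utf16le_to_utf8(original)
    print("stored form:", stored)
    print("round trip ok:", utf8_to_utf16le(stored, had_bom) == original)
```

The trade-off is that the repository copy is UTF-8, so revisions and merges operate on text, at the cost of a conversion step wherever the UTF-16 form is required.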
Thomas Maslen
maslen@pobox.com