Re: [Openexr-devel] UTF-8


From: Jim Atkinson
Subject: Re: [Openexr-devel] UTF-8
Date: Thu, 15 Nov 2012 09:16:18 -0800

Hi Florian -

On Nov 14, 2012, at 9:11 PM, Florian Kainz <address@hidden> wrote:

> Instead of normalizing strings before they are stored in files, the
> OpenEXR library could normalize strings on the fly before every string
> comparison.  That way every string would be preserved exactly.  Speed
> could be an issue, though.  String comparisons are not rare, and on-the-fly
> normalization would slow them down considerably.

I'm with David on this one.  I think the library should not change the strings 
that the application asks it to store.  Your earlier recommendations were:

On Nov 14, 2012, at 11:47 AM, Florian Kainz <address@hidden> wrote:

> - All text strings are to be interpreted as Unicode, encoded as UTF-8.
>  This includes attribute names and strings contained in attributes,
>  for example, as channel names.
> 
> - Text strings stored in files must be in Normalization Form C (NFC,
>  canonical decomposition followed by canonical composition).
> 
> - Where text strings need to be collated, strcmp() is used to compare
>  the corresponding char sequences:  string A comes before (or is less
>  than) string B if
> 
>    strcmp(A,B) < 0
> 
>  (Note: this is not ambiguous; the C99 standard specifies that strcmp()
>  interprets the bytes that make up a string as unsigned.)
> 
> - Text strings passed to the IlmImf library must be encoded as UTF-8
>  and in Normalization Form C.

None of these recommendations requires the IlmImf library to do anything other
than compare strings with strcmp().  So I'd change the word "must" to "should"
and leave the onus of Unicode normalization on the application.  Well-behaved
applications and well-formed OpenEXR files should use UTF-8 strings in
Normalization Form C, but the library itself shouldn't worry about it if they
don't.
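
To make the consequence concrete, here is a minimal sketch in plain C of what
"compare with strcmp() and nothing else" would mean.  The helper names
utf8_less() and utf8_equal() and the two constants are made up for this
example; they are not part of IlmImf.  Note that strcmp() only promises the
sign of its result, so the ordering test is "< 0" rather than a comparison
against a specific value.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not an IlmImf function): true if the UTF-8 string a
   collates before b.  strcmp() compares bytes as unsigned char and only
   guarantees the sign of its result, so we test for "< 0". */
bool utf8_less(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}

/* Hypothetical helper: byte-for-byte equality, with no normalization. */
bool utf8_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* The same visible text, "e with acute accent", in two canonically
   equivalent encodings. */
const char E_ACUTE_NFC[] = "\xC3\xA9";   /* U+00E9, precomposed (NFC)  */
const char E_ACUTE_NFD[] = "e\xCC\x81";  /* U+0065 U+0301, decomposed  */
```

With these definitions, utf8_equal(E_ACUTE_NFC, E_ACUTE_NFD) is false even
though both strings render identically, and utf8_less(E_ACUTE_NFD,
E_ACUTE_NFC) is true because the byte 0x65 sorts before 0xC3 when both are
read as unsigned.  An application that wants the two to match as, say, channel
names would have to normalize them itself before handing them to the library.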

- Jim

