Re: [Openexr-devel] UTF-8

From:	Britton, Andrew D
Subject:	Re: [Openexr-devel] UTF-8
Date:	Fri, 16 Nov 2012 21:50:38 +0000

I wonder, though, whether adding this normalization step, tricky as it is, 
would help future-proof the file format by addressing internationalization 
concerns up front. Like Jim, I'm a monolingual English speaker, so I'm largely 
unaware of how often this issue arises outside of English. If everyone, now and 
in the future, were an expert EXR user, I would consider the point moot and 
assume that normalization is not required; as EXR continues to be adopted, 
though, the group of non-expert users will keep growing.

In addition, here are the points made by the digital archivist at my office:
1) Normalization for header and channel fields is a good idea because all users 
know what to expect upon return of data and exactly in what format the data 
will be returned.
2) Even if current search tools miss terms, as in gruen, because the same word 
has different representations, search tools will later be developed to address 
that issue.
3) The point is that standardization of the important string fields will make 
the format even stronger and more predictable.
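
To illustrate point 2, here is a small sketch (Python is used only for 
illustration; the names `composed`, `decomposed`, `same_bytes` and 
`same_after_nfc` are made up for this example and are not part of OpenEXR). 
It shows how the same visible word can be stored as two different UTF-8 byte 
sequences, and how normalization form C makes them identical:

```python
import unicodedata

# Two spellings of the same visible word "grün".
composed = "gr\u00fcn"     # "ü" as the single code point U+00FC
decomposed = "gru\u0308n"  # "u" followed by combining diaeresis U+0308


def same_bytes(a: str, b: str) -> bool:
    """Compare the raw UTF-8 encodings, as a byte-wise search would."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_after_nfc(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings after normalizing both strings to NFC."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return na.encode("utf-8") == nb.encode("utf-8")


if __name__ == "__main__":
    print(same_bytes(composed, decomposed))      # raw comparison
    print(same_after_nfc(composed, decomposed))  # comparison after NFC
```

A search tool that compares raw bytes would treat the two spellings as 
different terms; one that normalizes first would not.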

Andrew Britton

-----Original Message-----
From: address@hidden [mailto:address@hidden] On Behalf Of Larry Gritz
Sent: Friday, November 16, 2012 10:05 AM
To: Jim Atkinson
Cc: address@hidden
Subject: Re: [Openexr-devel] UTF-8

OpenEXR shouldn't have to worry about this nonsense internally.

My vote would be to make libIlmImf continue doing an exact compare (strcmp), 
but put in the docs that the strings are assumed to be UTF-8 and that any apps 
using the library should canonicalize with normalization "C" any channel name 
or attribute strings *before* passing them to the library, and if they don't do 
so, they get what they deserve in terms of false negative string matches.
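
A minimal sketch of how that division of labor might look, written in Python 
for brevity rather than against the real libIlmImf API. The helpers 
`canonical_name` and `find_channel` are hypothetical; `find_channel` only 
stands in for the library's exact, strcmp-style comparison, and the 
application is assumed to be the one that calls `canonical_name`:

```python
import unicodedata
from typing import List


def canonical_name(name: str) -> bytes:
    """Application side: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


def find_channel(stored_names: List[bytes], wanted: bytes) -> int:
    """Library side (stand-in): exact byte comparison, no normalization.

    Returns the index of the matching name, or -1 if there is none.
    """
    for index, stored in enumerate(stored_names):
        if stored == wanted:
            return index
    return -1


if __name__ == "__main__":
    # Names written by an application that canonicalized them first.
    stored = [canonical_name("R"), canonical_name("gr\u00fcn")]

    # A caller that skips normalization and passes the decomposed spelling.
    raw_query = "gru\u0308n".encode("utf-8")
    print(find_channel(stored, raw_query))

    # A caller that canonicalizes before calling the lookup.
    print(find_channel(stored, canonical_name("gru\u0308n")))
```

The lookup itself stays a plain byte comparison; whether a query matches 
depends only on whether the caller normalized the name beforehand.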

        -- lg

On Nov 16, 2012, at 8:34 AM, Jim Atkinson wrote:

> 
> On Nov 15, 2012, at 4:12 PM, Florian Kainz <address@hidden> wrote:
> 
>> Jim, what would be the downside of normalizing attribute and channel names?
>> Would it prevent you from doing something that you can do now?
>> 
>> I don't buy the slippery slope argument.  If we say that attribute 
>> names and channel names must be normalized, but other strings don't 
>> have to be, where's the problem?
> 
> I agree, "foo" is a contrived example.  Also, I guess I misunderstood you 
> since I thought you were suggesting that we normalize *all* strings in the 
> header.  I'm much happier with the suggestion that we normalize all attribute 
> and channel names but no attribute values.
> 
> My main complaint is that it adds a lot of extra complexity, dependencies, 
> and portability issues to an image format library.  The main reason for 
> adding it seems to be that there may be an application that may display a 
> layer name that a user may have difficulty typing in.
> 
> As a monolingual English speaker, I simply don't run into these issues often. 
> When confronted with a layer named "공룡" or even "grün", I'm forced to cut 
> and paste it or select it in a GUI.  So maybe I don't understand how really 
> pervasive and frustrating these issues are.  But it sure seems like Unicode 
> normalization issues should be outside the scope of OpenEXR.
> 
> - Jim
> 

--
Larry Gritz
address@hidden

_______________________________________________
Openexr-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/openexr-devel
