
Re: [paragui-users] Full UNICODE support


From: Teunis Peters
Subject: Re: [paragui-users] Full UNICODE support
Date: Wed, 12 Feb 2003 11:19:01 -0800 (PST)

On 12 Feb 2003, Kamil Skalski wrote:

> > I for one am VERY interested.  I've been working on the OpenGL branch -
> > which uses an entirely different rendering approach.  (it actually isn't
>
> BTW I was wondering if ParaGUI could be entirely rendered by OpenGL.
> OpenGL supports loading textures into its memory and hardware blitting
> them into view...

That's what the OpenGL driver in the OpenGL branch does.  I'm also going
to be doing a DirectX driver once I'm happy with the OpenGL one :)

> >     Do strings stay in UTF8 format?  (useful but not the only way)
>
> I don't get what you mean... UTF8 strings are strings in the C sense. And
> as far as I can see, Para doesn't assume what they look like, except for
> the LineEdit code. The standard rule still holds in UTF8: a string is
> ended with '\0'.

yep.  The UTF-8 encoding is -great- as any C function will handle it fine...
as long as it doesn't assume arbitrary rules such as all characters being
0-127 :)
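
To illustrate the point above, here is a minimal sketch (Python, chosen only
for brevity; the sample word is an arbitrary illustration). It checks that
UTF-8 text contains no zero byte and that every non-ASCII byte is >= 0x80,
which is why '\0'-terminated C strings keep working:

```python
# "zazolc" with Polish diacritics, written with escapes to avoid any
# ambiguity about how this file itself is encoded.
text = "za\u017c\u00f3\u0142\u0107"
encoded = text.encode("utf-8")

# UTF-8 only emits a zero byte for U+0000 itself, so the C terminator
# stays unambiguous.
has_nul = 0 in encoded

# Bytes belonging to multi-byte sequences are all >= 0x80; ASCII is untouched.
non_ascii_bytes = [b for b in encoded if b >= 0x80]

# Character count and byte count differ, which is what breaks code that
# assumes one byte per character.
char_count = len(text)
byte_count = len(encoded)
print(has_nul, char_count, byte_count)
```

Code that only copies or compares such strings needs no change; code that
indexes by character (like a line editor's cursor) is what has to be aware
of multi-byte sequences.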

> >     Layout rules
> >     - eg. left-to-right/right-to-left/top-to-bottom,compressed
> >     (Korean can be 'compressed' where multiple sounds make up
> >     a character)
>
> Wow. It is an issue which I didn't even think about. But anyway it seems
> to be only a problem of positioning text in widgets and rendering in
> directions other than the current one, so it should be easy to modify and
> generalize the current code to support such things.

Yep.  It's a wee bit tricky but that's what text layout toolkits are for.
As far as I know the best one out there is Pango (part of the GTK/GNOME
project).
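
As a small illustration of two of the layout issues mentioned here (not of
how Pango works internally), the sketch below uses Python's standard
unicodedata module. It composes three Hangul jamo into one syllable block and
guesses a string's base direction from its first strongly directional
character. The helper name base_direction is hypothetical, and real
bidirectional layout is considerably more involved than this:

```python
import unicodedata

# Three Hangul jamo (separate sounds) that compose into a single syllable.
jamo = "\u1112\u1161\u11ab"                     # HIEUH + A + NIEUN
syllable = unicodedata.normalize("NFC", jamo)   # one code point, U+D55C


def base_direction(s):
    """Return 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in s:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):      # Hebrew-style and Arabic letters
            return "rtl"
        if cls == "L":              # Latin, Hangul, and most other scripts
            return "ltr"
    return "ltr"                    # assumed default when nothing is strong


arabic_word = "\u0633\u0644\u0627\u0645"
print(len(jamo), len(syllable), base_direction(arabic_word))
```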

> >     - Input methods (most Unix ones are X-specific, more the pity)
>
> First I thought that SDL provides Unicode values for any input characters,
> but now I'm really disappointed. I searched the web for a while and there
> seems to be no general method to interpret input as Unicode code points.

Nope.  No there isn't.  What SDL does is provide Unicode in the cases of
keyboards that have keys present that aren't in the ASCII set.  Such as
accented characters (or even the accent 'dead keys').
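
For the dead-key case, one possible way to turn a dead key plus the following
letter into a single character is sketched below. This is an assumption about
how an application could handle it, not a description of what SDL does; the
DEAD_KEYS table and the compose helper are hypothetical names:

```python
import unicodedata

# Hypothetical mapping from a dead key's spacing accent to the
# corresponding Unicode combining mark.
DEAD_KEYS = {
    "\u00b4": "\u0301",  # acute
    "`": "\u0300",       # grave
    "^": "\u0302",       # circumflex
}


def compose(dead_key, base):
    """Combine a dead key with the letter typed after it."""
    combining = DEAD_KEYS[dead_key]
    # The combining mark follows the base letter; NFC merges the pair into
    # a precomposed character when Unicode defines one.
    return unicodedata.normalize("NFC", base + combining)


print(compose("\u00b4", "e") == "\u00e9")
```

When no precomposed form exists (for example an acute accent on "q"), the
result stays as two code points, which the renderer then has to stack itself.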

For languages such as Japanese, a full input editor is required.
Particularly with its ~2500 characters in everyday use (spread across 4
scripts).  Arabic is much simpler, only requiring 4 cases (contextual
letter forms) and interpretive compression (and right-to-left).

> >     - font caching - for quick display rather than rendering every
> >       character individually as the current system (opengl branch
>
> I don't know the internal architecture of ParaGUI, so I've no idea.
> What's wrong with just caching each character glyph once it has been used?

Remember that 2500+ character thing?  *g*.  Have to have some kind of
order to prevent memory bloat from too many fonts/sizes/typefaces/... such
as italic/bold/normal/pointsize/...
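
One common way to keep such a cache bounded is least-recently-used eviction,
keyed on everything that changes a glyph's appearance. The sketch below is an
illustration under that assumption, not ParaGUI's actual design; GlyphCache
and the render callback are hypothetical names:

```python
from collections import OrderedDict


class GlyphCache:
    """Bounded glyph cache with least-recently-used eviction."""

    def __init__(self, capacity, render):
        self.capacity = capacity
        self.render = render          # callback that rasterizes one glyph
        self._glyphs = OrderedDict()  # oldest entry first

    def get(self, font, size, style, codepoint):
        key = (font, size, style, codepoint)
        if key in self._glyphs:
            # Cache hit: mark the glyph as most recently used.
            self._glyphs.move_to_end(key)
            return self._glyphs[key]
        glyph = self.render(font, size, style, codepoint)
        self._glyphs[key] = glyph
        if len(self._glyphs) > self.capacity:
            # Drop the glyph that has gone unused the longest.
            self._glyphs.popitem(last=False)
        return glyph

    def __len__(self):
        return len(self._glyphs)


# Usage with a stand-in renderer that just records what it was asked for.
rendered = []
cache = GlyphCache(2, lambda *key: rendered.append(key) or key)
cache.get("sans", 12, "normal", 0x41)
cache.get("sans", 12, "normal", 0x41)   # served from the cache
print(len(rendered))
```

With a capacity in the low thousands this would cover everyday Japanese text
in one face while still capping memory when many sizes and styles are in use.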

> > What do you mean by Unicode support btw?  UCS-4, UCS-2, UTF-16, UTF-8
> > encodings?  Wide character support?
>
> As I understand Unicode, there is only ONE set of Unicode codes, and
> UTF-16, UTF-8... are just encodings of these codes, ways to write 32-bit
> integers into text files. So what I mean by Unicode support is just a set
> of methods to support text written in Unicode (precisely, in some encoding
> like UTF8, the simplest), display it in ParaGUI labels and make it possible
> to type it from the keyboard (which could be not as easy as I thought)

oh right.  Also ISO-10646 versus Unicode?  Unicode started out as a
65536-entry (16-bit) table, where ISO-10646 defines a much larger code
space.  Current Unicode versions track ISO-10646 and reach beyond 16 bits,
which is where the rarer characters (such as those used in Chinese names)
live.  Let's just assume:
        ISO-10646 character interpretation (32-bit characters)
        UTF-8 encoding (for C string compatibility)
        (consider the Java rule of encoding the '\0' character as two bytes?)

There.  That should do it.  Just wondering if you went outside that...  as
glibc natively supports wchar_t, it IS possible to have unencoded strings.
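
The assumptions above can be sketched as a small encoder that takes 32-bit
character values and emits UTF-8, with an optional switch for the Java-style
rule of writing character 0 as the two bytes C0 80 so that no zero byte
appears in the output. The function name and flag are hypothetical, only the
zero rule of Java's modified UTF-8 is shown, and no validation (for example
of surrogate values) is attempted:

```python
def encode_utf8(codepoints, java_style_nul=False):
    """Encode an iterable of integer code points (up to 0x10FFFF) as UTF-8."""
    out = bytearray()
    for cp in codepoints:
        if cp == 0 and java_style_nul:
            # Overlong two-byte form keeps '\0' free for the C terminator.
            out += b"\xc0\x80"
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6),
                          0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


plain = encode_utf8([0x41, 0x00, 0x42])
safe = encode_utf8([0x41, 0x00, 0x42], java_style_nul=True)
print(plain, safe)
```

The unencoded alternative is simply the list of integers itself, which is
roughly what a 32-bit wchar_t array holds.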

> > As far as I know, outside of tricky layouts and rendering, UTF8 should be
> > the easiest format to leave strings in though.  And it's compatible with
> > most existing C (and especially C++) libraries.  Also the added silly that
> > linux filenames can be encoded in UTF8....
>
> I agree... it seems that I didn't say anything new... but as for now I
> can display strings in UTF8 just the way they should look. And maybe
> it would be possible to type in special characters even if it requires
> platform-specific character tables.

font-specific and language-specific actually.  Code pages are great for
providing WHICH translation to use (eg. ISO-8859-1 being latin-1 :)
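
A short sketch of why the code page matters: the same three bytes decode to
different characters under ISO-8859-1 and ISO-8859-2, and only the code page
tells you which reading is intended. The byte values are an arbitrary
illustration:

```python
raw = bytes([0xB1, 0xE6, 0xEA])

# Same bytes, two code pages, two different strings.
as_latin1 = raw.decode("iso-8859-1")   # plus-minus sign, ae ligature, e circumflex
as_latin2 = raw.decode("iso-8859-2")   # Polish a-ogonek, c-acute, e-ogonek

# Once decoded with the right table, the text can be stored as UTF-8
# and no longer depends on a code page.
as_utf8 = as_latin2.encode("utf-8")
print(as_latin1 != as_latin2, len(as_utf8))
```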

As most systems run, theoretically one's keyboard should spit out all the
characters you need.  (there ARE Polish keyboards :)
Failing that, want to write a character-table widget?

Look to (eventually) having input-methods such as WNN/canna/...  (I'll
write it one of these days soon :) for the phonetic/lookup systems

I'm a little worried about string rendering - ParaGUI's is -lousy-.

G'day, eh? :)
        - Teunis




