Re: [Gnue-dev] Unicode + wxPython

From:	Aditya Gilra
Subject:	Re: [Gnue-dev] Unicode + wxPython
Date:	Fri, 23 Aug 2002 05:19:07 -0700 (PDT)

||Shriharih||

God-remembrance.

Unicode encoded as UTF-8 has been working for me in
wxPython forms and PostgreSQL since forms 0.1. UTF-16
has not worked and is not needed either.

----------
BUT there are issues:

1. UTF-8 is a variable-length multi-byte encoding.
Whenever you use the arrow keys in a text field,
len(str), where str is a UTF-8 encoded plain string,
counts bytes rather than characters, so cursor
positions come out wrong. You need
len(unicode(str,"utf-8")) to get the correct length.
At the end of cursor movement, you need to convert
back to UTF-8 using codecs.utf_8_encode(ustr)[0].
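
A minimal sketch of the byte-versus-character mismatch, written for
Python 3 (where str.decode("utf-8") plays the role of
unicode(str,"utf-8")). The helper names char_length, clamp_cursor and
insert_at are made up for illustration and are not taken from the forms
code:

```python
import codecs


def char_length(raw_utf8):
    """Length in characters (code points), not in bytes."""
    return len(raw_utf8.decode("utf-8"))


def clamp_cursor(raw_utf8, cursor, delta):
    """Move a character-based cursor by delta, staying inside the text."""
    return max(0, min(char_length(raw_utf8), cursor + delta))


def insert_at(raw_utf8, cursor, new_text):
    """Insert new_text at a character position and re-encode as UTF-8."""
    text = raw_utf8.decode("utf-8")
    text = text[:cursor] + new_text + text[cursor:]
    # Same call as in the mail; text.encode("utf-8") is equivalent.
    return codecs.utf_8_encode(text)[0]


# A six code point Devanagari word, written with escapes so the source
# file stays ASCII. Each code point takes three bytes in UTF-8.
word = "\u0928\u092e\u0938\u094d\u0924\u0947"
raw = word.encode("utf-8")

print(len(raw))          # byte count
print(char_length(raw))  # code point count
```

Note that the code point count is still not the number of glyphs drawn
on screen once ligatures and combining signs are involved.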

forms-0.1 required many changes, but the latest
forms-0.3 requires only a 2-3 line change in one file,
which I have sent as part of my pygtk patch.

--------
AND 1 problem with wxWindows.

2. This is not an issue with most languages, but it is
with Hindi and most Indic & Arabic scripts. Ligaturing
based on adjacent characters is essential for proper
display of Hindi. The present wxWindows is based on
gtk+-1.2, which does not use Pango, so it is useless
for us in India.

However, the latest wxWindows CVS has an enable-gtk2
configure option which will enable gtk2 which uses
pango. Once we have the next version of wxWindows and
wxPython, Hindi ligaturing will work. Till then, I
must use my pygtk form.

- Aditya.

--- Arturas Kriukovas <address@hidden> wrote:
> Hello
> 
> 
> Some time ago I thought that wxPython did not support Unicode. But with
> help from Jan and Codeworks guru Marius, it turned out that it is
> possible to get a Unicode wxForm via wxPython. The full sequence for
> achieving this is written below. The main point is that with the correct
> external software (on my system it is Python 2.1.3, wxPython 2.2) we can
> see Unicode characters and should be able to support them.
> 
> 
> Arturas
> address@hidden
> address@hidden
> 
> 
> How to put unicode characters in GNUE form:
> 1. Getting unicode console.
> My usual system locale configuration is:
>     >locale
>     LANG=POSIX
>     LC_CTYPE="lt_LT.ISO8859-13"
>     LC_NUMERIC="lt_LT.ISO8859-13"
>     LC_TIME="lt_LT.ISO8859-13"
>     LC_COLLATE="lt_LT.ISO8859-13"
>     LC_MONETARY="lt_LT.ISO8859-13"
>     LC_MESSAGES="lt_LT.ISO8859-13"
>     LC_PAPER="lt_LT.ISO8859-13"
>     LC_NAME="lt_LT.ISO8859-13"
>     LC_ADDRESS="lt_LT.ISO8859-13"
>     LC_TELEPHONE="lt_LT.ISO8859-13"
>     LC_MEASUREMENT="lt_LT.ISO8859-13"
>     LC_IDENTIFICATION="lt_LT.ISO8859-13"
>     LC_ALL=lt_LT.ISO8859-13
>     >locale charmap
>     ISO-8859-13
> On your system it will of course be different, but not by much.
> The two main variables here are:
>  * LC_CTYPE and
>  * LC_ALL.
> LC_ALL is important because it overrides all other LC_* variables: any
> change to LC_CTYPE has no effect while LC_ALL is set to some value.
> LC_CTYPE defines your 'character classification' (from the manual; as
> far as I can see, in practice it determines which letter each key on
> your keyboard produces).
> We need to unset LC_ALL (because it would otherwise override LC_CTYPE)
> and change LC_CTYPE to some UTF-8 value.
> I do it like this:
>     unset LC_ALL
>     LC_CTYPE=lt_LT.UTF-8
> After running it my locale configuration becomes
> like this:
>     >unset LC_ALL
>     address@hidden:11:22:38:~:
>     >LC_CTYPE=lt_LT.UTF-8
>     address@hidden:11:22:57:~:
>     >locale
>     LANG=POSIX
>     LC_CTYPE=lt_LT.UTF-8
>     LC_NUMERIC="POSIX"
>     LC_TIME="POSIX"
>     LC_COLLATE=C
>     LC_MONETARY="POSIX"
>     LC_MESSAGES="POSIX"
>     LC_PAPER="POSIX"
>     LC_NAME="POSIX"
>     LC_ADDRESS="POSIX"
>     LC_TELEPHONE="POSIX"
>     LC_MEASUREMENT="POSIX"
>     LC_IDENTIFICATION="POSIX"
>     LC_ALL=
>     address@hidden:11:23:00:~:
>     >locale charmap
>     UTF-8
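
The precedence described above (LC_ALL wins over LC_CTYPE, which wins
over LANG) can be sketched as a small pure function. This is only an
illustration under the usual POSIX rules; effective_ctype is a
hypothetical helper, and the dictionaries simply mirror the locale
output quoted here:

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character handling.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values count as
    unset. Falls back to "POSIX" when none of them is set.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "POSIX"


before = {
    "LANG": "POSIX",
    "LC_CTYPE": "lt_LT.ISO8859-13",
    "LC_ALL": "lt_LT.ISO8859-13",
}
# Changing LC_CTYPE alone has no effect while LC_ALL is still set.
still_overridden = dict(before, LC_CTYPE="lt_LT.UTF-8")
# After 'unset LC_ALL' the new LC_CTYPE value takes effect.
after = {"LANG": "POSIX", "LC_CTYPE": "lt_LT.UTF-8"}

print(effective_ctype(before))
print(effective_ctype(still_overridden))
print(effective_ctype(after))
# The same function works on the real environment of the process.
print(effective_ctype(os.environ))
```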
> Now, with this locale configuration, we need to start xterm. Note that
> we also need to set a specific font for xterm, which can be done with:
>     xterm -fn -misc-fixed-medium-*-140-*-iso10646-1 \
>      -fb -misc-fixed-medium-*-140-*-iso10646-1 &
> [the trailing backslash continues the command on the next line; the
> whole command can equally be typed on one line]
> These parameters make xterm use the misc-fixed font. The most important
> part is the font encoding, iso10646-1, which is the ISO name for the
> Unicode font encoding.
> Personally i use a small script:
>         #!/bin/bash
>         old_CTYPE=$LC_CTYPE
>         old_ALL=$LC_ALL
>         LC_CTYPE=lt_LT.UTF-8
>         unset LC_ALL
>         command xterm -fn -misc-fixed-medium-*-140-*-iso10646-1 \
>             -fb -misc-fixed-medium-*-140-*-iso10646-1 &
>         LC_CTYPE=$old_CTYPE
>         LC_ALL=$old_ALL
> ['command' is used because i have an alias 'xterm'
> to launch xterm with
> Lithuanian fonts]
> Now we have unicode xterm.
> 
> 2. Switching through languages.
> In my .xsession file i have following settings:
>         export LC_CTYPE=lt_LT.ISO8859-13
>         export LC_COLLATE=lt_LT.ISO8859-13
>         export LC_ALL=lt_LT.ISO8859-13
>         setxkbmap -option grp:alt_shift_toggle lt &
> This binds LeftAlt + LeftShift buttons to change
> between English and
> Lithuanian languages.
> When i have unicode xterm, i simply run commands
> like 'setxkbmap ru', or
> 'setxkbmap de' or any other and use
> LeftAlt+LeftShift to switch between
> English and the language i set up.
> 
> So now we have unicode xterm and we can write there
> in different
> languages.
> 
> 3. Work.
> We can work from a Unicode console as well as from a plain console.
> The only caveat: if we save a file from the Unicode console that
> contains non-ASCII characters (characters specific to some languages;
> English has none) and later open this file from a plain (non-Unicode)
> console, we will not see these characters correctly (this is important,
> as we'll see later).
> This is because the file is saved encoded in UTF-8: ASCII characters
> are stored as one byte each, other characters as two or more bytes.
> 
> 4. Python.
> This is a simple python script (also included as
> attachment) that should
> be saved as UTF encoded file, let's say as
> 'script1.py':
>     #!/usr/bin/python
>     print "UTF-8 & one byte font"
>     print "---------------------"
>     print "Creating unicode string 'foo'"
>     foo = unicode('asd--[russian letters]-[lithuanian letters]','utf-8')
>     print "String output encoded in UTF-8: print foo.encode('utf-8')"
>     print foo.encode('utf-8')
>     print "String output as ASCII: print foo"
>     print foo
> You can write any non-ASCII characters instead of [russian|lithuanian
> letters].
> If we run this file from a plain (non-Unicode) console, the first
> string (the UTF-8 encoded one) is not displayed correctly, and Python
> dies with an ASCII encoding error while trying to print the second one.
> The less command does not display the file correctly either.
> If we run this file from the Unicode console, the first string is
> displayed correctly, and Python still dies with an ASCII encoding error
> while trying to print the second one.
> Here the less command does display the file correctly.
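
The failure on 'print foo' comes from converting a Unicode string to
ASCII implicitly. A hedged Python 3 sketch of the explicit alternative
follows; encode_for_console is a hypothetical helper, and the sample
string stands in for the [russian|lithuanian letters] placeholder:

```python
def encode_for_console(text, console_encoding):
    """Encode text for a console, escaping what the encoding cannot hold."""
    try:
        return text.encode(console_encoding)
    except UnicodeEncodeError:
        # Fall back to \uXXXX escapes instead of aborting the program.
        return text.encode(console_encoding, errors="backslashreplace")


# 'asd-' followed by U+0105 (LATIN SMALL LETTER A WITH OGONEK).
foo = "asd-\u0105"

# Strict ASCII encoding fails, which is the error seen in the script.
try:
    foo.encode("ascii")
    strict_ascii_failed = False
except UnicodeEncodeError:
    strict_ascii_failed = True

# repr() keeps the printed output ASCII-only on any console.
print(strict_ascii_failed)
print(repr(encode_for_console(foo, "utf-8")))
print(repr(encode_for_console(foo, "ascii")))
```

Encoding explicitly for the console in use sidesteps the implicit ASCII
conversion, at the cost of having to know that encoding.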
> 
> 5. wxPython
> This is a larger python script, that i shamelessly
> copied from
> somewhere.
> It also should be saved as UTF-8 encoded file, let's
> say as
> 'script2.py'.
> It has a window, some labels, menu, status bar,
> button... :
> NOTE: you might not see the characters correctly, because your window
> manager uses some non-Unicode encoding. I cannot tell you now what or
> where to change; it is too window-manager dependent,
> however 
=== message truncated ===

