
[Gnue-dev] Re: [gnue-discuss] utf-8?


From: Dmitry
Subject: [Gnue-dev] Re: [gnue-discuss] utf-8?
Date: Wed, 27 Mar 2002 23:28:32 +0300 (MSK)

On Thu, 7 Feb 2002, Aditya Gilra wrote:

> ||Shriharih||
>
> God-Remembrance.
>
> --- Jens Müller <address@hidden> wrote:
> > Aditya Gilra <address@hidden> writes:
> >
> > > 1) In postgresql all values are in utf-8
> > > 2) In logical UI, I keep all values in unicode
> >
> > What means "Unicode" here? UTF-16?
>
> I meant the unicode strings in python2.1. I suppose
> python stores it as UCS-2.
>
> I have a few points to note here -
>
> 1. Please do not use the str() fn anywhere. It cannot
> handle unicode strings having chars beyond ascii, i.e.
> chars with codes > 127.

Maybe it would be more dependable someday to have all strings represented
in Unicode and to use specific functions to handle them, but it seems that
str() _does_ also handle strings with characters beyond ASCII if the desired
encoding is provided via sys.setdefaultencoding().

The bad thing is that this function is not available at run time: site.py
deletes sys.setdefaultencoding once Python startup is finished.
With the current Forms codebase, the default encoding can be set by:
a. manually changing PYTHONPATH/site.py, or
b. adding a custom module named sitecustomize.
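Option b could look roughly like the sitecustomize.py below. This is a minimal sketch, assuming a Python 2 installation; the encoding name 'koi8-r' is only an example assumption (CP1251 would be the usual Windows counterpart). The hasattr() guard is an addition of mine so that the file is a harmless no-op on interpreters that no longer have sys.setdefaultencoding.

```python
# sitecustomize.py: a minimal sketch, assuming Python 2, where site.py
# imports this module during startup and only afterwards deletes
# sys.setdefaultencoding.
import sys

# Assumed example value; pick the encoding your forms and data really use
# (for instance "cp1251" on Windows).
DESIRED_ENCODING = "koi8-r"

# The attribute exists only while site.py is still running under Python 2.
# On interpreters without it (e.g. Python 3, where str is already Unicode)
# this guard makes the module do nothing.
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding(DESIRED_ENCODING)
```

After restarting Python 2, sys.getdefaultencoding() should then report 'koi8-r' instead of 'ascii'.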

IMHO wxPython uses the information from sys.getdefaultencoding() to
initialize its encoding system. I can't prove it, because
GetDefaultEncoding() is not implemented in wxPython.

A long time ago (I think it was in the GNUe & Python 1.5.2 era) I was able
to run my customized (non-ASCII) forms only after adding wxFONTENCODING_KOI8,
or wxFONTENCODING_CP1251 under Windows, to the SetFont calls, as Derek
discussed a few days ago:

<dneighbo> widget.SetFont(wxFont(int(GConfig.get('pointSize')),wxMODERN,wxNORMAL,wxNORMAL,encoding=wxFONTENCODING_UTF8))
<dneighbo> would be
<dneighbo> widget.SetFont(wxFont(int(GConfig.get('pointSize')),wxMODERN,wxNORMAL,wxNORMAL,encoding=wxFONTENCODING_SYSTEM))

Since then the behaviour has changed: now it works without modification,
provided that sys.getdefaultencoding() first returns something other than
'ascii'.

Is there a chance that wxPython's wxFONTENCODING_SYSTEM gets its setting
from this?

An excerpt from http://diveintopython.org/kgp_unicode.html:

[2] setdefaultencoding function sets, well, the default encoding. This
is the encoding scheme that Python will try to use whenever it needs to
auto-coerce a unicode string into a regular string.

Example 5.16. Effects of setting the default encoding

>>> import sys
>>> sys.getdefaultencoding()  [1]
'iso-8859-1'
>>> s = u'La Pe\xf1a'
>>> print s  [2]
La Peña
[1] This example assumes that you have made the changes listed in the
previous example to your sitecustomize.py file, and restarted Python. If
your default encoding still says 'ascii', you didn't set up your
sitecustomize.py properly, or you didn't restart Python. The default
encoding can only be changed during Python startup; you can't change it
later. (Due to some wacky programming tricks that I won't get into right
now, you can't even call sys.setdefaultencoding after Python has started
up. Dig into site.py and search for "setdefaultencoding" to find out how.)
[2] Now that the default encoding scheme includes all the characters we
use in our string, Python has no problem auto-coercing the string and
printing it.
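To make the "auto-coerce" step concrete: in Python 2, str(u) on a unicode string u behaves roughly like u.encode(sys.getdefaultencoding()) with strict error handling. Python 3 no longer coerces implicitly, so the sketch below performs the same step explicitly; the helper name coerce_like_py2 is hypothetical.

```python
def coerce_like_py2(text, default_encoding="ascii"):
    """Roughly mimic Python 2's implicit unicode -> str coercion.

    Python 2 took the encoding from sys.getdefaultencoding() and used
    strict error handling; here the encoding is passed in explicitly.
    """
    return text.encode(default_encoding)  # errors="strict" by default


s = "La Pe\xf1a"  # u'La Pe\xf1a' in Python 2 syntax

# With an ISO-8859-1 default, the string coerces without complaint.
print(coerce_like_py2(s, "iso-8859-1"))  # b'La Pe\xf1a'

# With Python's startup default ('ascii'), the same coercion fails.
try:
    coerce_like_py2(s)
except UnicodeEncodeError as exc:
    print("ascii fails:", exc.reason)  # ascii fails: ordinal not in range(128)
```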

I found that other projects make the same recommendation, i.e. to set the
default encoding to something other than ASCII. See, for example:
http://www.livinglogic.de/Python/xist/Installation.html



>
> 2. GNUe UI is in two parts logical and physical. I now
> talk about wxPython. All keystrokes are captured and
> processed first in logical interface then reflected in
> physical, even right arrow, etc.
>
> Hence it is imperative that all strings be stored
> internally (within the program) as python unicode
> strings otherwise the cursor movements will not be
> proper.
>
> 3. Allow the user to specify what encoding he wants
> for his database and UI.
>
> 4. For me, I use wxpython (and hence wxGTK) which
> accepts utf-8 to display Hindi/Devanagari.
>
> 5. For postgresql I use utf-8 and not unicode (UCS-2)
> because I will have to recompile it on RHL7.1 to use
> ucs-16. Postgresql and most databases for that matter
> can store unicode as utf-8 without modification.
>
> 6. Hence, in db driver & UI driver use the functions
> below to convert utf-8<->unicode<->utf-8.
>
> codecs.utf_8_encode(str)
> unicode(str,"utf-8")
>
> Once again, I reiterate, please do not use str()
> anywhere, use
> codecs.utf_8_encode()
>
> Best Regards and God Remembrance,
> Aditya Gilra.
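Items 2, 5 and 6 of the quoted mail can be illustrated with a short Python 3 sketch. One detail worth noting: codecs.utf_8_encode() returns a tuple (encoded bytes, number of characters consumed), not a plain string, so the result needs unpacking; codecs.utf_8_decode() likewise returns (text, bytes consumed). Python 2's unicode(s, "utf-8") corresponds to decoding bytes in Python 3. The helper names db_to_ui and ui_to_db are hypothetical, not functions of the GNUe drivers.

```python
import codecs


def db_to_ui(raw):
    """Decode UTF-8 bytes from the database into a Unicode string."""
    text, _consumed = codecs.utf_8_decode(raw, "strict", True)
    return text


def ui_to_db(text):
    """Encode a Unicode string as UTF-8 bytes for the database."""
    data, _consumed = codecs.utf_8_encode(text)  # returns (bytes, length)
    return data


# "namaste" in Devanagari, as it might arrive from a UTF-8 PostgreSQL column.
raw = "नमस्ते".encode("utf-8")
text = db_to_ui(raw)

print(len(raw), len(text))  # 18 6  (bytes vs. code points)

# Moving the cursor one position: slicing the Unicode string always
# yields whole code points...
print(text[:1])  # न

# ...whereas slicing the UTF-8 bytes can cut a character in half.
try:
    raw[:1].decode("utf-8")
except UnicodeDecodeError:
    print("byte offset 1 is in the middle of a character")

print(ui_to_db(text) == raw)  # True: the round trip is lossless
```

Keep in mind that a code point is not always what a user sees as one character: in this word the virama (U+094D) joins स and त into a single conjunct, so a UI that moves the cursor strictly by code points may still need extra handling for such combinations.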



An excerpt from http://www.lemburg.com/files/python/unicode-proposal.txt:

If not otherwise defined or set, the <default encoding> defaults to
'ascii'. This encoding is also the startup default of Python (and in
effect before site.py is executed).

Note that the default site.py startup module contains disabled
optional code which can set the <default encoding> according to the
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
encoding defined by the current locale. The locale module is used to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extract the encoding from the locale default settings defined by the
OS environment (see locale.py). If the encoding cannot be determined,
is unknown or unsupported, the code defaults to setting the <default
encoding> to 'ascii'. To enable this code, edit the site.py file or
place the appropriate code into the sitecustomize.py module of your
Python installation.
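The locale-aware logic described in this excerpt might be sketched as follows. This is an illustration under assumptions, not the actual site.py code: the function name choose_default_encoding is hypothetical, it validates the name with codecs.lookup() to implement the "unknown or unsupported" fallback, and it asks locale.getpreferredencoding() for the environment's encoding, which may differ in detail from the call the disabled site.py code makes.

```python
import codecs
import locale
import sys


def choose_default_encoding(locale_encoding):
    """Return a usable codec name, or 'ascii' if none can be determined.

    Mirrors the fallback rule of the disabled site.py code: an encoding
    that is missing, unknown or unsupported falls back to 'ascii'.
    """
    if not locale_encoding:
        return "ascii"
    try:
        # lookup() raises LookupError for unknown names and returns the
        # canonical name for known ones (e.g. "KOI8-R" -> "koi8-r").
        return codecs.lookup(locale_encoding).name
    except LookupError:
        return "ascii"


# Ask the locale machinery which encoding the OS environment prefers
# (derived from settings such as LANG, LC_ALL or LC_CTYPE).
encoding = choose_default_encoding(locale.getpreferredencoding())

# Only Python 2, while site.py is still running, can change the default.
if encoding != "ascii" and hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding(encoding)
```

Placed in sitecustomize.py, this would give every user the default encoding of their own locale, which is what a binary snapshot would need, rather than one hard-coded value.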

Question for James:
Can we try to build at least a binary snapshot with the default encoding set
according to the current locale? I could not find how to make the required
settings after installation.

I'm ready to continue all this testing.

Regards,
Dmitry




