From: su_blanc
Subject: [Mldonkey-bugs] [bug #9102] [CJK] File names in Downloads and search results Web UI page
Date: Mon, 2 May 2005 03:56:07 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/20040910

Follow-up Comment #13, bug #9102 (project mldonkey):

Let me explain how things work.
At mlnet startup, the core tries to determine how strings will be
converted, based on the target machine's settings.
We read 2 values:
- the 'locale' of the target machine
- the 'language' of the target machine
The most important thing to understand is that the 'language' variable builds
the list of char mappings that will later be used to convert strings to
UTF-8 or to LOCALE. Maybe this part is not properly implemented.
In the [CJK] case, mlnet will use the following char mapping:
  [
   [CP949; UHC];
   [CP1361; JOHAB];
   [EUC_KR; EUCKR; CSEUCKR];
   [ISO_2022_KR; CSISO2022KR];
   [ISO_IR_149; KOREAN; KSC_5601; KS_C_5601_1987; KS_C_5601_1989;
CSKSC56011987];
  ]

*only* if the variable 'language' is "KO".
Yes, the core considers that if your machine language is "EN" (English), you
will not use CP949 strings. Nevertheless, the core allows you to use Unicode
char mappings independently of the language of the target machine.
There is no facility to override this mechanism at the moment (no option
parameter), and I think that is the weakest point...
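To make the selection mechanism concrete, here is a minimal sketch in Python (the real core is OCaml; the function name and table layout are illustrative, only the encoding names come from the list above):

```python
# Hypothetical sketch of how the core picks candidate char mappings
# from the machine's 'language' setting. Each inner list groups an
# encoding with its aliases; only the first (canonical) name is later
# used for conversion attempts.
KOREAN_CHARMAPS = [
    ["CP949", "UHC"],
    ["CP1361", "JOHAB"],
    ["EUC_KR", "EUCKR", "CSEUCKR"],
    ["ISO_2022_KR", "CSISO2022KR"],
    ["ISO_IR_149", "KOREAN", "KSC_5601", "KS_C_5601_1987",
     "KS_C_5601_1989", "CSKSC56011987"],
]

def charmaps_for_language(language):
    """Return the candidate char mappings for a machine 'language' code."""
    if language.upper() == "KO":
        return KOREAN_CHARMAPS
    # Other languages get their own tables; "EN" never gets CP949.
    return []

print([group[0] for group in charmaps_for_language("KO")])
```

This is exactly why an English-language machine never tries CP949: the table is keyed on 'language', not on the bytes actually seen.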

Side note: if you search for a file from any interface that is UTF-8 encoded
(web, GTK2, ...), the results returned *are* UTF-8 encoded (bitwise
comparison!).

When mlnet converts a string to LOCALE, we first convert it to UTF-8 using
the list of char mappings:
1) For each list included in the char mapping list, we take its first element
(e.g. from [CP949; UHC] we take CP949, because UHC is an alias of CP949) and
try to convert the complete string to UTF-8 from that char mapping (CP949,
CP1361, EUC_KR, ISO_2022_KR, ISO_IR_149).
2) If that fails, we try to convert the complete string char by char to
UTF-8 from 'locale'. If a char is not convertible, we replace it with a '_'.
3) Then we convert the newly created UTF-8 string to 'locale'.
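The three steps above can be sketched like this in Python (a rough sketch only: the real core is OCaml, and the codec names and fallback details here are assumptions):

```python
# Candidate encodings: the first (canonical) name of each char mapping
# group, spelled as Python codec names.
CANDIDATES = ["cp949", "johab", "euc_kr", "iso2022_kr"]

def to_locale(raw, locale_enc):
    """Convert raw bytes to the locale's encoding via UTF-8 (sketch)."""
    # 1) try to decode the *complete* string with each candidate mapping
    for enc in CANDIDATES:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        # 2) fall back: decode char by char using 'locale', replacing
        #    every unconvertible char with '_'
        chars = []
        for b in raw:
            try:
                chars.append(bytes([b]).decode(locale_enc))
            except UnicodeDecodeError:
                chars.append("_")
        text = "".join(chars)
    # 3) re-encode the recovered text into the locale's encoding
    return text.encode(locale_enc, errors="replace")
```

For example, a CP949-encoded file name on a UTF-8 locale decodes cleanly at step 1 and comes out as valid UTF-8; bytes that match no candidate fall through to the '_' substitution of step 2.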

As a consequence, if the 'language' and 'locale' variables are set properly
on the target machine, there shouldn't be any issue (the ideal setting would
be KO_UTF-8).

@JongAm Park:
Can you paste here what is printed by the core at startup? I'm surprised by
the behaviour of your core. It should look something like:
Current locale of the target machine is UTF-8
Current language of the target machine is DE
List of charmap used to convert the strings:
Use encoding CP1252
Use encoding ISO-8859-15
.....

@Amorphous:
You can't determine which string encoding a peer is using. The only thing you
can do is iterate through the complete list of char mappings until the core
finds one that can convert the complete string.
Drawbacks:
1) you have no guarantee that it is the right char mapping, because several
encodings use the same byte ranges;
2) you create load on the core (a lot of strings to parse) for a purpose
which IMHO is not legitimate.
Thus, IMHO, it has to be considered useless.
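Drawback 1) is easy to demonstrate (Python here as an illustration; the real core is OCaml):

```python
# The same byte sequence often decodes cleanly under several encodings,
# so "take the first mapping that works" can silently pick the wrong one.
raw = "안녕하세요".encode("euc_kr")

decodable = []
for enc in ("cp949", "euc_kr", "latin-1"):
    try:
        raw.decode(enc)
        decodable.append(enc)
    except UnicodeDecodeError:
        pass

# Several encodings accept these bytes (EUC-KR is a subset of CP949,
# and Latin-1 accepts any byte sequence at all).
print(decodable)
```

So a "detection" loop cannot distinguish them; it can only tell you which mappings do *not* fit.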

The idea of a "character_encoding_default" option looks good for searches,
because everybody knows that Windows users are still a majority, and it would
let any user (CP1252 versus ISO-8859-15, ...) 'compete' equally. But will all
mldonkey users understand how to use this feature?


    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?func=detailitem&item_id=9102>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/




