[sdx-users] UTF-8 and ISO8859-1
From: Ivan Kozlov
Subject: [sdx-users] UTF-8 and ISO8859-1
Date: Sat, 17 Jan 2004 23:21:22 +0300
Hello sdx-users,
In SDX 2.1 I had a problem searching in Russian (I use Cybertheses,
which is based on SDX). As far as I can tell, indexing works
correctly: if I search for an English (or French) word, the result
page shows the Russian letters correctly. But when I search for a
Russian word such as "Иван", the word gets replaced with garbled
characters. Maybe the problem is in the conversion between
ISO-8859-1 and UTF-8.
Martin Sevigny said that SDX 2.2 is fully built on UTF-8, but I still
see ISO-8859-1 and have the same problem.
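To illustrate what a wrong ISO-8859-1/UTF-8 conversion does to a Russian word, here is a minimal Java sketch. The class and method names are hypothetical and not part of SDX or Cybertheses. It encodes "Иван" as UTF-8, decodes those bytes as ISO-8859-1 (one character per byte, which produces garbage), and then reverses the mistake. The word is written with \u escapes so the example compiles regardless of the source-file encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    // Every byte becomes its own character, so a 2-byte Cyrillic letter
    // turns into two unrelated Latin-1 characters.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: ISO-8859-1 maps each char back to exactly one
    // byte, recovering the original UTF-8 bytes, which are then decoded properly.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "\u0418\u0432\u0430\u043d"; // the Cyrillic name Ivan
        String garbled = misdecode(word);
        System.out.println(garbled.length());            // 8: one char per UTF-8 byte
        System.out.println(repair(garbled).equals(word)); // true
    }
}
```

The repair only works because no bytes were lost; once garbled text has been re-encoded in some other way, the original word may not be recoverable.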
I found somewhere on the Internet that this is a common mistake of
European programmers :)
and that to fix it I should replace "StringBuffer" with "ByteArrayOutputStream":
------------------
InputStream is = ..;
int b;
StringBuffer sb = new StringBuffer();
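while( (b=is.read())!=-1 )
{
    // this is WRONG: each byte is cast to its own char, so a
    // two-byte UTF-8 letter (e.g. any Cyrillic one) is split in two
    sb.append( (char)b );
}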
String s = sb.toString();
--------------------
Change to...
--------------------
InputStream is = ..;
int b;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
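while( (b=is.read())!=-1 )
{
    baos.write( b ); // keep the raw bytes untouched
}
// decode all bytes at once, with the encoding they were written in
String s = baos.toString("UTF-8");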
--------------------
And that should work with the whole Unicode table (I mean with all
languages), as long as the bytes are decoded with the encoding they
were actually written in.
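A self-contained sketch comparing the two loops above on the UTF-8 bytes of "Иван". It assumes the stream really contains UTF-8; `readByteAsChar` and `readUtf8` are hypothetical names, `StringBuilder` stands in for `StringBuffer`, and the fixed version reads through a buffer instead of byte by byte, which only affects speed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamDecoding {
    // The approach called WRONG above: each byte becomes one char,
    // which amounts to decoding the stream as ISO-8859-1.
    static String readByteAsChar(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = is.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // The suggested fix: collect the raw bytes first, then decode them
    // once as UTF-8, so multi-byte characters are never split.
    static String readUtf8(InputStream is) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = is.read(buf)) != -1) {
            baos.write(buf, 0, n);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String word = "\u0418\u0432\u0430\u043d"; // the Cyrillic name Ivan
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        System.out.println(readByteAsChar(new ByteArrayInputStream(utf8)).equals(word)); // false
        System.out.println(readUtf8(new ByteArrayInputStream(utf8)).equals(word));       // true
    }
}
```

Wrapping the stream in `new InputStreamReader(is, StandardCharsets.UTF_8)` is another standard way to get the same result, since the reader also decodes multi-byte sequences as a whole.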
--
Best regards,
Kozlov Ivan
mailto:address@hidden