sdx-users

[sdx-users] UTF-8 and ISO8859-1


From: Ivan Kozlov
Subject: [sdx-users] UTF-8 and ISO8859-1
Date: Sat, 17 Jan 2004 23:21:22 +0300

Hello sdx-users,

  In SDX 2.1 I had a problem searching in Russian (I use Cybertheses,
  which is based on SDX). As far as I can tell, indexing works
  correctly: if I search for an English (or French) word in the text,
  the page displays normally with Russian letters. But when I search
  for a Russian word such as "Иван", it is replaced with something like
  "Иван". The problem may be in the conversion between ISO-8859-1 and
  UTF-8.

  Martin Sevigny said that SDX 2.2 is built entirely on UTF-8, but I
  still see ISO-8859-1 and have the same problem.

Somewhere on the internet I found that this is a common mistake of European
programmers :)
To fix it, I should replace "StringBuffer" with "ByteArrayOutputStream":
------------------
InputStream is = ..;

int b;
StringBuffer sb = new StringBuffer();

while ((b = is.read()) != -1) {
    // WRONG: casting a single byte to char breaks multi-byte UTF-8 characters
    sb.append((char) b);
}
String s = sb.toString();
--------------------
Change to...
--------------------
InputStream is = ..;

int b;
ByteArrayOutputStream baos = new ByteArrayOutputStream();

while ((b = is.read()) != -1) {
    baos.write(b);
}
// Decode all collected bytes at once (assuming the stream is UTF-8)
String s = baos.toString("UTF-8");
--------------------

That should work for the whole Unicode table (I mean, for all
languages).
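
To check the difference between the two approaches, here is a small self-contained sketch. The class and method names (Utf8DecodeSketch, decodePerByte, decodeAll) are made up for illustration and are not part of SDX; it also assumes the input stream really is UTF-8 and that Java 7 or later is available for StandardCharsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from SDX.
class Utf8DecodeSketch {

    // Broken variant: each byte becomes one char, which is the same
    // as decoding the stream as ISO-8859-1.
    static String decodePerByte(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = is.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Fixed variant: collect the raw bytes first, then decode them
    // together as UTF-8 (assumption: the stream is UTF-8 encoded).
    static String decodeAll(InputStream is) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int b;
        while ((b = is.read()) != -1) {
            baos.write(b);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "Иван" written with Unicode escapes so the source file
        // encoding does not matter.
        String word = "\u0418\u0432\u0430\u043d";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        String broken = decodePerByte(new ByteArrayInputStream(utf8));
        String fixed = decodeAll(new ByteArrayInputStream(utf8));

        System.out.println(broken.length());    // 8: one char per byte
        System.out.println(fixed.equals(word)); // true
    }
}
```

The per-byte variant produces eight characters for the four-letter word because every Cyrillic letter occupies two bytes in UTF-8. Wrapping the stream in an InputStreamReader with an explicit charset would be another way to get the same result as decodeAll.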
  

-- 
Best regards,
Kozlov Ivan
mailto:address@hidden




