[sdx-users] UTF-8 and ISO8859-1

From:	Ivan Kozlov
Subject:	[sdx-users] UTF-8 and ISO8859-1
Date:	Sat, 17 Jan 2004 23:21:22 +0300

Hello sdx-users,

  In SDX 2.1 I had a problem with searching in Russian (I use Cybertheses,
  which is based on SDX). I suppose indexing works correctly: if I
  search for an English (or French) word in the text, the result page
  displays the Russian letters correctly. But when I search for a Russian
  word such as "Иван", it is replaced with something like
  "&#1048;&#1074;&#1072;&#1085;". Maybe the problem is in the conversion
  between ISO-8859-1 and UTF-8.
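
  The numbers in that string are simply the decimal Unicode code points
  of the letters in "Иван". A minimal sketch of how such output can
  arise, assuming (this is only a guess, not something confirmed for SDX)
  that some layer has to fit the text into ISO-8859-1 and substitutes a
  numeric character reference for every character it cannot represent.
  The class and method names are hypothetical:

```java
public class NumericEntityDemo {
    // Hypothetical helper: keeps characters that fit in ISO-8859-1 and
    // replaces every other one with a decimal reference of the form &#NNNN;
    static String toLatin1WithReferences(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint <= 0xFF) {
                out.appendCodePoint(codePoint);   // representable in ISO-8859-1
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Иван" written with escapes so the source file encoding does not matter
        String word = "\u0418\u0432\u0430\u043D";
        System.out.println(toLatin1WithReferences(word));
        // prints &#1048;&#1074;&#1072;&#1085;
    }
}
```

  If the search query reaches the index in this escaped form, it cannot
  match the Russian text that was indexed as real characters.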

  Martin Sevigny said that SDX 2.2 is built entirely on UTF-8, but I still
  see ISO-8859-1 and have the same problem.

Somewhere on the Internet I found that this is a common mistake of European
programmers :)
To fix it, I should replace "StringBuffer" with "ByteArrayOutputStream":
------------------
InputStream is = ..;

int b;
StringBuffer sb = new StringBuffer();

while( (b=is.read())!=-1 )
{
sb.append( (char)b ); // this is WRONG: each byte becomes one char, so multi-byte UTF-8 breaks
}
String s = sb.toString();
--------------------
Change to...
--------------------
InputStream is = ..;

int b;
ByteArrayOutputStream baos = new ByteArrayOutputStream();

while( (b=is.read())!=-1 )
{
baos.write( b );
}
String s = baos.toString( "UTF-8" ); // decode all collected bytes as UTF-8
--------------------

That should then work for the whole Unicode table (I mean, for all
languages).
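
To make the difference concrete, here is a small self-contained sketch
that runs both loops over the UTF-8 bytes of "Иван". The class and method
names are made up for the demonstration and are not SDX code. It uses
StandardCharsets, which needs Java 7 or later; on older JVMs
baos.toString("UTF-8") does the same decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Broken variant: casts every byte to a char, which amounts to
    // decoding the stream as ISO-8859-1.
    static String readByteAsChar(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = is.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Fixed variant: collects the raw bytes first, then decodes them
    // in one step as UTF-8.
    static String readAsUtf8(InputStream is) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int b;
        while ((b = is.read()) != -1) {
            baos.write(b);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience wrapper for the demo: runs one of the two readers over
    // an in-memory byte array without exposing the checked exception.
    static String decode(byte[] data, boolean asUtf8) {
        try {
            InputStream is = new ByteArrayInputStream(data);
            return asUtf8 ? readAsUtf8(is) : readByteAsChar(is);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String word = "\u0418\u0432\u0430\u043D"; // "Иван"
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println(utf8.length);                     // 8: two bytes per letter
        System.out.println(decode(utf8, false).length());    // 8: one wrong char per byte
        System.out.println(decode(utf8, true).length());     // 4: the original letters
        System.out.println(decode(utf8, true).equals(word)); // true
    }
}
```

Each Cyrillic letter takes two bytes in UTF-8, so the byte-by-byte cast
yields eight unrelated characters instead of four letters.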
  

-- 
Best regards,
Kozlov Ivan
mailto:address@hidden
