Re: Handle encoding of Octave strings

octave-maintainers

From:	mmuetzel
Subject:	Re: Handle encoding of Octave strings
Date:	Wed, 16 May 2018 13:10:24 -0700 (MST)

I would like to make "islower" and "isupper" Unicode aware.
At the moment, I see the following:
octave:1> islower ('ä')
ans =

  0  0

Since we are using UTF-8 for character arrays, the single lower-case letter
"ä" is represented by two bytes:
octave:2> size ('ä')
ans =

   1   2
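
For reference, the two bytes can be inspected outside Octave. The following is only a minimal sketch in Python (not Octave code), assuming "ä" is the precomposed code point U+00E4; the variable names are illustrative.

```python
# Illustration only (Python, not Octave): the UTF-8 bytes behind "ä".
text = "\u00e4"                 # "ä" as a single precomposed code point
data = text.encode("utf-8")     # the byte sequence a UTF-8 char array would hold

byte_values = list(data)        # [195, 164], i.e. 0xC3 0xA4
num_chars = len(text)           # 1 code point
num_bytes = len(data)           # 2 bytes, matching the 1x2 size reported above
```

The first byte (0xC3) is a UTF-8 lead byte and the second (0xA4) is a continuation byte; neither is a valid character on its own.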

Should islower('ä') return true(1,2) or true(1,1)? I lean towards the
former (one result per byte, matching the size of the input).
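
The two candidate return shapes can be compared side by side. This is a rough sketch in Python rather than Octave, and both helper names are hypothetical; it only illustrates what "one flag per byte" versus "one flag per character" would look like, not how Octave would implement either.

```python
def islower_per_byte(text):
    """Hypothetical helper: one flag per UTF-8 byte.

    The flag of each character is repeated for every byte it occupies,
    so the result has the same length as the encoded byte array.
    """
    mask = []
    for ch in text:
        width = len(ch.encode("utf-8"))      # bytes used by this character
        mask.extend([ch.islower()] * width)
    return mask


def islower_per_char(text):
    """Hypothetical helper: one flag per character (code point)."""
    return [ch.islower() for ch in text]


sample = "a\u00e4Bc"                         # "aäBc"
per_byte = islower_per_byte(sample)          # length 5, like the byte array
per_char = islower_per_char(sample)          # length 4, like the visible text
```

With the per-byte shape, the result can be used directly as a logical index into the byte array; with the per-character shape it cannot, because the lengths differ.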

This leads to the bigger question: How should indexing on (multi-byte)
character arrays work? At the moment, a user has to be somewhat aware of the
fact that Octave uses UTF-8:
octave:3> str = "aäbc"
str = aäbc
octave:4> str(1)
ans = a
octave:5> str(2)
ans = �
octave:6> str(3)
ans = �
octave:7> str(4)
ans = b
octave:8> str(2:3)
ans = ä

To index the second character in the string, the user has to access the
second and(!) third elements. The third character is then indexed with the
fourth element, and so forth.
Is this OK?
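
The mapping from character positions to element ranges described above can be made explicit. The sketch below is Python, not Octave, and the function name is hypothetical; it assumes the input is valid UTF-8 and finds character boundaries by skipping continuation bytes (those of the form 10xxxxxx).

```python
def char_byte_ranges(data):
    """Hypothetical helper: 1-based inclusive (first, last) byte positions
    of each character in a valid UTF-8 byte sequence."""
    # A byte starts a character unless it is a continuation byte (10xxxxxx).
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    # Each character ends right before the next one starts.
    ends = starts[1:] + [len(data)]
    return [(s + 1, e) for s, e in zip(starts, ends)]


data = "a\u00e4bc".encode("utf-8")           # "aäbc" as 5 bytes
ranges = char_byte_ranges(data)              # [(1, 1), (2, 3), (4, 4), (5, 5)]
```

This reproduces the session above: the second character occupies elements 2:3, and the third character sits at element 4.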

Markus



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-Maintainers-f1638794.html
