octave-maintainers

Re: Handle encoding of Octave strings


From: mmuetzel
Subject: Re: Handle encoding of Octave strings
Date: Wed, 16 May 2018 13:10:24 -0700 (MST)

I would like to make "islower" and "isupper" Unicode aware.
At the moment, I see the following:
octave:1> islower ('ä')
ans =

  0  0

Since we are using UTF-8 for character arrays, the single lower-case letter
"ä" is represented by two bytes:
octave:2> size ('ä')
ans =

   1   2

Should islower('ä') return true(1,2) or true(1,1)? I am leaning towards the
former.
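
To make the first option concrete, here is a minimal sketch of how a
per-character answer could be expanded to one flag per byte. It assumes valid
UTF-8 input; the variable names and the hard-coded per-character result are
made up for illustration and are not part of any existing implementation.

```octave
% Bytes of "ä" in UTF-8, built explicitly so the example does not
% depend on the encoding of the source file.
str = char ([195 164]);
bytes = double (str);

% A byte starts a new character unless it is a UTF-8 continuation
% byte, i.e. has the bit pattern 10xxxxxx.
is_start = bitand (bytes, 192) != 128;

% For every byte, the index of the character it belongs to.
char_of_byte = cumsum (is_start);

% Hypothetical per-character answer (assumption): "ä" is lower case.
lower_per_char = true (1, 1);

% Option true(1,2): one flag per byte, same size as the input.
lower_per_byte = lower_per_char(char_of_byte);
```

With this expansion the result has the same size as the character array, so
it could still be used as a logical index into it.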

This leads to the bigger question: How should indexing on (multi-byte)
character arrays work? At the moment, a user has to be somewhat aware of the
fact that Octave uses UTF-8:
octave:3> str = "aäbc"
str = aäbc
octave:4> str(1)
ans = a
octave:5> str(2)
ans = �
octave:6> str(3)
ans = �
octave:7> str(4)
ans = b
octave:8> str(2:3)
ans = ä

To index the second character in the string, the user has to access the
second and(!) third element. The third character is indexed with the fourth
element and so forth.
Is this OK?
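
For illustration, the mapping between character positions and element (byte)
positions in the example above can be derived from the UTF-8 lead bytes. This
is only a sketch under the assumption that the array holds valid UTF-8; the
variable names are hypothetical and this is not an existing Octave function.

```octave
% "aäbc" as UTF-8 bytes, built explicitly so the example does not
% depend on the encoding of the source file.
str = char ([97 195 164 98 99]);

% Continuation bytes have the bit pattern 10xxxxxx; every other byte
% starts a new character.
is_start = bitand (double (str), 192) != 128;

% For every element, the index of the character it belongs to.
char_of_byte = cumsum (is_start);

% Character-based access: select all elements of one character.
second_char = str(char_of_byte == 2);   % elements 2 and 3
third_char = str(char_of_byte == 3);    % element 4

% Number of characters, as opposed to numel (str).
num_chars = sum (is_start);
```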

Markus



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-Maintainers-f1638794.html


