octave-maintainers

Re: char type in Octave


From: mmuetzel
Subject: Re: char type in Octave
Date: Thu, 24 May 2018 08:29:52 -0700 (MST)

TL;DR: Let's stay with UTF-8.

Longer version:
I had a (not so) quick look at the code, and the effort needed to switch
our char representation seems unreasonably high.
If we keep our current 8-bit representation, the main "issue" from a user's
point of view might be indexing: a user might expect that a char vector
with N characters always has N elements and that indexing the n-th element
returns the n-th character.
But even if we moved from an 8-bit to a 16-bit representation, we wouldn't
be able to represent characters from the higher Unicode planes with a
single char element. Even if we went one step further and used a 32-bit
representation, there are combining characters (e.g. accents) that modify
the preceding character. So a single character may always need several
basic elements (8-bit, 16-bit, or 32-bit), and indexing into character
arrays will be problematic in some cases, no matter which UTF flavour we
use.
I second Rik's and Michael's reasoning and would like to vote for staying
with 8-bit chars.
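To illustrate the indexing point, here is a small sketch in Octave. It
assumes the current 8-bit char type holding UTF-8 bytes. The strings are
built from explicit byte values (the UTF-8 encodings of U+00E4 and U+0301),
so the result doesn't depend on the encoding of the script file:

```octave
% 'aäbc': U+00E4 (ä) is encoded in UTF-8 as the two bytes C3 A4 (195 164)
str = char ([97 195 164 98 99]);
numel (str)          % 5 elements for 4 visible characters

% 'a' followed by U+0301 COMBINING ACUTE ACCENT (UTF-8: CC 81 = 204 129)
% renders as a single 'á' in a UTF-8 terminal
acc = char ([97 204 129]);
numel (acc)          % 3 bytes; even UTF-32 would still need 2 code points
```

The second case is why a wider char type alone wouldn't make "one element
per character" hold in general.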

However, I am still in favor of consistently using and supporting Unicode
(UTF-8) wherever possible.
We could mitigate the possible issue with indexing by providing dedicated
functions. These could help with indexing into char arrays by identifying
the elements that belong to one character.
Something along the lines of:
str = 'aäbc'
str_idx = u8_char_idx(str)

which could result in:
str_idx = [ 1 2 2 3 4 ]

Indexing the n-th character would be as easy as:
str(str_idx==n)
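A minimal sketch of how such a function could work. u8_char_idx is
hypothetical (not an existing Octave function) and is written here as an
anonymous function; it assumes the input is a row vector of valid UTF-8
bytes. UTF-8 continuation bytes have the bit pattern 10xxxxxx (128 to 191),
so every other byte starts a new code point, and a cumulative sum over that
mask numbers the code points:

```octave
% Hypothetical helper: map each byte to the index of its code point.
% A byte starts a new code point unless it is a continuation byte (128..191).
u8_char_idx = @(s) cumsum (double (double (s) < 128 | double (s) >= 192));

str = char ([97 195 164 98 99]);   % 'aäbc' as UTF-8 bytes
str_idx = u8_char_idx (str)        % [1 2 2 3 4]

n = 2;
ch = str(str_idx == n);            % both bytes of 'ä'
double (ch)                        % [195 164]
```

Note that this groups bytes by code point, not by user-perceived character:
a combining accent such as U+0301 would still get its own index, so a
complete version would need Unicode grapheme-cluster rules on top of this.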

This also brings me back to my initial question of whether "element-wise"
functions on character arrays, like isupper or islower, should return an
array of the same size as the input. IMHO they should.

Markus
