Re: feature suggestion: ability to expand a set of elements of an array


From: Martin D Kealey
Subject: Re: feature suggestion: ability to expand a set of elements of an array or characters of a scalar, given their indices
Date: Thu, 27 Jun 2024 15:50:04 +1200

On Thu, 27 Jun 2024 at 06:30, Chet Ramey <chet.ramey@case.edu> wrote:

> On 6/26/24 2:18 PM, Zachary Santer wrote:
>
> >> On Tue, Jun 11, 2024, 12:49 PM Zachary Santer <zsanter@gmail.com>
> wrote:
> >>>
> >>> $ array=( zero one two three four five six )
> >>> $ printf '%s\n' "${array[@]( 1 5 )}"
> >>> one
> >>> five
> >
> > This is different functionality.
>
> Equivalent to printf '%s\n' "${array[1]}" "${array[5]}". The innovation Zach
> wants is to have a single word expansion to do this.
>

Surely the point is to handle the case where we don't know in advance how
many elements will be wanted.

In effect, it would mimic Perl's @array[@indices] and @hash{@keys}
functionality, where we supply an arbitrary list of indices or subscripts,
and get back the corresponding values.

Using the proposed syntax we would be able to write:

array=( '' one two three four five six )
indices=( 1 0 6 7 5 )
printf '%s, ' "${array[@]( "${indices[@]}" )}"
printf end\\n

to get

one, , six, five, end

(Note that there are only 4 words resulting from the expansion, since there
is no element '7' in 'array'. Unfortunately - and unlike Perl - Bash
doesn't have "undef", so we have to make do with getting back fewer values
in the resulting list if some requested array elements are unset, or if
some indices exceed the size of the array.)
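
For comparison, the same result can be had in current Bash with a loop. This
is only a sketch: "pick_elements" and the result array "picked" are names
made up for illustration, and the nameref needs Bash 4.3 or later.

```shell
#!/usr/bin/env bash
# Emulates the proposed "${array[@]( ... )}" expansion with a loop.
# pick_elements and picked are hypothetical names, not part of Bash.
pick_elements() {
    local -n _src=$1      # nameref to the source array (Bash 4.3+)
    shift
    picked=()             # result is left in the global array 'picked'
    local i
    for i in "$@"; do
        # A set-but-empty element still counts; an unset one yields no word.
        if [[ ${_src[i]+set} ]]; then
            picked+=( "${_src[i]}" )
        fi
    done
}

array=( '' one two three four five six )
indices=( 1 0 6 7 5 )
pick_elements array "${indices[@]}"
printf '%s, ' "${picked[@]}"
printf 'end\n'
# prints: one, , six, five, end
```

The loop drops index 7 in the same way the proposed expansion would, leaving
4 words.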

I agree that this syntax looks ugly, but since [@] and [*] don't function
as subscripts, it's tricky to improve on.

My suggestion would be to generalise, turning [@] and [*] into fixed
syntactic tokens that can be combined with "ordinary" subscripting, or left
without subscripts to retain their current meanings:

  "${array[*][index]}"   (a long-hand version of "${array[index]}")
  "${array[@][index]}"   (gives "${array[index]}" if it exists, but is
completely elided if it doesn't - similar to how "$@" can result in no
words, not an empty word)

Or maybe we can have some mechanism so that '@[' doesn't get treated as the
start of an '@' modifier; and we could use:

  "${array*[index]}"
  "${array@[index]}"

(For the rest of this discussion I'm just going to mention the '@' form;
please infer the corresponding '*' form.)

After doing this, I would start working on syntaxes for list-slicing in
various ways, perhaps:

  "${array@[[ list of indices ]]}"

"list of indices" is an ordinary word list; it's split up at unquoted $IFS,
then each of the resulting words is used as a subscript.

I would also revamp how numeric range slices are done (*1):

  "${array@[ start_index : count ]}"
  "${array@[ start_index ... end_index ]}"

For all of these expansions, where each subscripted element of the array
exists, it provides a 'word' in the resulting expansion, and where it
doesn't exist, no word is provided.
With '@', the list is kept as separate words despite being quoted; with
'*', the resulting list is joined in the traditional manner.

But I would look even further ahead...

Firstly, I acknowledge Bash has had to comply with historical expectations,
POSIX requirements, and precedent set by ksh.
However, having an array subscript expansion change its behaviour based on
whether or not a "declare -A" statement has been executed, possibly in a
different function or even a different file; that is - by modern standards
at least - a rather poor language design choice. (*2)

I'm talking about whether the subscript undergoes arithmetic expansion.
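
A small illustration of that behaviour in current Bash (the variable names
are arbitrary): the same subscript text "i" means index 2 in one array and
the literal key "i" in the other, depending only on how each variable was
declared.

```shell
#!/usr/bin/env bash
i=2
declare -a list=()
declare -A map=()
list[i]=x     # indexed array: the subscript is arithmetic, so i evaluates to 2
map[i]=x      # associative array: the subscript is a string, the key is "i"
printf 'list keys: %s\n' "${!list[@]}"   # list keys: 2
printf 'map keys: %s\n' "${!map[@]}"     # map keys: i
```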

So I also propose that we should follow Perl in having separate array
indexing and map subscripting syntaxes, so that it's no longer necessary to
use "declare -A", and more to the point, no longer necessary to go look for
it while reading someone else's code. (*2)

(I'm about to suggest some syntax, but the exact form isn't really my main
point; what's really important is that you would be able to read a $
expansion and tell at a glance whether the subscript will be subject to
arithmetic expansion. (*3))

As a secondary issue, deferring *parsing* of arithmetic expressions (until
the containing command is executed) obscures syntax errors, delays their
reporting, and degrades performance. I would change that, either globally
when « shopt -s early_math_parse » is in effect, or in recognized contexts
like this new array indexing syntax. (*4)
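
As a sketch of the deferral in current Bash (the function name "broken" is
arbitrary): the malformed expression below is accepted when the function is
defined, and the syntax error only surfaces when the function is run. The
call is made in a subshell so that the error stays contained.

```shell
#!/usr/bin/env bash
# The body contains a malformed arithmetic expression, yet the definition
# itself succeeds because the expression text is not parsed at this point.
broken() { echo $(( 1 + * 2 )); }
defined=$?

# The syntax error is reported only now, when the expansion is performed.
( broken ) 2>/dev/null
status=$?

printf 'definition status: %s, run status: %s\n' "$defined" "$status"
```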

When using the new array indexing syntax, the index would be parsed as an
arithmetic expansion while the surrounding commands are being parsed (*5)
(and thus ALWAYS evaluated as a numeric expression), and when using the map
subscripting syntax it would NEVER be subject to arithmetic expansion.

One possible syntax would be:

  "${assoc_array@{key}}"
  "${assoc_array@{{list of keys}}}"

which would differ from the previous in that 'key' and 'list of keys' would
be guaranteed NOT to undergo numeric expansion; importantly, this can be
determined at parse time without needing to have executed a 'declare -A'
statement. (This becomes more important if we look to eventually
implementing lexically scoped variables some time in the future.)

If you really can't stomach using {} around subscripts, there are other
ways to distinguish them, such as [numeric+expression+without+quotes] vs
["map key in quotes"], but that would make the rule around non-deferral of
expression parsing even harder for people to follow.

Apart from anything else, either of these approaches would solve the
conundrum of handling '@' and '*' and '' (empty) as subscripts; simply
write "${array*[@]}" or "${array@{*}}" or "${array@{}}"

Lastly, I would also consider:
A. having an explicit 'index back from the end' syntax, such as
[#-reverse_index], rather than switching based on the sign of the index
expression.
B. making the '@' optional in places where it doesn't introduce ambiguity.
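
To illustrate point A with current behaviour (Bash 4.2 or later for negative
subscripts on expansion; the names are arbitrary): a computed index that
happens to go below zero silently selects an element counted back from the
end, rather than yielding nothing or an error.

```shell
#!/usr/bin/env bash
array=( zero one two three )
offset=1
# offset - 2 evaluates to -1, which Bash treats as "last element".
picked="${array[offset - 2]}"
printf '%s\n' "$picked"    # three
```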

-Martin

*1: Unlike ${array[@]:start:count}, these numeric range forms give primacy
to indices as addresses for particular entries, rather than to the array
being primarily a contiguous "list" whose indices are only required to be
monotonic, not consecutive: so if any entries in the range are unset, then
you get fewer words in the resulting list. The ":" and "..." are still part
of the expansion syntax, not part of the evaluation of a numeric
expression, so « var=1:4 ${array[$var]} » would be erroneous.
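
A sketch of the difference using a sparse array in current Bash. The
index-addressed range is emulated with a loop, since the proposed syntax does
not exist; the names are arbitrary.

```shell
#!/usr/bin/env bash
sparse=( [0]=a [2]=c [5]=f )

# Existing slice: start at the first set element whose index is >= 1,
# then take the next 2 set elements, whatever their indices are.
existing=( "${sparse[@]:1:2}" )

# Index-addressed range 1 ... 2, emulated: unset indices give no word.
by_index=()
for (( i = 1; i <= 2; i++ )); do
    if [[ ${sparse[i]+set} ]]; then
        by_index+=( "${sparse[i]}" )
    fi
done

printf 'existing: %s\n' "${existing[*]}"   # existing: c f
printf 'by index: %s\n' "${by_index[*]}"   # by index: c
```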

*2: This isn't the only place where dynamic scope has just turned other
suboptimal design choices into terrible ones; it's outright hostile to
anyone tasked with managing a large shell codebase written by other people.
But the dynamic scope of "declare" (and its siblings) deserves special
mention, because it's not typically limited to "just once at the top of the
program" when it's especially useful inside functions. So even if you can
SEE a declare statement, you still have to check whether it's been EXECUTED
before the expansion occurs.

*3: Ironically this is even more important in the Shell than it is in Perl,
since the shell cannot infer which operation is required based on the data
type of the subscript - in short, the shell cannot distinguish
numeric+expression from "string+expression".

*4: This would effectively define a new "parse-time numeric context", that
would only apply in places where that context can be established at parse
time, unless « shopt -s early_math_parse » was in effect. (*6) In such a
context, the handling of numeric expressions would change, so that the
expressions « SIX * NINE » and « $SIX * $NINE » would behave identically;
so when given « SIX=1+5 NINE=8+1 », they would both produce "54" rather
than "42".
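
The current difference between the two spellings can be seen directly:

```shell
#!/usr/bin/env bash
SIX=1+5 NINE=8+1
# Referenced by name, each variable's value is evaluated as an expression
# of its own: (1+5) * (8+1).
by_name=$(( SIX * NINE ))
# Expanded with $, the text is pasted in first: 1+5 * 8+1.
by_text=$(( $SIX * $NINE ))
printf '%s %s\n' "$by_name" "$by_text"   # 54 42
```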

*5: Just to be clear, I wouldn't immediately implement the early parsing;
rather what I'd do is forbid expressions that would be impossible to parse
without expanding a variable, so that « op='+' » then « base $op offset »
would be disallowed in this new "parse-time numeric context".
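
For reference, this is the kind of expression that works today only because
the variable is expanded before the arithmetic is parsed (the values are
arbitrary):

```shell
#!/usr/bin/env bash
op='+' base=10 offset=3
# $op is expanded to "+" first; only then is "base + offset" parsed.
result=$(( base $op offset ))
printf '%s\n' "$result"    # 13
```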

While "parse and evaluate all at once" makes the implementation code
slightly smaller, it's a dubious saving: intermingling parsing and
evaluation makes the code *more* complex. You don't even need a complex
tree structure to represent the parsed expression; a list of RPN
(stack-based) operations can be stored as bytes in what's otherwise a valid
C string, and then actually runs faster because of CPU cache locality.
Support for short-circuit evaluation can be provided as "branch"
instructions.

*6: In addition to « shopt -s early_math_parse », it would also make sense
to be able to declare a variable as "holding an arithmetic expression", in
the same way that one can currently be declared as "holding an integer".
The point being, a malformed expression is reported when it's assigned to
the variable, rather than later on when the variable is expanded.
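
The existing integer attribute gives a rough idea of the timing, with the
caveat that "declare -i" evaluates the expression at assignment rather than
merely parsing it. In this sketch the malformed assignment is run in a
subshell to contain the error; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
declare -i n
n='2+3'                    # evaluated at assignment time
printf 'n=%s\n' "$n"       # n=5

# A malformed expression is rejected at the assignment itself,
# not at some later expansion of the variable.
( declare -i m; m='2+' ) 2>/dev/null
status=$?
printf 'malformed assignment status: %s\n' "$status"
```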

