Re: IFS delimiter field separation issues
From: Lawrence Velázquez
Subject: Re: IFS delimiter field separation issues
Date: Wed, 08 Jan 2025 18:04:19 -0500
On Wed, Jan 8, 2025, at 1:25 PM, Jeff Ketchum wrote:
> I ran into a strange bug using newer versions of bash, I haven't isolated
> it to a specific release.
It looks like 5.0 introduced the problem.
> In using unicode group separator character U+241D,
> https://www.compart.com/en/unicode/U+241D, 0x241D
> I set the IFS to this unicode, and have U+241E and U+241F characters in the
> data.
> When assigning to an array, and using for var in "${array[@]}"...
> it ends up splitting the data at unexpected locations.
>
> I don't get this behaviour when the array isn't quoted
>
> [...]
>
> I wrote a script that will easily reproduce this:
Here's a version that I think is more legible:
$ cat /tmp/foo.bash
LC_ALL=en_US.UTF-8
gs=$'\u241D'
rs=$'\u241E'
us=$'\u241F'
data="a${gs}b${rs}c${us}d"
IFS=$gs
# Original variable
printf '"$data" - %q\n' "$data"
printf ' $data - %q\n' $data
echo
# Positional parameters
set -- $data
printf '"$@" - %q\n' "$@"
printf ' $@ - %q\n' $@
echo
# Multi-element array
arr1=($data)
declare -p arr1
printf '"${arr1[@]}" - %q\n' "${arr1[@]}"
printf ' ${arr1[@]} - %q\n' ${arr1[@]}
echo
# Single-element array
arr2=("$data")
declare -p arr2
printf '"${arr2[@]}" - %q\n' "${arr2[@]}"
printf ' ${arr2[@]} - %q\n' ${arr2[@]}
$ ~/build/bash-5.3-testing/bash /tmp/foo.bash
"$data" - a␝b␞c␟d
 $data - a
 $data - b␞c␟d

"$@" - a
"$@" - b␞c␟d
 $@ - a
 $@ - b␞c␟d

declare -a arr1=([0]="a" [1]="b␞c␟d")
"${arr1[@]}" - a
"${arr1[@]}" - $'b\342'
"${arr1[@]}" - $'\236c\342'
"${arr1[@]}" - $'\237d'
 ${arr1[@]} - a
 ${arr1[@]} - b␞c␟d

declare -a arr2=([0]="a␝b␞c␟d")
"${arr2[@]}" - $'a\342'
"${arr2[@]}" - ''
"${arr2[@]}" - $'b\342'
"${arr2[@]}" - $'\236c\342'
"${arr2[@]}" - $'\237d'
 ${arr2[@]} - a
 ${arr2[@]} - b␞c␟d
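For anyone who wants a quick yes/no answer on a given build, the transcript can probably be reduced to a word count. This is only a sketch: count_words is a helper I made up for illustration, and it assumes the en_US.UTF-8 locale from the script above is installed. A quoted single-element array should expand to exactly one word, so any other count from "${arr[@]}" would point to the behaviour shown in the transcript.

```bash
#!/usr/bin/env bash
# Reduced check: compare how many words "$@" and "${arr[@]}" produce
# while IFS holds a multibyte character.
LC_ALL=en_US.UTF-8        # assumption: this locale is installed
gs=$'\u241D'
rs=$'\u241E'
data="a${gs}b${rs}c"
IFS=$gs

# count_words is a hypothetical helper: it prints the number of arguments.
count_words() { printf '%d\n' "$#"; }

set -- "$data"            # one positional parameter
arr=("$data")             # one array element

n_at=$(count_words "$@")
n_arr=$(count_words "${arr[@]}")

if (( n_arr == n_at )); then
    printf 'consistent: %d word(s) from both expansions\n' "$n_at"
else
    printf 'mismatch: "$@" gave %d word(s), "${arr[@]}" gave %d\n' "$n_at" "$n_arr"
fi
```

Which branch runs depends on the bash build and the locale, so I am not listing an expected output here.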
It's interesting that "$@" works fine, while "${arr[@]}" doesn't.
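Until this is sorted out, one possible workaround is to keep the multibyte character out of IFS entirely and split with parameter expansion instead. The following is a minimal sketch, not something proposed in this thread: split_on and the global array "fields" are hypothetical names, and the delimiters are spelled as raw UTF-8 bytes so the example does not depend on how $'\u...' behaves in the current locale.

```bash
#!/usr/bin/env bash
# Sketch: split on a multibyte delimiter without ever assigning it to IFS.
gs=$'\342\220\235'   # UTF-8 bytes of U+241D
rs=$'\342\220\236'   # UTF-8 bytes of U+241E
us=$'\342\220\237'   # UTF-8 bytes of U+241F
data="a${gs}b${rs}c${us}d"

# split_on DELIM STRING
# Hypothetical helper: stores the pieces of STRING in the global array "fields".
split_on() {
    local delim=$1 rest=$2
    fields=()
    while [[ $rest == *"$delim"* ]]; do
        fields+=("${rest%%"$delim"*}")   # text before the first delimiter
        rest=${rest#*"$delim"}           # drop that text and the delimiter
    done
    fields+=("$rest")                    # whatever follows the last delimiter
}

split_on "$gs" "$data"

printf 'fields: %d\n' "${#fields[@]}"
for f in "${fields[@]}"; do
    printf '%q\n' "$f"
done
```

With the data above this should yield two fields, "a" and "b␞c␟d". Since IFS keeps its default value, the quoted "${fields[@]}" expansion in the loop has no multibyte separator to trip over.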
--
vq