Bug: UTF-8 expansion results in extra characters
From: L A Walsh
Subject: Bug: UTF-8 expansion results in extra characters
Date: Mon, 06 Mar 2017 04:50:09 -0800
User-agent: Thunderbird
I didn't see that this was caught, and I wasn't sure whether
it was already covered by what I posted previously, since I
raised this problem in a follow-up to a similar problem.
It may be the same bug, but I wasn't sure.
If I paste the text in quotes into bash, bash tosses in
an extra character, as evidenced by 'wc':
echo 'あa a '|wc -m
6
There should only be 5 characters.
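A rough way to narrow this down is to count the characters and dump
the raw bytes without going through 'echo'. Keep in mind that 'echo'
appends a newline and 'wc -m' counts it as a character, so the sketch
below uses printf '%s' to keep the newline out of the count. The helper
names count_chars and dump_bytes are made up for illustration, and the
spaces in the sample string are assumed to be plain ASCII spaces; the
character count also depends on running in a UTF-8 locale.

```shell
# Count the characters in the argument, with no trailing newline added.
# (wc -m counts characters according to the current locale.)
count_chars() {
    printf '%s' "$1" | wc -m | tr -d '[:space:]'
}

# Dump the exact bytes of the argument as hex, on one line,
# so they can be compared against what was pasted.
dump_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

count_chars 'あa a '
echo
dump_bytes 'あa a '
echo
```

If the byte dump shows anything other than the three bytes for あ
followed by 'a', space, 'a', space, then the paste itself changed the
text before bash ever saw it.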
If I cut/paste the text (in quotes) directly into 'wc -m' (so it
doesn't go through bash, but is read by 'wc' directly on its
stdin), then I get '7' (2 extra chars for the quotes):
wc -m
'あa a '7
^^ pressed Ctl-D twice to not end line w/another char (like LF).
Maybe that example allows you to duplicate the problem?
I don't think your development setup allows for cut/paste
from an editor or mail-window that accurately copies the
characters. Something in your text display+copy+insert
doesn't seem to copy the actual characters, but something
that looks similar.
For example, copying text like this sometimes misses a space on the end:
あa a
Also, copying from an editor or a line that has tabs in it doesn't
seem to preserve the tabs, so you can't cut+paste a script from
an editor into bash without it becoming corrupted.
I think it's a side-effect of the development tools you are
using. I don't know what they are, so it's only a guess, based
on sparse data: specific characters aren't mapped the same way
and don't produce the same output as what was actually presented
when someone tries to describe a problem.
It doesn't make a difference in most cases, but it does in a few,
like the ones mentioned above.... ;-(
Anyway, like I was trying to say, it's a perhaps inaccurate
deduction based on scant evidence... oh well...