Bug: UTF-8 expansion results in extra characters

bug-bash


From:	L A Walsh
Subject:	Bug: UTF-8 expansion results in extra characters
Date:	Mon, 06 Mar 2017 04:50:09 -0800
User-agent:	Thunderbird

I didn't see that this had been caught, and I wasn't sure
whether it was already covered by what I previously posted
in a follow-up to a similar problem.

It may be the same bug, but I wasn't sure.

If I paste the text in quotes into bash, bash tosses in
an extra character, as evidenced by 'wc':


echo 'あa a '|wc -m
6

There should only be 5 characters.

If I cut/paste the text (in quotes) directly into 'wc -m' (so it
doesn't go through bash, but is read directly by 'wc' on its
stdin), then I get '7' (2 extra chars for the quotes):

 wc -m

'あa a '7
      ^^ pressed Ctl-D twice to not end line w/another char (like LF).

Maybe that example allows you to duplicate the problem?
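
One thing worth ruling out when reproducing the counts above is the newline that 'echo' appends, since 'wc' counts it too. The following is a minimal sketch, not a diagnosis of the report: it assumes the sample string contains only plain ASCII spaces (the original message may have contained other space characters), and it counts bytes with 'wc -c' so the result does not depend on the locale. The variable names are illustrative only.

```shell
# Sample string typed here with plain ASCII spaces (an assumption; the
# text in the original message may have used different characters).
text='あa a '

# echo appends a trailing newline; printf '%s' does not.
# wc -c counts bytes, so these numbers are independent of the locale.
# tr strips the padding that some wc implementations add.
with_newline=$(echo "$text" | wc -c | tr -d ' ')
without_newline=$(printf '%s' "$text" | wc -c | tr -d ' ')

echo "bytes via echo:   $with_newline"
echo "bytes via printf: $without_newline"

# Dump the individual bytes to see exactly what the shell received.
printf '%s' "$text" | od -An -c
```

If the two byte counts differ by exactly one, the extra unit is the trailing newline rather than anything bash inserted into the string. In a UTF-8 locale, swapping 'wc -c' for 'wc -m' should show the same one-character difference in character counts.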



I don't think your development setup allows for cut/paste
from an editor or mail-window that accurately copies the
characters.   Something in your text display+copy+insert
doesn't seem to copy the actual characters, but rather
something that looks similar.

For example, copying text like this sometimes misses a space on the end:

あa a

Also, copying from an editor or a line that has tabs in it doesn't
seem to preserve the tabs, so you can't cut+paste a script from
an editor into bash without it getting corrupted.

I think it's a side effect of the development tools you are
using (I don't know what they are). It's also a guess, based
on sparse data, that specific characters aren't mapped the same
and don't produce the same output as what was actually
presented when someone tries to describe a problem.

It doesn't make a difference in most cases, but it does in a few,
like the ones mentioned above.... ;-(

Anyway, like I was trying to say, it's perhaps an inaccurate
deduction based on scant evidence...  oh well...

Current Thread

Full-width characters can result in an unwanted space in filenames, Misaki, 2017/03/04
- Re: Full-width characters can result in an unwanted space in filenames, Chet Ramey, 2017/03/04
- Re: Full-width characters can result in an unwanted space in filenames, Chet Ramey, 2017/03/04
  - Re: Full-width characters can result in an unwanted space in filenames, L A Walsh, 2017/03/05
    - Re: Full-width characters can result in an unwanted space in filenames, Misaki, 2017/03/06
  - Bug: UTF-8 expansion results in extra characters, L A Walsh <=
    - Re: Bug: UTF-8 expansion results in extra characters, Eduardo Bustamante, 2017/03/06
    - Re: Bug: UTF-8 expansion results in extra characters, L A Walsh, 2017/03/06
  - Re: Full-width characters can result in an unwanted space in filenames, Misaki, 2017/03/06
    - Re: Full-width characters can result in an unwanted space in filenames, Chet Ramey, 2017/03/06
- Re: Full-width characters can result in an unwanted space in filenames, Chet Ramey, 2017/03/07
