bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in multicolumn


From: Leonard Janis Robert König
Subject: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Sat, 13 Feb 2021 21:28:49 +0100
User-agent: Evolution 3.38.4 (by Flathub.org)

On Sat, 2021-02-13 at 21:15 +0100, Erik Auerswald wrote:
> Hi,
> 
> On 13.02.21 19:29, Leonard Janis Robert König wrote:
> > 
> > first:  Thank you very much for the work, I really owe you one!
> 
> You're welcome. :-)
> 
> > On Sat, 2021-02-13 at 17:58 +0100, Erik Auerswald wrote:
> > > On 13.02.21 15:17, Erik Auerswald wrote:
> > > > On 11.02.21 20:20, Erik Auerswald wrote:
> > > > > On Thu, Feb 11, 2021 at 06:09:28PM +0100, Leonard Janis
> > > > > Robert
> > > > > König
> > > > > wrote:
> > > > > > On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote:
> > > > > > > On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis
> > > > > > > Robert
> > > > > > > König wrote:
> > > > > > > > On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald
> > > > > > > > wrote:
> > > > > > > > > On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard
> > > > > > > > > Janis
> > > > > > > > > Robert
> > > > > > > > > König wrote:
> > > > > > > > > > I'm sorry if I this is not a bug but to be
> > > > > > > > > > expected,
> > > > > > > > > > but I thnk
> > > > > > > > > > pr doesn't get the alignment of tabs in multicolumn
> > > > > > > > > > output
> > > > > > > > > > right.  [...]  This seems *kind* of related to
> > > > > > > > > > multi-
> > > > > > > > > > column
> > > > > > > > > > merged output, as was discussed some years ago
> > > > > > > > > > here:
> > > > > > > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
> > > > > > > > > 
> > > > > > > > > This thread contains the bug-introducing patch in
> > > > > > > > > message
> > > > > > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html
> > > > > > > > >   
> > > > > > > > > This is commit
> > > > > > > > > 553d347d3e08e00ee4f9df520b37c964c3f26e28.
> > > > > > > > > That commit removed the 'assume -e' part of the POSIX
> > > > > > > > > description
> > > > > > > > > of the -COLUMN option from GNU pr.
> > > > > > > > [...]
> > > > > I have found a fix to the problem described by you.  I am
> > > > > quite
> > > > > sure that
> > > > > this is not *correct*, but I did not find a way to make
> > > > > print_sep_string()
> > > > > account for tabs that did not break quite a few existing
> > > > > tests,
> > > > > even if
> > > > > the merged files problem from 2007 and this columnating bug
> > > > > were
> > > > > both
> > > > > fixed.  Thus I just tighten the 2007 bug fix to apply in less
> > > > > cases.
> > > > > This way all existing tests pass, and a new one pertaining to
> > > > > this bug
> > > > > report passes, too.  I do think this is in the same spirit as
> > > > > the
> > > > > "fix"
> > > > > from 2007 (commit 553d347d3e08e00ee4f9df520b37c964c3f26e28).
> > > > 
> > > > I think the attached patch is a better fix than my previous
> > > > one,
> > > > because it applies the special treatment of TAB as separator
> > > > more
> > > > consistently.  It may still not be complete (the code seems
> > > > quite
> > > > convoluted to me) but I do think it improves the situation
> > > > significantly, and does not make it worse.
> > 
> > Hm, I'm not sure whether I understand this special case.  When we
> > have
> > a tab as column separator, doesn't this imply that the second
> > column is
> > starting on a position n*8, (effectively equivalent to the first
> > column), thus guaranteeing that the alignment is honored?  So, if
> > my
> 
> Whatever the reason (perhaps conforming to POSIX, perhaps other pr
> implementations doing the same), GNU pr implements a special
> treatment for TAB as column separator, and the thread from 2007
> implies that the pr from HP-UX does as well.
> 
> The POSIX spec says:
> 
>      "-s[char]
> 
>       Separate text columns by the single character char instead of
>       by the appropriate number of <space> characters (default for
>       char shall be <tab>)."
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html
> 
> So use of -s needs to always result in one separator character
> between columns.
> 
> This is implemented by GNU pr, and seemingly by pr from HP-UX, too
> (
> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
> )
> 
> Of all the printable ASCII characters, only TAB results in
> interactions with "Tabification," i.e., turning TABs into spaces
> on input and spaces into TABs on output.  Thus only TAB as
> separator may require the special treatment of disabling
> "Tabification."
> 
> Omitting this special treatment resulted in the bug from 2007.
> 
> Removing the implicit "-e" and "-i" from "-NUMBER" and "-m"
> to fix the 2007 bug resulted in this bug (bug#46422), and does
> not conform to the POSIX specification nor to the GNU pr info
> documentation.
> 
> My v3 patch restricts this special treatment of "-s" to just
> the cases where it is used without specifying a separator and
> thus using the default of TAB, or when it is used with a single
> TAB ("-s$'\t'").  Thus it restricts the 2007 change from commit
> 553d347d3e08e00ee4f9df520b37c964c3f26e28 to affect only those
> use cases it should affect, instead of all multi-column use cases.
> 
> It may be possible to add some appropriate special treatment for
> TAB as separator without disabling "Tabification."  But I do not
> know how.  Just accounting for the output position change resulting
> from printing a TAB in print_sep_string() does not work, i.e.,
> breaks many of the existing tests.

This makes sense then, yes.  In theory, this'd be the best approach,
but I don't see that happening with the current code base... .  The
issue is less with enabling/disabling tabification, since a tab
character should, as I wrote, always align the second column -- but
with the rather weird behavior of pr sometimes "miscounting" that I
outlined a few mails earlier, after which I gave up fixing :-)

> 
> > [...]
> > That being said, I don't see this exact distinction reflected in
> > the
> > code, so perhaps I just misunderstood.
> 
> Disabling "Tabification" only when "-s" was active is missing.  That
> resulted in the 2007 bug.  Making the needed special treatment always
> used fixed the 2007 bug, but broke your use case.
> 
> That some special treatment is needed and intended can be gleaned
> from the following comment (with line numbers from pr.c in the
> current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5):
> 
> 1031  /* It's rather pointless to define a TAB separator with column
> 1032     alignment */
> 
> My patch adds the special treatment, since it works both for the 2007
> bug and this bug (bug#46422).
> 
> > > It seems to me as if "untabify_input = true;" should be re-
> > > introduced
> > > in one additional place to fix the regression from commit
> > > 553d347,
> > > please see the attached patch version 3.
> > > 
> > > > I'd like to ask the GNU Coreutils maintainers to consider
> > > > merging
> > > > the attached patch.
> > > 
> > > The latest version, i.e., v3 for now.
> > 
> > I can only second this, with the patch my rather obscure (and
> > complex)
> > use case of printing thousands of lines of code works properly now!
> 
> Thanks for testing!

Thanks all to you

~leo







reply via email to

[Prev in Thread] Current Thread [Next in Thread]