From: Steffen Nurpmeso
Subject: [Groff] Inputfile_caseencapsulation --with-unpack (and --with-zlib) support
Date: Sun, 27 Jul 2014 01:43:07 +0200
User-agent: s-nail v14.7.4-3-g32d76ea

Hello Werner and the list,

it is a bit hilarious to post this after only a very shallow
review, given how many changes happened just today, but for one
i think it won't pass without further comments anyway, and then my
restricted groff use cases can't seem to break it; i wish i had
a test document which teases out the maximum of all preprocessors
etc., maybe someone can help out?  That would be nice.

The attachment is an MBOX of git(1) changesets which can be applied
via '$ git am MBOX', or simply '|git am' the attachment if your
MUA supports that.  To do so on a temporary test branch, run
'$ git checkout -B testbranch origin/master' first.  Then you can
not only throw everything away via '$ git branch -D testbranch',
you can also see a single diff with '$ git diff origin/master..'.

Changesets:
-----------

  075e75f contrib/gideal/Makefile.sub: fix Bernd (Public Domain)
  61e9cd2 src/include/lib.h: add some infrastructure (Public Domain)
  d154f86 Encapsulate searchpath:: FILE*'s in new class file_case (Public Domain)

This is the plain encapsulation only, without any decompression
support.

  ea111b9 Add GROFF_UNPACK_CHECK m4++ / --with-unpack=XY (Public Domain)

New option --with{,out}-unpack{=gz,bz2,xz}.  The default is
--with-unpack which in turn enables all of gz,bz2,xz which is
possibly not desirable.  I think a plain default of gz would be
more useful given how many stat(2)s are needed otherwise.

  6ed7720 file_case: support transparent file decompression (Public Domain)..

..via external decompressor commands, i.e., popen(3), using
a temporary file should the caller need a seekable stream.
This also extends src/devices/gropdf/gropdf.pl accordingly.
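
As an illustration of the popen(3) plus temporary file idea, here is
a minimal sketch.  It is not the code from the patch: the function
name open_unpacked() and its parameters are made up for this example,
and the decompressor command line is simply passed in by the caller.

```cpp
#include <stdio.h>
#include <string>

// Hypothetical helper, not taken from the patch.  Runs `command`
// (for example "gzip -cd -- troff.1.gz") via popen(3) and returns
// a readable stream.  If the caller needs to seek, the output is
// spooled into a tmpfile(3) first, because a pipe cannot be rewound.
// *is_pipe tells the caller whether to use pclose() or fclose().
FILE *open_unpacked(const std::string &command, bool need_seekable,
                    bool *is_pipe) {
  FILE *pipe = popen(command.c_str(), "r");
  if (pipe == nullptr)
    return nullptr;
  if (!need_seekable) {
    *is_pipe = true;
    return pipe;
  }

  FILE *tmp = tmpfile();
  if (tmp == nullptr) {
    pclose(pipe);
    return nullptr;
  }

  char buf[4096];
  size_t n;
  bool ok = true;
  while ((n = fread(buf, 1, sizeof buf, pipe)) > 0)
    if (fwrite(buf, 1, n, tmp) != n) {
      ok = false;
      break;
    }
  if (ferror(pipe))
    ok = false;
  // A non-zero status means the decompressor failed.
  if (pclose(pipe) != 0)
    ok = false;
  if (!ok) {
    fclose(tmp);
    return nullptr;
  }
  rewind(tmp);
  *is_pipe = false;
  return tmp;
}
```

The sketch spools only on demand, so callers that read sequentially
keep the cheaper plain pipe.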

So with this one can say '$ xy troff.1' and '$ xy troff.1.gz' and
it'll find `troff.1.gz' and display it correctly (i.e., different
to the draft we also look whether the path already includes an
extension, then try the plain file, then try to auto-expand
with compressor extensions).
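
The lookup order just described could be sketched roughly as follows.
This is an assumption-laden illustration, not the patch itself: the
name resolve_input(), the extension list and the use of stat(2) for
the existence test are all chosen for the example only.

```cpp
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical extension list; the real set depends on what
// --with-unpack enabled at configure time.
static const std::vector<std::string> kUnpackExts = {".gz", ".bz2", ".xz"};

static bool ends_with(const std::string &s, const std::string &suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

static bool is_regular_file(const std::string &path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}

// Returns the path to open, or "" if nothing matched.  *ext receives
// the compressor extension that applies ("" for a plain file).
std::string resolve_input(const std::string &path, std::string *ext) {
  ext->clear();
  // 1. The name already carries a compressor extension.
  for (const std::string &e : kUnpackExts)
    if (ends_with(path, e) && is_regular_file(path)) {
      *ext = e;
      return path;
    }
  // 2. The plain file as given.
  if (is_regular_file(path))
    return path;
  // 3. Auto-expand with each known compressor extension.
  for (const std::string &e : kUnpackExts) {
    std::string candidate = path + e;
    if (is_regular_file(candidate)) {
      *ext = e;
      return candidate;
    }
  }
  return "";
}
```

In this sketch a miss costs one stat(2) per enabled extension on top
of the one for the plain name, so a shorter extension list means fewer
system calls per lookup.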

  f605d82 file_case: add read interface, extend abstraction (Public Domain)..

And indeed it wasn't that hard to encapsulate all I/O activity
(only input, to be more exact) in class file_case!

  5c2bf98 Add GROFF_ZLIB_CHECK m4++ / --with-zlib=DIR (Public Domain)

If --with-unpack included gz support then --with-zlib can be used
to hook in zlib.

  f6ad3b7 file_case: implement direct HAVE_ZLIB layer (Public Domain)

Simple version, meaning that the I/O functions use if..else to
choose between STD I/O and zlib I/O.  A further abstraction could
be used instead: different to what i've said last time, not public
and via virtual C++ classes (it's a program, not a library, in the
end), but only via a function pointer interface internal to
file_case.cpp.  There is a TODO note on that in there.

  3f3f721 CXX peace: src/preproc/eqn/pile.cpp (Public Domain)
  d73d215 CXX peace: src/roff/troff/input.cpp (Public Domain)
  2668aeb CXX peace: src/preproc/grn/hgraph.cpp (Public Domain)

Silence.

  1226fbb Preproc: use file_case::muxer() for ARGV: eqn/ (Public Domain)
  3409988 Preproc: use file_case::muxer() for ARGV: grn/ (Public Domain)
  906e29b Preproc: use file_case::muxer() for ARGV: html/ (Public Domain)
  deff27c Preproc: use file_case::muxer() for ARGV: pic/ (Public Domain)
  b62209e Preproc: use file_case::muxer() for ARGV: preconv/ (Public Domain)
  09565d1 Preproc: use file_case::muxer() for ARGV: refer/ (Public Domain)
  69ea789 Preproc: use file_case::muxer() for ARGV: tbl/ (Public Domain)

Turning over all preprocessors from FILE* to file_case::muxer() on
the input side, therefore offering the same decompression
transparency as for the searchpath:: users, as above.

  a3daa94 file_case: add support for "-" a.k.a. stdin (Public Domain)
  66f424f Many: use stdin support of file_case::muxer() (Public Domain)

And then this extends file_case::muxer() to support some special
names to mean stdin, which simplifies some use cases.  (This is
very fresh.)
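
A minimal sketch of the idea, assuming that "-" is the only special
name; open_input() and its must_close flag are hypothetical and are
not the interface of file_case::muxer():

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: maps "-" to stdin, opens anything else by
// name.  *must_close tells the caller whether fclose() is due.
FILE *open_input(const char *name, bool *must_close) {
  if (std::strcmp(name, "-") == 0) {
    *must_close = false;  // stdin is shared, never close it here
    return stdin;
  }
  FILE *fp = std::fopen(name, "r");
  *must_close = (fp != nullptr);
  return fp;
}
```

With something like this the callers no longer need their own
special case for "-" before opening each ARGV entry.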

Notes:
------

- Several:
  @Werner: the whitespace policy is horrific and moreover terribly
  inconsistent; i couldn't always resist adjusting a bit more than
  necessary to the newer policy (two-space indent, all spaces); my
  editor can't do the original(?) "initial-tab followed by
  whitespace indent" (or at least i didn't program it for that).

- src/preproc/soelim/soelim.cpp:
  . do_file(): messed up error message (now prints plain filename).

Performance test
----------------

A test directory of manual pages:

  ?0[]$ ls raw|wc -l
       732
  ?0[]$ ls gz|wc -l
       732
  ?0[]$ ls bz|wc -l
       732
  ?0[]$ du -s raw gz bz
  11536   raw
  5128    gz
  4816    bz
  ?0[]$

-mman -Tutf8 output:

  21092 -rw-rw-r--    1 steffen  staff  21594718 26 Jul 16:46 raw.out
  21092 -rw-rw-r--    1 steffen  staff  21594718 26 Jul 16:48 gz.out
      0 drwxrwxr-x    9 steffen  staff       306 26 Jul 16:48 ./
  21092 -rw-rw-r--    1 steffen  staff  21594718 26 Jul 16:50 bz.out
  ?0[]$ cmp raw.out gz.out
  ?0[]$ cmp bz.out gz.out
  ?0[]$

Installation not compressed:

  ?0[]$ du -s /Users/steffen/usr/share/groff/1.22.2
  5632    /Users/steffen/usr/share/groff/1.22.2
  ?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
      0m35.29s real     0m31.28s user     0m6.69s system
  ?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
      0m35.28s real     0m31.27s user     0m6.68s system
  ?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
      0m36.22s real     0m32.14s user     0m6.70s system
  ?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
      0m36.05s real     0m32.15s user     0m6.71s system
  ?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
      0m40.07s real     0m32.69s user     0m8.84s system
  ?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
      0m39.44s real     0m32.65s user     0m8.77s system
  ?0[]$

Installation compressed with gzip(1) (via internal zlib layer):

  ?0[]$ du -s /Users/steffen/usr/share/groff/1.22.2/
  2632    /Users/steffen/usr/share/groff/1.22.2/
  ?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
      0m48.47s real     0m44.77s user     0m6.85s system
  ?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
      0m48.35s real     0m44.76s user     0m6.83s system
  ?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
      0m49.18s real     0m45.64s user     0m6.82s system
  ?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
      0m49.59s real     0m45.64s user     0m6.81s system
  ?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
      0m53.13s real     0m46.19s user     0m8.94s system
  ?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
      0m53.15s real     0m46.18s user     0m8.95s system
  ?0[]$

Installation compressed with bzip2(1) (via popen(3)):

  ?0[]$ du -s /Users/steffen/usr/share/groff/1.22.2
  2488    /Users/steffen/usr/share/groff/1.22.2
  ?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
      2m16.74s real     0m58.83s user     0m56.10s system
  ?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
      2m17.44s real     0m59.72s user     0m55.97s system
  ?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
      2m21.05s real     1m0.15s user     0m57.93s system

Unfortunately the following commits are not in tree:

  3d89967 Drop wcwidth(3) in favour of Unicode 6.3.0 homebrew (Public Domain)
  e886b2e Unhook GNUlib, now that wcwidth(3) is gone (Public Domain)
  596d28d src/libs/gnulib: remove unused (Public Domain)
  5633298 (HEAD, refs/heads/file_case) configure+: regenerate

so that your compilation will last a bit longer :)
Anyway, i hope you like it; there may be some rough edges, but
except for the soelim error messages i'm a bit out of ideas at the
moment.  Good night and a nice rest of the weekend!
Ciao,

--steffen

Attachment: groff.patch
Description: Text Data

