From: Steffen Nurpmeso
Subject: [Groff] Input file_case encapsulation --with-unpack (and --with-zlib) support
Date: Sun, 27 Jul 2014 01:43:07 +0200
User-agent: s-nail v14.7.4-3-g32d76ea
Hello Werner and the list,
it is a bit hilarious to post this after only a very shallow review,
given how many changes happened just today. But, for one, I think it
won't pass without further comments anyway, and for another, my
restricted groff use cases don't seem able to break it. I wish I had
a test document that exercises all the preprocessors (and more) to
the maximum; maybe someone can help out? That would be nice.
The attachment is an MBOX of git(1) changesets which can be applied
via '$ git am MBOX', or simply '|git am' the attachment if your
MUA supports that. To do so on a temporary test branch, run
'$ git checkout -B testbranch origin/master' first. Then you can
not only throw everything away via '$ git branch -D testbranch'
(after switching back to another branch), but also see a single
diff with '$ git diff origin/master..'.
Changesets:
-----------
075e75f contrib/gideal/Makefile.sub: fix Bernd (Public Domain)
61e9cd2 src/include/lib.h: add some infrastructure (Public Domain)
d154f86 Encapsulate searchpath:: FILE*'s in new class file_case (Public Domain)
This is the plain encapsulation only, without any decompression
support.
ea111b9 Add GROFF_UNPACK_CHECK m4++ / --with-unpack=XY (Public Domain)
New option --with{,out}-unpack{=gz,bz2,xz}. The default is
--with-unpack, which in turn enables all of gz, bz2 and xz; that is
possibly not desirable. I think a plain default of gz would be
more useful, given how many stat(2) calls are needed otherwise
(each enabled format adds another candidate name to probe).
6ed7720 file_case: support transparent file decompression (Public Domain)..
..via external decompressor commands, i.e., popen(3), using
a temporary file should the callee need a seekable stream.
This also extends src/devices/gropdf/gropdf.pl accordingly.
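A minimal sketch of the popen(3) route with a temporary file, assuming the decompressor is invoked as `<command> '<path>'` and writes the plain text to its standard output (as `gzip -dc` or `bzip2 -dc` do). The names `shell_quote` and `decompress_to_tmpfile` are hypothetical, not the patch's code; the gropdf.pl side is not modelled.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: quote a path for /bin/sh by wrapping it in single
// quotes and turning each embedded ' into '\''.
static std::string shell_quote(const std::string &s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'')
      out += "'\\''";
    else
      out += c;
  }
  out += "'";
  return out;
}

// Run "<cmd> '<path>'" via popen(3) and drain its output into an anonymous
// tmpfile(3), so the caller receives a seekable stream.
// Returns nullptr on failure; the result must be closed with fclose(3).
FILE *decompress_to_tmpfile(const std::string &cmd, const std::string &path) {
  assert(!cmd.empty());
  std::string line = cmd + " " + shell_quote(path);
  FILE *pipe = popen(line.c_str(), "r");
  if (pipe == nullptr)
    return nullptr;
  FILE *tmp = std::tmpfile();
  if (tmp == nullptr) {
    pclose(pipe);
    return nullptr;
  }
  char buf[4096];
  size_t n;
  bool ok = true;
  while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0)
    if (std::fwrite(buf, 1, n, tmp) != n) {
      ok = false;
      break;
    }
  // A nonzero status means the command failed (bad input, missing tool).
  if (pclose(pipe) != 0 || !ok) {
    std::fclose(tmp);
    return nullptr;
  }
  std::rewind(tmp);
  return tmp;
}
```

When the consumer only reads sequentially, the pipe itself could be handed out instead, saving the copy; the temporary file is only worth its cost when seeking is required.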
So with this one can say '$ xy troff.1' or '$ xy troff.1.gz', and
either way it'll find `troff.1.gz' and display it correctly (i.e.,
unlike the draft, we first check whether the path already includes
a compressor extension, then try the plain file, then try to
auto-expand it with the compressor extensions).
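The lookup order just described can be sketched as follows. This is a minimal illustration under assumptions, not the code in the patch: `candidate_paths`, `find_input_file` and the fixed extension list are hypothetical stand-ins for whatever --with-unpack enables. It also makes visible why every enabled format costs one more stat(2) per miss.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical extension list, as if configured with --with-unpack=gz,bz2,xz;
// the order is the probing order.
static const std::vector<std::string> kExtensions = {".gz", ".bz2", ".xz"};

static bool ends_with(const std::string &s, const std::string &suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Paths to probe, in order:
// 1. a path that already carries a known extension is used as-is;
// 2. otherwise the plain path first, then path + each extension.
std::vector<std::string> candidate_paths(const std::string &path,
                                         const std::vector<std::string> &exts) {
  assert(!path.empty());
  for (const std::string &e : exts)
    if (ends_with(path, e))
      return {path};
  std::vector<std::string> out{path};
  for (const std::string &e : exts)
    out.push_back(path + e);
  return out;
}

// Return the first candidate that stat(2) reports as a regular file, or an
// empty string if none exists. Each miss costs one stat(2) call.
std::string find_input_file(const std::string &path,
                            const std::vector<std::string> &exts = kExtensions) {
  struct stat st;
  for (const std::string &c : candidate_paths(path, exts))
    if (stat(c.c_str(), &st) == 0 && S_ISREG(st.st_mode))
      return c;
  return std::string();
}
```

With all three formats enabled, a lookup that fails costs four stat(2) calls; with gz alone it would cost two.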
f605d82 file_case: add read interface, extend abstraction (Public Domain)..
And indeed it wasn't that hard to encapsulate all I/O activity
(only input, to be more exact) in class file_case!
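A minimal sketch of what an input-only wrapper of this kind might look like, assuming it owns a FILE* obtained either from fopen(3) or from popen(3). `input_case` and its methods are hypothetical names, not file_case's actual interface; the point is that callers only see get_c()/unget_c()/get_line(), and the matching close function is chosen in one place.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical input-only wrapper in the spirit of file_case.
class input_case {
public:
  // is_pipe says whether fp came from popen(3) and must be pclose(3)d.
  input_case(FILE *fp, bool is_pipe) : fp_(fp), is_pipe_(is_pipe) {
    assert(fp_ != nullptr);
  }
  ~input_case() { close(); }
  input_case(const input_case &) = delete;
  input_case &operator=(const input_case &) = delete;

  int get_c() { return std::getc(fp_); }
  int unget_c(int c) { return std::ungetc(c, fp_); }

  // Read one line without its trailing newline; false at end of input.
  bool get_line(std::string &line) {
    line.clear();
    int c;
    while ((c = get_c()) != EOF) {
      if (c == '\n')
        return true;
      line += static_cast<char>(c);
    }
    return !line.empty();
  }

  // Close exactly once, using the function that matches how fp was opened.
  int close() {
    if (fp_ == nullptr)
      return 0;
    int rv = is_pipe_ ? pclose(fp_) : std::fclose(fp_);
    fp_ = nullptr;
    return rv;
  }

private:
  FILE *fp_;
  bool is_pipe_;
};
```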
5c2bf98 Add GROFF_ZLIB_CHECK m4++ / --with-zlib=DIR (Public Domain)
If --with-unpack includes gz support, then --with-zlib can be used
to hook in zlib.
f6ad3b7 file_case: implement direct HAVE_ZLIB layer (Public Domain)
Simple version, meaning that the I/O functions use if..else to
choose between stdio and zlib I/O. A further abstraction could be
used instead, but, unlike what I said last time, not a public one
via virtual C++ classes (it's a program, not a library, in the
end); rather, only a function pointer interface internal to
file_case.cpp. There is a TODO note about that in the code.
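The function pointer alternative from the TODO could look roughly like this. It is a sketch under assumptions: `io_ops`, `input` and `open_stdio` are hypothetical, and the zlib half is compiled only when HAVE_ZLIB is defined (it relies on zlib's gzgetc() and gzclose()). The backend is picked once at open time instead of branching at every call.

```cpp
#include <cassert>
#include <cstdio>

#ifdef HAVE_ZLIB
#include <zlib.h>
#endif

// Table of backend operations, private to the implementation file.
struct io_ops {
  int (*get_c)(void *handle);
  int (*close)(void *handle);
};

static int stdio_get_c(void *h) { return std::getc(static_cast<FILE *>(h)); }
static int stdio_close(void *h) { return std::fclose(static_cast<FILE *>(h)); }
static const io_ops stdio_ops = {stdio_get_c, stdio_close};

#ifdef HAVE_ZLIB
// gzgetc() returns -1 at end of file or on error; gzclose() returns Z_OK.
static int zlib_get_c(void *h) { return gzgetc(static_cast<gzFile>(h)); }
static int zlib_close(void *h) { return gzclose(static_cast<gzFile>(h)); }
static const io_ops zlib_ops = {zlib_get_c, zlib_close};
#endif

// An open input: the ops table is chosen once, when the file is opened.
struct input {
  const io_ops *ops;
  void *handle;
  int get_c() { return ops->get_c(handle); }
  int close() { return ops->close(handle); }
};

input open_stdio(FILE *fp) {
  assert(fp != nullptr);
  return input{&stdio_ops, fp};
}
```

Adding another backend then means adding one more table rather than touching every I/O function; the price is an indirect call per character.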
3f3f721 CXX peace: src/preproc/eqn/pile.cpp (Public Domain)
d73d215 CXX peace: src/roff/troff/input.cpp (Public Domain)
2668aeb CXX peace: src/preproc/grn/hgraph.cpp (Public Domain)
Silence compiler warnings.
1226fbb Preproc: use file_case::muxer() for ARGV: eqn/ (Public Domain)
3409988 Preproc: use file_case::muxer() for ARGV: grn/ (Public Domain)
906e29b Preproc: use file_case::muxer() for ARGV: html/ (Public Domain)
deff27c Preproc: use file_case::muxer() for ARGV: pic/ (Public Domain)
b62209e Preproc: use file_case::muxer() for ARGV: preconv/ (Public Domain)
09565d1 Preproc: use file_case::muxer() for ARGV: refer/ (Public Domain)
69ea789 Preproc: use file_case::muxer() for ARGV: tbl/ (Public Domain)
These switch all preprocessors from FILE* to file_case::muxer() on
the input side, thereby offering the same transparent decompression
as for the searchpath:: users above.
a3daa94 file_case: add support for "-" a.k.a. stdin (Public Domain)
66f424f Many: use stdin support of file_case::muxer() (Public Domain)
These then extend file_case::muxer() to accept some special names
meaning stdin, which simplifies some use cases. (This is very
fresh.)
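A rough sketch of the "-" convention and of the per-operand loop the preprocessors share, assuming plain fopen(3) in place of file_case::muxer() (whose real signature is not shown here). `open_operand`, `close_operand` and `for_each_operand` are hypothetical names, and only "-" is modelled as a stdin alias. The one subtlety is that the loop must never fclose(3) stdin.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical: "-" names standard input.
static bool names_stdin(const char *name) {
  return std::strcmp(name, "-") == 0;
}

// Open one ARGV operand; stdin is returned as-is and must not be closed.
FILE *open_operand(const char *name) {
  assert(name != nullptr);
  return names_stdin(name) ? stdin : std::fopen(name, "r");
}

void close_operand(FILE *fp) {
  if (fp != nullptr && fp != stdin)
    std::fclose(fp);
}

// Typical preprocessor main loop: no operands means "read stdin".
// process() stands in for the real per-file work; returns the number
// of operands that could not be opened.
int for_each_operand(int argc, char **argv, void (*process)(FILE *)) {
  if (argc <= 1) {
    process(stdin);
    return 0;
  }
  int failures = 0;
  for (int i = 1; i < argc; ++i) {
    FILE *fp = open_operand(argv[i]);
    if (fp == nullptr) {
      std::fprintf(stderr, "can't open '%s'\n", argv[i]);
      ++failures;
      continue;
    }
    process(fp);
    close_operand(fp);
  }
  return failures;
}
```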
Notes:
------
- Several:
@Werner: the whitespace policy is horrific and, moreover, terribly
inconsistent; I couldn't always resist adjusting a bit more than
necessary to the newer policy (two-space indent, spaces only), since
my editor can't do the original(?) "initial tab followed by
whitespace indent" (or at least I didn't program it for that).
- src/preproc/soelim/soelim.cpp:
. do_file(): fixed a messed-up error message (it now prints the
plain filename).
Performance test
----------------
A test directory of manual pages:
?0[]$ ls raw|wc -l
732
?0[]$ ls gz|wc -l
732
?0[]$ ls bz|wc -l
732
?0[]$ du -s raw gz bz
11536 raw
5128 gz
4816 bz
?0[]$
-mman -Tutf8 output:
21092 -rw-rw-r-- 1 steffen staff 21594718 26 Jul 16:46 raw.out
21092 -rw-rw-r-- 1 steffen staff 21594718 26 Jul 16:48 gz.out
0 drwxrwxr-x 9 steffen staff 306 26 Jul 16:48 ./
21092 -rw-rw-r-- 1 steffen staff 21594718 26 Jul 16:50 bz.out
?0[]$ cmp raw.out gz.out
?0[]$ cmp bz.out gz.out
?0[]$
Installation not compressed:
?0[]$ du -s /Users/steffen/usr/share/groff/1.22.2
5632 /Users/steffen/usr/share/groff/1.22.2
?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
0m35.29s real 0m31.28s user 0m6.69s system
?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
0m35.28s real 0m31.27s user 0m6.68s system
?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
0m36.22s real 0m32.14s user 0m6.70s system
?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
0m36.05s real 0m32.15s user 0m6.71s system
?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
0m40.07s real 0m32.69s user 0m8.84s system
?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
0m39.44s real 0m32.65s user 0m8.77s system
?0[]$
Installation compressed with gzip(1) (via internal zlib layer):
?0[]$ du -s /Users/steffen/usr/share/groff/1.22.2/
2632 /Users/steffen/usr/share/groff/1.22.2/
?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
0m48.47s real 0m44.77s user 0m6.85s system
?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
0m48.35s real 0m44.76s user 0m6.83s system
?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
0m49.18s real 0m45.64s user 0m6.82s system
?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
0m49.59s real 0m45.64s user 0m6.81s system
?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
0m53.13s real 0m46.19s user 0m8.94s system
?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
0m53.15s real 0m46.18s user 0m8.95s system
?0[]$
Installation compressed with bzip2(1) (via popen(3)):
?0[]$ du -s /Users/steffen/usr/share/groff/1.22.2
2488 /Users/steffen/usr/share/groff/1.22.2
?0[]$ time find raw/ -type f -exec groff -mman -Tutf8 '{}' ';' >> raw.out
2m16.74s real 0m58.83s user 0m56.10s system
?0[]$ time find gz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> gz.out
2m17.44s real 0m59.72s user 0m55.97s system
?0[]$ time find bz/ -type f -exec groff -mman -Tutf8 '{}' ';' >> bz.out
2m21.05s real 1m0.15s user 0m57.93s system
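To put the numbers above in relation, taking the first run of each case and the uncompressed installation with uncompressed input (35.29 s real) as the baseline:

```latex
\begin{aligned}
\text{gz input, plain install:}\quad & 36.22 / 35.29 \approx 1.026 \;(+2.6\%)\\
\text{bz2 input, plain install:}\quad & 40.07 / 35.29 \approx 1.135 \;(+13.5\%)\\
\text{plain input, zlib gz install:}\quad & 48.47 / 35.29 \approx 1.373 \;(+37.3\%)\\
\text{plain input, bzip2 install (popen):}\quad & 136.74 / 35.29 \approx 3.87
\end{aligned}
```

So decompressing the manual pages themselves costs little, while a compressed installation costs more on every groff run; in the bzip2(1) case system time rises from about 6.7 s to about 56 s, which is consistent with spawning a decompressor process for each opened file.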
Unfortunately, the following commits are not in the tree:
3d89967 Drop wcwidth(3) in favour of Unicode 6.3.0 homebrew (Public Domain)
e886b2e Unhook GNUlib, now that wcwidth(3) is gone (Public Domain)
596d28d src/libs/gnulib: remove unused (Public Domain)
5633298 (HEAD, refs/heads/file_case) configure+: regenerate
so your compilation will take a bit longer :)
Anyway, I hope you like it. There may be some rough edges, but apart
from the soelim error messages I'm a bit out of ideas at the moment.
Good night, and enjoy the rest of the weekend!
Ciao,
--steffen
groff.patch
Description: Text Data