
Re: [Groff] groff performance in respect to hardware platform


From: Steve Izma
Subject: Re: [Groff] groff performance in respect to hardware platform
Date: Thu, 24 Mar 2016 12:47:01 -0400
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Mar 23, 2016 at 11:21:37PM -0400, Steve Izma wrote:
> Subject: Re: [Groff] groff performance in respect to hardware platform
> 
>> ... But I'm wondering if anyone can tell me if groff benefits
>> from running on multiple CPU cores and multiple CPUs.
>> I assume that another way of asking this is: "is groff
>> multithreaded?"
>> ...
>> I'm only considering this in a Linux environment (Debian stable,
>> fairly recent kernel).

Thank you, everyone, for the responses, which help clarify things
considerably.

> From: Damian McGuckin <address@hidden>
> 
> It is the rendering of the output that takes the time, not the 'groff'
> processing. Actually you might also have to think about your graphics display
> speed. You probably need to be asking questions of the maintainers of the
> viewer that you are using, not 'groff'.

Yes, I need to look more closely at this. My pipeline consists of:
- a python script reading xml files one at a time, parsing, and
  doing a fairly simple substitution of xml tags to groff
  requests (although more complicated than what one can do with
  sed)
- groff, which calls my own set of tmac files, but which is, of
  course, a pipeline of its own.

The output is a PostScript file. Like Clarke, I only need PDF
when the job is finished.
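For readers who haven't built this kind of front end, the first stage might look roughly like the following. It is a minimal sketch only. The tag names (`title`, `p`, `em`) and the macro names (`.CH`, `.PP`) are hypothetical stand-ins, not the actual tags or tmac macros in this pipeline. The `\fI`...`\fP` font escapes, the `\e` escape and the `\&` guard are standard troff.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical mapping from block-level XML tags to groff macros;
# a real tag set and custom tmac file would differ.
BLOCK_MACROS = {
    "title": ".CH",  # hypothetical chapter-title macro
    "p": ".PP",      # ms-style paragraph macro name, assumed here
}

def esc(text):
    # troff treats backslash as its escape character; \e prints it literally.
    return text.replace("\\", "\\e")

def inline_text(elem):
    """Flatten an element's text, turning <em> children into troff italics."""
    parts = [esc(elem.text or "")]
    for child in elem:
        inner = inline_text(child)
        parts.append("\\fI" + inner + "\\fP" if child.tag == "em" else inner)
        parts.append(esc(child.tail or ""))
    return "".join(parts)

def xml_to_groff(xml_text):
    """Convert one XML document to groff input: a macro line, then its text."""
    out = []
    for elem in ET.fromstring(xml_text):
        macro = BLOCK_MACROS.get(elem.tag)
        if macro is None:
            continue  # unknown tags are simply skipped in this sketch
        text = " ".join(inline_text(elem).split())  # collapse XML whitespace
        if text.startswith((".", "'")):
            text = "\\&" + text  # keep troff from reading the line as a request
        out.extend([macro, text])
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    # Process the files named on the command line, in order, to stdout.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            sys.stdout.write(xml_to_groff(f.read()))
```

Usage would be along the lines of `python xml2groff.py ch*.xml | groff -Tps > book.ps`, plus whatever `-m` options load the custom macros. The script name is hypothetical.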

I start a separate instance of okular to view the PostScript
file. I don't think that okular is particularly tuned to PDF to
the point that a PS file causes it more work; it might be the
reverse. Okular watches the timestamp of the PS file and it
appears to me that it's prompt to notice the difference.

Damian's comment might be relevant to the viewing process (and
definitely for the other graphics-oriented concerns I have) but
one counter-indication is how I observe okular working. Up to
about 50 pages, the PostScript file is completely written before
okular attempts to re-read it. The screen update is very fast.
But if I'm viewing, say, page 90, okular notices that the PS file
(apparently being written in chunks by grops) has had its
timestamp changed, so it reads whatever it can get, can't find
page 90, and displays page 50. Strangely enough, it doesn't
seem to notice that the file has had more pages added to it after
this point, so I'm stuck looking at the wrong place in the
output. This implies either an I/O problem or else one part of
the pipeline (I don't think it's the python parsing) is lagging
behind.
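One common workaround for a viewer that re-reads a half-written file is to have groff write to a temporary file and rename it over the real one only after groff finishes. Nobody in this thread proposed it, and it is untested with okular. On POSIX file systems a rename within one file system is atomic, so the viewer sees either the old complete file or the new complete one, never a partial one. This is a minimal sketch, assuming groff is on the PATH and the input is already groff source.

```python
import os
import subprocess
import tempfile

def run_atomically(cmd, out_path, stdin_data=None):
    """Run cmd with stdout captured in a temp file beside out_path, then
    rename the temp file into place only if the command succeeds."""
    out_dir = os.path.dirname(os.path.abspath(out_path))
    # Same directory => same file system, so os.replace() is an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            subprocess.run(cmd, input=stdin_data, stdout=tmp, check=True)
        os.replace(tmp_path, out_path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a partial temp file behind
        raise
```

For example, `run_atomically(["groff", "-Tps"], "book.ps", stdin_data=source_bytes)` replaces `book.ps` only once groff has exited successfully. `source_bytes` is a placeholder for the groff input.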

> From: Clarke Echols <address@hidden>
> 
> I use vim for all of my editing, and have a function key set up so all I
> have to do is press it and it executes groff with the options I need to
> get the PostScript output in a default file.
> 
> I then monitor the file with an open gv(1) window that updates every
> time I press the function key and the file is rewritten.  I then use
> ps2pdf to get a PDF file when I'm done.  It's fast, easy, and has
> never given me any problem about speed.

Yes, this is essentially what I've been doing for a long time and
it demonstrates clearly to me how much I don't need a WYSIWYG
system (e.g., InDesign) for my work. I used to use gv, but I
seem to recall changing to okular because of better keyboard
shortcuts.

> From: Ralph Corderoy <address@hidden>
> 
> As others have said, the single program groff isn't multi-threaded code,
> but if you give it the -V option then it will print the pipeline of
> processes that it's running, and they potentially run on separate cores
> at the same time.  Plus, as you say, all the other processes that want
> to run, e.g. your kernel, desktop, editor, etc., aren't fighting with
> the ones you're waiting for.
> ...
> Run a program like dstat(1), or vmstat(1), during that tedious 250-page
> book and see what you can glean from the results, e.g. is it CPU bound,
> and how many of your cores are used?

Thanks for the suggestion. I'll definitely work on this.
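dstat and vmstat are the right tools for this. If it's more convenient to log load from inside a Python driver script, per-core load on Linux can also be estimated from two snapshots of `/proc/stat`. This is a minimal sketch. It counts the idle and iowait columns as idle time, which is the usual convention but a simplification, and it ignores the guest columns because the kernel already includes them in user and nice.

```python
import time

def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, total_ticks)} for each per-core cpuN line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal
        ticks = [int(x) for x in fields[1:9]]
        result[fields[0]] = (ticks[3] + ticks[4], sum(ticks))
    return result

def busy_fractions(before, after):
    """Fraction of elapsed ticks each core spent busy between two snapshots."""
    out = {}
    for cpu, (idle1, total1) in after.items():
        idle0, total0 = before.get(cpu, (idle1, total1))
        elapsed = total1 - total0
        out[cpu] = 0.0 if elapsed <= 0 else 1.0 - (idle1 - idle0) / elapsed
    return out

def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core load."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return busy_fractions(before, after)
```

Calling `sample()` repeatedly while a long groff run is in progress answers Ralph's questions roughly. One core near 100% with the rest mostly idle would point to a single CPU-bound stage. Several busy cores would mean the pipeline stages really are overlapping.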

> From: Morten Bo Johansen <address@hidden>
> 
> I don't think so, but you can use GNU parallel. Look at its
> manual page, there are lots of examples, also on how to use it
> on a single file.
> As for python, there is a multiprocessing module.

I'll experiment with this as well.
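Because the Python stage reads the XML files one at a time, at least that stage can be spread across cores with the standard library's `concurrent.futures`. This is a minimal sketch. `convert()` is a hypothetical placeholder for the real XML-to-groff substitution, and the `"fork"` start method assumes Linux. `Executor.map()` returns results in input order, so the groff source still reaches troff in document order. That matters because the formatting itself stays sequential.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def convert(xml_text):
    """Hypothetical stand-in for converting one XML file to groff input."""
    return xml_text.replace("<p>", ".PP\n").replace("</p>", "\n")

def convert_all(xml_texts, workers=None):
    """Convert independent inputs in parallel; map() preserves input order."""
    # 'fork' (Linux) lets workers reuse this module without re-importing it.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return "".join(pool.map(convert, xml_texts))
```

The joined string can then be written to groff's standard input as before. With many small files, passing a `chunksize` to `pool.map()` reduces per-task overhead.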

> From: Steffen Nurpmeso <address@hidden>
> 
> Well i guess that you benefit quite a bit due to the piped nature
> in between all the several programs that are involved, right?
> Unless one part of the pipeline has to wait for more input from
> its predecessor it seems to me they can run in full parallel.

Ralph's and Steffen's comments reminded me of the pipeline issue
(see above), which I hadn't thought of, so now I'm thinking that
multiple cores across multiple CPUs have definite advantages for
this kind of work.
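Ralph's and Steffen's point applies to any chain of processes joined by pipes. Each stage is a separate process that the kernel may schedule on a different core, and a stage stalls only when its input pipe is empty or its output pipe is full. Below is a minimal sketch of such a chain built with `subprocess`. The stages are placeholders. `groff -V` prints the real ones, which for PostScript output are typically troff feeding grops.

```python
import subprocess
import sys
import threading

def run_pipeline(stages, data):
    """Connect stages stdout->stdin like a shell pipeline; return final output."""
    procs = []
    for cmd in stages:
        stdin = procs[-1].stdout if procs else subprocess.PIPE
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Only the downstream process holds this pipe end now, so the
            # upstream one gets SIGPIPE if the downstream one exits early.
            procs[-1].stdout.close()
        procs.append(proc)

    def feed():
        # Write input in a thread so a full pipe can't deadlock the reader.
        try:
            procs[0].stdin.write(data)
            procs[0].stdin.close()
        except BrokenPipeError:
            pass  # first stage exited early; its status is checked below

    writer = threading.Thread(target=feed)
    writer.start()
    output = procs[-1].stdout.read()  # all stages run concurrently meanwhile
    procs[-1].stdout.close()
    writer.join()
    for proc in procs:
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return output
```

With two line-by-line Python filters as stages (for example `[sys.executable, "-c", ...]` one-liners), both processes run at once, and the second starts working as soon as the first has flushed output into the pipe.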

Just in:
> From: "James K. Lowden" <address@hidden>
> 
> Looking at spawn-pipe.c, the only parallelization you get in groff is
> the pipeline of preprocessing, formatting, and rendering.  
> 
> ISTM that's all you *can* get because the formatting process --
> determining which words go on each line -- is necessarily sequential.
> The whole-paragraph formatting algorithm Doug McIlroy proposed some
> time back worked in parallel, but each paragraph would still be rendered
> serially.  

Thanks for looking at the code, which reinforces the above
comments. I think you're correct about the limitation caused by
the serial nature of typesetting, but I wonder whether the work
other than hyphenation and justification (h&j), such as I/O and
perhaps even the handling of font data, could be done in
parallel. E.g., if a tmac file declared at the beginning all the
fonts that are going to be needed, as a PostScript file does,
could the font information then be compiled in a separate
thread? (Not that I'm volunteering; I couldn't begin to
understand how to code that.)
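Steve's idea can be sketched in outline: once the fonts a document needs are declared up front, load their data in the background. This is hypothetical and is not how groff works. `load_font_metrics()` stands in for parsing groff's font description files (see groff_font(5)), and `TR`/`TB`/`HR` are just sample font names. The point is that loading overlaps with other work, and the formatter waits only when it first uses a font that isn't ready yet. In CPython the GIL limits how much pure parsing can overlap, so a real version would belong in groff's own C++ code.

```python
from concurrent.futures import ThreadPoolExecutor

def load_font_metrics(name):
    """Hypothetical loader; a real one would parse a font description file."""
    return {"name": name, "widths": {}}

class FontCache:
    """Start loading declared fonts immediately; block only on first use."""

    def __init__(self, declared_fonts, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {
            name: self._pool.submit(load_font_metrics, name)
            for name in declared_fonts
        }

    def get(self, name):
        fut = self._futures.get(name)
        if fut is None:  # undeclared font: fall back to loading on demand
            fut = self._futures[name] = self._pool.submit(load_font_metrics, name)
        return fut.result()  # waits only if the background load isn't done

    def close(self):
        self._pool.shutdown()
```

A formatter would create the cache as soon as it reads the font declarations, carry on with other setup, and call `get()` when a font is first selected.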

        -- Steve

-- 
Steve Izma
-
Home: 35 Locust St., Kitchener, Ontario, Canada  N2H 1W6
E-mail: address@hidden     phone: 519-745-1313

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
<http://en.wikipedia.org/wiki/Posting_style>


