
Re: [lmi] Continuing deboostification with removing dependency on Boost.Regex


From: Vadim Zeitlin
Subject: Re: [lmi] Continuing deboostification with removing dependency on Boost.Regex
Date: Sun, 30 May 2021 15:08:50 +0200

On Sun, 30 May 2021 11:43:50 +0000 Greg Chicares <gchicares@sbcglobal.net> 
wrote:

GC> Right now, we have this, in 'GNUmakefile':
GC> 
GC>   @-cd $(prefascicle_dir) && $(PERFORM) $(TEST_CODING_RULES) *
GC> 
GC> so the program receives the entire file list [which seems to be
GC> 15000 bytes, so invoking that makefile target from a remote
GC> directory seems like it might exceed some system limit, BTW].
GC> 
GC> Let's approach this in two steps. First, couldn't we just do
GC> something like:
GC> 
GC>   find $(prefascicle_dir) -print0 | xargs -0 $(PERFORM) $(TEST_CODING_RULES)
GC> 
GC> instead, to achieve the same (non-parallelized) outcome as today?
GC> Second, couldn't we parallelize that with a patch like [untested]...
GC> 
GC> +concinnity_check_files := $(addsuffix --concinnity-check,$(prefascicle_dir)/.)
GC> 
GC> +.PHONY: %-concinnity-check
GC> +%-concinnity-check:
GC> +   @-$(PERFORM) $(TEST_CODING_RULES) $*
GC> 
GC>  .PHONY: check_concinnity
GC> -check_concinnity: source_clean custom_tools
GC> +check_concinnity: source_clean custom_tools $(concinnity_check_files)
GC> 
GC> ...thereby using 'make' to take care of parallelism, without
GC> making 'test_coding_rules$(EXEEXT)' multithreaded?
GC> 
GC> Then the list of all files is known to 'make', and to the commands
GC> it uses to display summary statistics--whereas each running
GC> instance of 'test_coding_rules$(EXEEXT)' knows only the single
GC> file it's testing, but isn't that perfectly okay?

 Yes, this is indeed a possible solution and I've actually thought about
doing something like this myself immediately _after_ posting my previous
message, but I wouldn't say it's perfect.

 One reason is that launching a new copy of the program for every file seems
inefficient: it's definitely going to be noticeably slower when using Wine
(whose process startup overhead is not negligible at all) and might even be
noticeable when using native processes. Even if launching them is fast,
doing it half a thousand times more than necessary still seems wasteful.

 The other reason is that I'd like to be able to run test_coding_rules
quickly by hand, or from CI scripts, too, and doing something in the
makefile is not going to help with this at all.

 But, as I've also realized after sending the previous message, in
principle both of these problems could be addressed by using GNU Parallel
(https://www.gnu.org/software/parallel/), so maybe we should just do this.
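
 A minimal sketch of how this could look, under stated assumptions: the
helper name `check_files_in_parallel`, the batch size and the fallback to
`xargs -P` are all hypothetical and not part of lmi. It passes many files to
each process, so the program is started only a few times, and it does not
involve make at all.

```shell
# Hypothetical helper, not part of lmi: run CHECKER on every regular file
# under DIR, passing many files to each process and running up to JOBS
# processes at once.
# Usage: check_files_in_parallel DIR JOBS CHECKER [ARGS...]
check_files_in_parallel() {
    dir=$1
    jobs=$2
    shift 2
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        # -0: NUL-separated input; -X: put as many file names as fit on
        # each command line, spread over the available job slots.
        find "$dir" -type f -print0 | parallel -0 -X -j "$jobs" "$@"
    else
        # Assumed fallback when GNU Parallel is not installed: -P is an
        # xargs extension (GNU, BSD); -n caps the batch size so that
        # several processes get work.
        find "$dir" -type f -print0 | xargs -0 -n 64 -P "$jobs" "$@"
    fi
}
```

 It would be called as, e.g., `check_files_in_parallel some_dir 8
./test_coding_rules` (both arguments are placeholders). Unlike the
`find ... | xargs -0` pipeline quoted above, the output of different
processes may arrive in any order.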


GC> >  Please correct me if I'm wrong, but you still seem to be against actually
GC> > changing test_coding_rules to use threads, and I don't know how to do it
GC> > otherwise, so this still needs to be clarified, ideally.
GC> 
GC> I've never used threads myself, so it's always been my habit to
GC> avoid threading. Old habit isn't a good reason not to make this
GC> multithreaded. A complete lack of experience makes it seem to
GC> require a lot of effort. OTOH, having one threaded example in
GC> the codebase makes it much easier to add more, so using threads
GC> here could be a good thing, educationally.

 This wouldn't be a very representative example, as this case is almost too
simple, i.e. any use of background worker threads in lmi-wx itself would be
more complicated. But OTOH maybe it's a good thing for a first example to
be simple.

GC> But would it be technically superior to using 'make' as above?

 It seems to me that it would, but I can't say by how much without testing,
which in turn requires writing this code first.

GC> Maybe: we'd be running a single program rather than hundreds of
GC> instances of the same program, and there might be a measurable
GC> speed difference. Or maybe not--in the makefile sketch above, if
GC> we choose to omit this line:
GC>   .PHONY: %-concinnity-check
GC> then, as an automatic byproduct, we get a sentinel file for each
GC> source file whose concinnity has been tested since it was last
GC> changed [which we'd probably want to put in some subdirectory];
GC> and then when we modify a few files, only their concinnity needs
GC> to be rechecked.

 This is quite orthogonal to the original issue and IMO should be discussed
separately. I.e. I do agree that redoing the same checks all the time is
wasteful, but we don't really need (or want, in my case) GNU make to avoid
doing it: e.g. the program could maintain its own database with SHA-1 sums
of the input files, which would be more reliable than the timestamps that
make relies on. Also, doing the checks unconditionally has the advantage of
not having to do anything special if test_coding_rules itself changes, so
this optimization is not without cost (even if it's still worth it).
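
 The checksum idea could be sketched as below. This is an illustration
only: `check_if_changed` and its cache directory layout are hypothetical,
the sketch lives outside the program rather than inside it, and it assumes
that `sha1sum` (GNU coreutils) is available.

```shell
# Hypothetical helper, not part of lmi: run CHECKER on FILE unless the
# same path with the same contents has already passed the check.
# Usage: check_if_changed CACHE_DIR FILE CHECKER [ARGS...]
check_if_changed() {
    cache_dir=$1
    file=$2
    shift 2
    mkdir -p "$cache_dir" || return
    # Key on both the path and the contents, because some rules may
    # depend on the file's name as well as on what it contains.
    key=$( { printf '%s\n' "$file"; cat "$file"; } | sha1sum | cut -d ' ' -f 1 )
    if [ -e "$cache_dir/$key" ]; then
        return 0    # unchanged since the last successful check
    fi
    # Record the key only if the checker succeeds.
    "$@" "$file" && : > "$cache_dir/$key"
}
```

 Note that this sketch does nothing about the cost mentioned above: if the
checker itself changes, the cache directory has to be removed by hand.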

GC> >  But any parallelization will only come later, i.e. my order of sub-tasks
GC> > here is:
GC> > 
GC> > 1. Rebase std::regex patch on master and benchmark it.
GC> > 2. Try using CTRE.
GC> > 3. Parallelize test_coding_rules.
GC> > 
GC> >  I think this makes sense because if you're interested in using CTRE
GC> > anyhow, there is no need to try (3) before (2) to see if it's going to be
GC> > fast enough -- we don't know by how much exactly, but we can be quite sure
GC> > that CTRE will be still faster anyhow, and it might even be fast enough to
GC> > not require (3) in practice. But please correct me if I'm wrong.
GC> 
GC> I certainly agree with (1) and (2). As for (3), maybe I should
GC> test the sketch above and share a sentinel-based implementation
GC> that we could experiment with, measure, and discuss.

 I'll try to do (1) and (2) soon, but this will still take me some time, so
you could definitely experiment with this in the meantime if you'd like to.
OTOH I would still prefer not to rely on make to be able to run
test_coding_rules reasonably fast, and the even simpler solution of using
GNU Parallel would be preferable from this point of view.

 Regards,
VZ
