bug#43773: [PATCH] offload: Improve load normalization and configurabili
From: Ludovic Courtès
Subject: bug#43773: [PATCH] offload: Improve load normalization and configurability.
Date: Mon, 05 Oct 2020 16:06:09 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Hi,
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> Fixes <https://issues.guix.gnu.org/43773>.
>
> The computed normalized load was previously obtained by dividing the load
> average as found in /proc/loadavg by the number of parallel builds defined for
> a build machine.
>
> This normalized didn't allow to compare machines with different number of
                 ^
> cores, as the load average reported by can be as high as the number of cores;
                                        ^
Missing words.
> thus comparing that value to a fixed threshold of 2.0 would mean machines with
> multiple cores were more likely to be flagged as overloaded compared to single
> core machines.
>
> This can be fixed by normalizing using the available number of cores instead
> of the number of parallel jobs.
Indeed, good catch!
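
To make the arithmetic concrete, here is a minimal Guile sketch of the two normalizations described in the commit message. The procedure names 'load-per-job' and 'load-per-core' and the sample figures (16 cores, one parallel build, load average 8.0) are assumptions made up for illustration; they do not appear in guix/scripts/offload.scm.

```scheme
;; Hypothetical helpers for illustration only; not part of offload.scm.

(define (load-per-job load-average parallel-builds)
  ;; Previous scheme, per the commit message: divide the load average
  ;; by the number of parallel builds configured for the machine.
  (/ load-average parallel-builds))

(define (load-per-core load-average ncores)
  ;; Proposed scheme: divide the load average by the number of
  ;; logical cores.
  (/ load-average ncores))

;; Assumed machine: 16 cores, 1 parallel build, load average 8.0.
(define old-value (load-per-job 8.0 1))    ; 8.0, above the fixed 2.0 threshold
(define new-value (load-per-core 8.0 16))  ; 0.5, i.e. roughly half busy
```

Under these assumed numbers, the half-idle 16-core machine would be flagged as overloaded by the old computation but not by the per-core one.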
> * guix/scripts/offload.scm (<build-machine>)[overload-threshold]: New field.
> (node-load): Modify to return a normalized load value between 0 and 1, taking
> into account the number of cores available.
> (normalized-load): Remove procedure.
> (report-load): New procedure.
> (choose-build-machine): Adjust to use the modified 'node-load' and the new
> 'report-load' and 'build-machine-overload-threshold' procedures.
> (check-machine-status): Adjust.
> * doc/guix.texi (Daemon Offload Setup): Document the offload scheduler and the
> new 'overload-threshold' field.
>
> doc/guix.texi | 30 +++++++++++++++++++++-
> guix/scripts/offload.scm | 54 ++++++++++++++++++++++++----------------
> 2 files changed, 62 insertions(+), 22 deletions(-)
Nice.
[...]
> (define (node-load node)
> - "Return the load on NODE. Return +∞ if NODE is misbehaving."
> + "Return the load on NODE, a normalized value between 0.0 and 1.0. The value
> +is derived from /proc/loadavg and normalized according to the number of
> +logical cores available, to give a rough estimation of CPU usage. Return
> +1.0 (fully loaded) if NODE is misbehaving."
>    (let ((line (inferior-eval '(begin
>                                  (use-modules (ice-9 rdelim))
>                                  (call-with-input-file "/proc/loadavg"
>                                    read-string))
> -                              node)))
> -    (if (eof-object? line)
> -        +inf.0 ;MACHINE does not respond, so assume it is infinitely loaded
> +                              node))
> +        (ncores (inferior-eval '(begin
> +                                  (use-modules (ice-9 threads))
> +                                  (current-processor-count))
> +                               node)))
> +    (if (or (eof-object? line) (eof-object? ncores))
> +        1.0 ;MACHINE does not respond, so assume it is fully loaded
Returning 1.0 now is akin to returning +∞ before, meaning that the
machine will never be picked up, is that right?
What if one sets overload-threshold = 1.0, the machine would still be
picked up, no?
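
The difference between the two sentinels can be checked in isolation. The sketch below assumes the selection test is the strict '(< load threshold)' comparison quoted in the next hunk; 'below-threshold?' is a hypothetical name, and the threshold values are arbitrary examples.

```scheme
;; Hypothetical predicate mirroring the strict `<' test from the patch.
(define (below-threshold? load threshold)
  (< load threshold))

;; Sentinel 1.0 (unresponsive machine under the patch):
(define sentinel-1/at-threshold    (below-threshold? 1.0 1.0))   ; #f, `<' is strict
(define sentinel-1/above-threshold (below-threshold? 1.0 1.5))   ; #t, machine is eligible

;; Sentinel +inf.0 (previous behaviour):
(define sentinel-inf/finite   (below-threshold? +inf.0 1.5))     ; #f
(define sentinel-inf/infinite (below-threshold? +inf.0 +inf.0))  ; #f
```

So with a strict comparison, a threshold of exactly 1.0 still rejects an unresponsive machine, but any threshold above 1.0 would accept it, whereas +inf.0 is rejected whatever the threshold.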
> + (if (and node
> + (or (not threshold) (< load threshold))
I think we can assume that THRESHOLD is always a number, including
possible +inf.0.
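
A small sketch of why the '(not threshold)' branch becomes unnecessary under that assumption: for any finite load, a threshold of +inf.0 behaves like "no threshold". Both procedure names below are hypothetical, and the sample loads are arbitrary.

```scheme
;; Form quoted in the patch: #f stands for "no threshold".
(define (under-threshold/optional? load threshold)
  (or (not threshold) (< load threshold)))

;; Simplified form, assuming THRESHOLD is always a real number,
;; possibly +inf.0.
(define (under-threshold? load threshold)
  (< load threshold))

;; For finite loads, #f in the first form and +inf.0 in the second
;; give the same answer.
(define agreement
  (map (lambda (load)
         (eq? (under-threshold/optional? load #f)
              (under-threshold? load +inf.0)))
       '(0.0 0.5 1.0)))
```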
Thanks,
Ludo.