
Re: Serious doubts about Gitlab CI


From: Thomas Huth
Subject: Re: Serious doubts about Gitlab CI
Date: Tue, 30 Mar 2021 13:55:48 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0

On 30/03/2021 13.19, Daniel P. Berrangé wrote:
> On Mon, Mar 29, 2021 at 03:10:36PM +0100, Stefan Hajnoczi wrote:
>> Hi,
>> I wanted to follow up with a summary of the CI jobs:
>>
>> 1. Containers & Containers Layer2 - ~3 minutes/job x 39 jobs
>> 2. Builds - ~50 minutes/job x 61 jobs
>> 3. Tests - ~12 minutes/job x 20 jobs
>> 4. Deploy - 52 minutes x 1 job

I hope that 52 was just a typo ... ?

> I think a challenge we have with our incremental approach is that
> we're not really taking into account the relative importance of the
> different build scenarios, and often don't look at the big picture
> of what the new job adds in terms of quality, compared to existing
> jobs.
>
> e.g. consider we have:
>
>    build-system-alpine:
>    build-system-ubuntu:
>    build-system-debian:
>    build-system-fedora:
>    build-system-centos:
>    build-system-opensuse:

I guess we could go through that list of jobs and remove the duplicated target CPUs, e.g. it should be enough to test x86_64-softmmu only once.
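
Just to illustrate what that deduplication could look like in the CI YAML (the template, image names and target sets below are made up, only --target-list itself is a real configure option): each build-system-* job would get a disjoint target list, so x86_64-softmmu is compiled exactly once:

   build-system-fedora:
     extends: .native_build_job_template   # assumed shared build template
     variables:
       IMAGE: fedora
       # x86_64-softmmu is built in this job and nowhere else
       CONFIGURE_ARGS: --target-list=x86_64-softmmu,aarch64-softmmu
       MAKE_CHECK_ARGS: check

   build-system-debian:
     extends: .native_build_job_template
     variables:
       IMAGE: debian-amd64
       # a different, non-overlapping set of targets
       CONFIGURE_ARGS: --target-list=s390x-softmmu,riscv64-softmmu
       MAKE_CHECK_ARGS: check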

>    build-trace-multi-user:
>    build-trace-ftrace-system:
>    build-trace-ust-system:
>
> I'd question whether we really need any of those 'build-trace'
> jobs. Instead, we could have build-system-ubuntu pass
> --enable-trace-backends=log,simple,syslog, build-system-debian
> pass --enable-trace-backends=ust and build-system-fedora
> pass --enable-trace-backends=ftrace, etc.

I recently had the very same idea already:

 https://gitlab.com/qemu-project/qemu/-/commit/65aff82076a9bbfdf7

:-)
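
(To spell out that idea for the archive: instead of dedicated build-trace-* jobs, each existing build-system-* job picks a different --enable-trace-backends value in its configure arguments. The configure option is real; the abbreviated job/variable layout below is only a sketch.)

   build-system-ubuntu:
     variables:
       CONFIGURE_ARGS: --enable-trace-backends=log,simple,syslog

   build-system-debian:
     variables:
       CONFIGURE_ARGS: --enable-trace-backends=ust

   build-system-fedora:
     variables:
       CONFIGURE_ARGS: --enable-trace-backends=ftrace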

> Another example is that we test builds on centos7 with
> three different combos of crypto backend settings. This was
> to exercise bugs we've seen in old crypto packages in RHEL-7,
> but in reality it is probably overkill, because downstream
> RHEL-7 only cares about one specific combination.

Care to send a patch? Or shall we just wait one more month and then remove these jobs (since we won't support RHEL7 after QEMU 6.0 anymore)?

> We don't really have a clearly defined plan to identify what
> the most important things are in our testing coverage, so we
> tend to accept anything without questioning its value add.
> This really feeds back into the idea I've brought up many
> times in the past, that we need to better define what we aim
> to support in QEMU and its quality level, which will influence
> which scenarios we care about testing.

But code that we have in the repository should get at least some basic test coverage, otherwise it bitrots soon ... so maybe it's rather the other old problem that we struggle with: we should deprecate more code and remove it if nobody cares about it...

>> Traditionally ccache (https://ccache.dev/) was used to detect
>> recompilation of the same compiler input files. This is trickier to do
>> in GitLab CI since it would be necessary to share and update a cache,
>> potentially between untrusted users. Unfortunately this shifts the
>> bottleneck from CPU to network in a CI-as-a-Service environment since
>> the cached build output needs to be accessed by the linker on the CI
>> runner but is stored remotely.

> Our docker containers install ccache already and I could have sworn
> that we use that in gitlab, but now I'm not so sure. We're only
> saving the "build/" directory as an artifact between jobs, and I'm
> not sure that directory holds the ccache cache.

AFAIK we never really enabled ccache in the gitlab-CI, only in Travis.

>> This is as far as I've gotten with thinking about CI efficiency. Do you
>> think these optimizations are worth investigating or should we keep it
>> simple and just disable many builds by default?

> ccache is a no-brainer and assuming it isn't already working with
> our gitlab jobs, we must fix that asap.

I've found some nice instructions here:

https://gould.cx/ted/blog/2017/06/10/ccache-for-Gitlab-CI/

... and just kicked off a build with these modifications, let's see how it goes...
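
For anyone following along, the core of those instructions is to cache a per-job ccache directory and point ccache (and the compiler lookup) at it, roughly like the snippet below. The template name is an assumption, /usr/lib/ccache is the compiler-symlink directory on Debian/Ubuntu images (it lives elsewhere on other distros), and the size limit is just an example:

   .native_build_job_template:
     cache:
       key: "$CI_JOB_NAME"            # one cache pool per job name
       paths:
         - ccache/
     before_script:
       - export CCACHE_BASEDIR="$CI_PROJECT_DIR"
       - export CCACHE_DIR="$CI_PROJECT_DIR/ccache"
       - export CCACHE_MAXSIZE=500M
       - export PATH="/usr/lib/ccache:$PATH"
     after_script:
       # after_script starts a fresh shell, so point ccache at the cache again
       - CCACHE_DIR="$CI_PROJECT_DIR/ccache" ccache --show-stats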

> Aside from optimizing CI, we should consider whether there's more we
> can do to optimize the build process itself. We've done a lot of work,
> but there's still plenty of stuff we build multiple times, once for each
> target. Perhaps there's scope for cutting this down in some manner?

Right, I think we should also work more towards consolidating the QEMU binaries, to avoid having to build so many target binaries again and again. E.g.:

- Do we still need to support 32-bit hosts? If not we could
  finally get rid of qemu-system-i386, qemu-system-ppc,
  qemu-system-arm, etc. and just provide the 64-bit variants

- Could we maybe somehow unify the targets that have both, big
  and little endian versions? Then we could merge e.g.
  qemu-system-microblaze and qemu-system-microblazeel etc.

- Or could we maybe even build a unified qemu-system binary that
  contains all target CPUs? ... that would also allow e.g.
  machines with an x86 main CPU and an ARM-based board management
  controller...

> I'm unclear how many jobs in CI are building submodules, but if there's
> more scope for using the pre-built distro packages, that's going to
> be beneficial for build time.

Since the build system has been converted to meson, I think the configure script prefers to use the submodules instead of the distro packages. I've tried to remedy this a little bit here:

https://gitlab.com/qemu-project/qemu/-/commit/db0108d5d846e9a8

... but new jobs of course will use the submodules again if the author is not careful.

I think we should tweak that behavior again to use the system version of capstone, slirp and dtc instead if these are available. Paolo, what do you think?

Also I wonder whether we could maybe even get rid of the capstone and slirp submodules in QEMU now ... these libraries should be available in most distros by now, and otherwise people could also install them manually instead?
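
In case somebody wants to experiment: once the devel packages are installed in the container images, forcing the system libraries in a build job should only be a matter of the corresponding configure switches, along the lines of the sketch below. The option spellings have changed between QEMU versions, so please double-check ./configure --help before copying this; the job and template names are again just placeholders:

   build-system-fedora-syslibs:
     extends: .native_build_job_template   # assumed template name
     variables:
       IMAGE: fedora
       # prefer the distro-provided libraries over the bundled submodules
       CONFIGURE_ARGS: --enable-capstone=system --enable-slirp=system --enable-fdt=system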

 Thomas



