Re: flakiness on CI jobs run via k8s

qemu-devel

From:	Daniel P . Berrangé
Subject:	Re: flakiness on CI jobs run via k8s
Date:	Wed, 18 Sep 2024 11:24:22 +0200
User-agent:	Mutt/2.2.12 (2023-09-09)

On Tue, Sep 17, 2024 at 04:48:45PM +0100, Peter Maydell wrote:
> I notice that a lot of the CI job flakiness I'm seeing with main
> CI runs involves jobs that are run via the k8s runners. Notably
> cross-i686-tci and cross-i686-system and cross-i686-user are like this.
> These jobs run with no flakiness that I've noticed when they're run
> by an individual gitlab user (in which case they're not running on
> k8s, I believe). So something seems to be up with the environment
> we're using to run the jobs for the main CI. My impression is that
> the time things take to run can be very variable, especially if the
> CI job believes the reported number of CPUs and actually tries to run
> 8 or 9 test cases in parallel.
> 
> Any ideas what might be causing issues here, or config tweaks
> we might be able to make to ensure that the environment reports
> to the CI job a number of CPUs/etc that accurately reflects
> the amount of resource it really has?

Didn't we change the hosting for our k8s runners recently? They were
running on Azure, but I vaguely recall hearing that it was being
switched again.

Anyway, perhaps the cloud provider is over-committing the env such
that we have excessive steal time and are thus not getting the full
power of the CPUs we expect.  I know gitlab's own public runners
will suffer from this periodically, due to the very cheap VMs they
host on.
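
One rough way to check for this from inside a job would be to sample
the steal counter in /proc/stat over a short interval. The following
is only a sketch: the function names are made up for illustration, and
it assumes a Linux guest whose kernel reports steal ticks in the
aggregate "cpu" line.

```python
import time

# Counters on the aggregate "cpu" line of /proc/stat, in order.
# The trailing guest/guest_nice fields are skipped because the kernel
# already accounts for them inside user/nice.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def steal_fraction(before, after):
    """Fraction of the CPU ticks elapsed between two samples that were stolen."""
    total = sum(after.values()) - sum(before.values())
    if total <= 0:
        return 0.0
    return (after["steal"] - before["steal"]) / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(5)
    second = parse_cpu_line(read_cpu_line())
    print(f"steal: {steal_fraction(first, second):.1%}")
```

If a runner reports a persistently high steal fraction while a job is
under load, that would be consistent with the host being over-committed,
though it would not prove that this is the cause of the flakiness.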

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
