[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: flakiness on CI jobs run via k8s
From: |
Daniel P . Berrangé |
Subject: |
Re: flakiness on CI jobs run via k8s |
Date: |
Wed, 18 Sep 2024 11:24:22 +0200 |
User-agent: |
Mutt/2.2.12 (2023-09-09) |
On Tue, Sep 17, 2024 at 04:48:45PM +0100, Peter Maydell wrote:
> I notice that a lot of the CI job flakiness I'm seeing with main
> CI runs involves jobs that are run via the k8s runners. Notably
> cross-i686-tci and cross-i686-system and cross-i686-user are like this.
> These jobs run with no flakiness that I've noticed when they're run
> by an individual gitlab user (in which case they're not running on
> k8s, I believe). So something seems to be up with the environment
> we're using to run the jobs for the main CI. My impression is that
> the time things take to run can be very variable, especially if the
> CI job believes the reported number of CPUs and actually tries to run
> 8 or 9 test cases in parallel.
>
> Any ideas what might be causing issues here, or config tweaks
> we might be able to make to ensure that the environment reports
> to the CI job a number of CPUs/etc that accurately reflects
> the amount of resource it really has?
Didn't we change the hosting for our k8s runners recently ? They were
running on Azure, but I vaguely recall hearing that it was being
switched again.
Anyway, perhaps the cloud provider is over-committing the env such
that we have excessive streal time and thus not getting the full
power of the CPUs we expect. I know gitlab's own public runners
will suffer from this periodically, due to the very cheap VMs they
host on.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|