Re: [RFC] gitlab: introduce s390x wasmtime job
From: Ilya Leoshkevich
Subject: Re: [RFC] gitlab: introduce s390x wasmtime job
Date: Mon, 19 Dec 2022 22:42:16 +0100
User-agent: Evolution 3.46.1 (3.46.1-1.fc37)
On Fri, 2022-12-16 at 15:10 +0000, Alex Bennée wrote:
>
> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>
> > On Tue, 2022-07-05 at 15:40 +0100, Peter Maydell wrote:
> > > On Tue, 5 Jul 2022 at 15:37, Ilya Leoshkevich <iii@linux.ibm.com>
> > > wrote:
> > > >
> > > > On Tue, 2022-07-05 at 14:57 +0100, Peter Maydell wrote:
> > > > > On Tue, 5 Jul 2022 at 14:04, Daniel P. Berrangé
> > > > > <berrange@redhat.com>
> > > > > wrote:
> > > > > > If we put this job in QEMU CI someone will have to be able
> > > > > > to
> > > > > > interpret the results when it fails.
> > > > >
> > > > > In particular since this is qemu-user, the answer is probably
> > > > > at least some of the time going to be "oh, well, qemu-user
> > > > > isn't
> > > > > reliable
> > > > > if you do complicated things in the guest". I'd be pretty
> > > > > wary of
> > > > > our
> > > > > having
> > > > > a "pass a big complicated guest code test suite under linux-
> > > > > user
> > > > > mode"
> > > > > in the CI path.
> > >
> > > > Actually exercising qemu-user is one of the goals here: just as
> > > > an
> > > > example, one of the things that the test suite found was commit
> > > > 9a12adc704f9 ("linux-user/s390x: Fix unwinding from signal
> > > > handlers"),
> > > > so it's not only about the ISA.
> > > >
> > > > At least for s390x, we've noticed that various projects use
> > > > qemu-user-based setups in their CI (either calling it
> > > > explicitly,
> > > > or
> > > > via binfmt-misc), and we would like these workflows to be
> > > > reliable,
> > > > even if they try complicated (within reason) things there.
> > >
> > > I also would like them to be reliable. But I don't think
> > > *testing* these things is the difficulty: it is having
> > > people who are willing to spend time on the often quite
> > > difficult tasks of identifying why something intermittently
> > > fails and doing the necessary design and implementation work
> > > to correct the problem. Sometimes this is easy (as in the
> > > s390 regression above) but quite often it is not (eg when
> > > multiple threads are in use, or the guest wants to do
> > > something complicated with clone(), etc).
> > >
> > > thanks
> > > -- PMM
> > >
> >
> > For what it's worth, we can help analyzing and fixing failures
> > detected
> > by the s390x wasmtime job. If something breaks, we will have to
> > look at
> > it anyway, and it's better to do this sooner than later.
>
> Sorry for necroing an old thread but I just wanted to add my 2p.
Thanks for that though; I've been cherry-picking this patch into my
private trees for some time now, and would be happy to see it go
upstream in some form.
> I think making 3rd party test suites easily available to developers
> is a worthy
> goal and there are a number that I would like to see including LTP
> and
> kvm-unit-tests. As others have pointed out I'm less sure about adding
> it
> to the gating CI.
Another third-party test suite that I found useful was valgrind's.
I'll post my thoughts about integrating the wasmtime and valgrind
test suites below; unfortunately I'm not too familiar with LTP and
kvm-unit-tests.
Not touching the gating CI is fine for me.
> If we want to go forward with this we should probably think about how
> we
> would approach this generally:
>
> - tests/third-party-suites/FOO?
Sounds good to me.
> - should we use avocado as a wrapper or something else?
> - make check-?
avocado sounds good; we might have to add a second wrapper for
producing tap output (see below).
One should definitely be able to specify the testsuite and the
architecture, e.g. `make check-third-party-wasmtime-s390x`.
In addition, we need to either hardcode or let the user choose
the way the testsuite is built and executed. I see 3 possibilities:
- Fully on the host. Easiest to implement, and the results are also
  easy to debug. But this requires installing cross-toolchains
  manually, which is simple on some distros and not so simple on
  others.
- Provide the toolchain as a Docker image. For wasmtime, the toolchain
  would include the Rust compiler and Cargo. This solves the problem
  of configuring the host, but introduces the next choice one has to
  make:
  - Build qemu on the host. Then the qemu binary would have to be
    compatible with the container (e.g. no references to the latest
    and greatest glibc functions).
    This is because the wasmtime testsuite needs to run inside the
    container: it's driven by Cargo, which is not available on the
    host. It is possible to only build the tests with Cargo and then
    run the resulting binaries manually, but there is more than one,
    and I'm not sure how to get a list of them (if we decide to do
    this, in the worst case the list can be hardcoded).
    For valgrind it's a bit easier, since the test runner is not as
    complex as Cargo and can therefore just follow the check-tcg
    approach.
  - Build qemu inside the container. 2x the space and time required;
    one might also have to install additional -dev packages for extra
    qemu features. Also, a decision needs to be made on whether the
    qemu build directory ends up in the container (needs a rebuild on
    every run), in a volume (the volume's lifetime needs to be
    managed) or in a mounted host directory (which can cause
    selinux/ownership issues if not done carefully).
- Provide both the toolchain and the testsuite as a Docker image.
  Essentially the same as above, but trades build time for download
  time. Also, the results are slightly harder to debug, since the
  test binaries are now located inside the container.
Sorry for the long list, it's just that since we are discussing how to
enable this for a larger audience, I felt I needed to enumerate all the
options and pitfalls I could think of.
> - ensuring the suites output tap for meson
At the moment Rust can output either json like this:
$ cargo test -- -Z unstable-options --format=json
{ "type": "suite", "event": "started", "test_count": 1 }
{ "type": "test", "event": "started", "name": "test::hello" }
{ "type": "test", "name": "test::hello", "event": "ok" }
{ "type": "suite", "event": "ok", "passed": 1, "failed": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.001460307 }
or xUnit like this:
$ cargo test -- -Z unstable-options --format=junit
# the following is on a single line; formatted for clarity
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
<testsuite name="test" package="test" id="0" errors="0" failures="0"
tests="1" skipped="0">
<testcase classname="integration" name="test::hello" time="0"/>
<system-out/>
<system-err/>
</testsuite>
</testsuites>
I skimmed the avocado docs and couldn't find whether it can convert
between different test output formats. Based on the source code, we can
add an XUnitRunner the same way the TAPRunner was added.
In the worst case we can pipe json to a script that would output tap.
Enhancing Rust is also an option, of course, even though this might
take some time.
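The "pipe json to a script" fallback could look roughly like this: a
minimal filter that reads the cargo JSON events shown above (one object
per line) from stdin and emits TAP. Event names follow the sample
output; the "failed"/"ignored" handling assumes cargo uses those event
values for failing and skipped tests.

```python
#!/usr/bin/env python3
# Minimal sketch: convert `cargo test -- -Z unstable-options
# --format=json` output to TAP (plan line first, then one "ok" /
# "not ok" line per test).
import json
import sys


def json_to_tap(lines):
    """Yield TAP lines for a stream of cargo test JSON event lines."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "test":
            continue  # suite start/summary events carry no test result
        name = event.get("name", "?")
        if event.get("event") == "ok":
            results.append("ok %d - %s" % (len(results) + 1, name))
        elif event.get("event") == "failed":
            results.append("not ok %d - %s" % (len(results) + 1, name))
        elif event.get("event") == "ignored":
            results.append("ok %d - %s # SKIP" % (len(results) + 1, name))
        # "started" events are intentionally not reported
    yield "1..%d" % len(results)
    yield from results


if __name__ == "__main__":
    for tap_line in json_to_tap(sys.stdin):
        print(tap_line)
```

Usage would be along the lines of
`cargo test -- -Z unstable-options --format=json | ./json2tap.py`,
which meson's TAP protocol support should then be able to consume.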
> - document in docs/devel/testing.rst
Right, we need this too; I totally ignored it in this patch.
> Also I want to avoid adding stuff to tests/docker/dockerfiles that
> aren't directly related to check-tcg and the cross builds. I want to
> move away from docker.py so for 3rd party suites lets just call
> docker/podman directly.
We could add the dockerfiles (if we decide we need them based on
the discussion above) to tests/third-party-suites/FOO. My question is,
would it be possible to build and publish the images on GitLab? Or
is it better to build them on developers' machines?
Best regards,
Ilya