Re: How to tell if an emulated aarch64 CPU has stopped doing work?
From: Alex Bennée
Subject: Re: How to tell if an emulated aarch64 CPU has stopped doing work?
Date: Fri, 12 Jun 2020 19:46:09 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Dave Bort <dbort-PgRGKqEAcmkAvxtiuMwx3w@public.gmane.org> writes:
> We use qemu (4.0.0, about to flip the switch to 5.0.0) to test our aarch64
> images, running in linux containers on x86_64 alongside other workloads.
>
> We've recently run into issues where it looks like an emulated CPU (out of
> four) sometimes stops making progress for ten or more seconds, and we're
> trying to characterize the problem. When this happens, the other emulated
> CPUs run just fine, though sometimes two will stall out at the same time.
>
> Any suggestions for how to tell if an emulated CPU stopped doing work?
>
> Based on our experiments, the guest-visible clocks and cycle counters
> continue to run when a qemu CPU thread is suspended, so it's hard to tell
> whether the emulation paused, or if our code is spinning with interrupts
> disabled (though evidence is mounting that that's not the case). We're
> adding a bunch more instrumentation to our code, but maybe qemu has some
> features that will help us out.
>
> I tried to find a way to count the number of TBs executed by an
> emulated core over time, but I didn't see a cheap way to do that with
> the plugin APIs.

It should be pretty cheap to do. You just need to extend the example bb
plugin to take cpu_index into account and do the proper locking to
update the instruction counter in vcpu_tb_exec.

The qemu_plugin_register_vcpu_idle_cb and
qemu_plugin_register_vcpu_resume_cb functions allow you to register
callbacks for every time we exit the main run loop and sleep for whatever
reason. You could even dump the total instruction counts there.
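
As a rough sketch of the bookkeeping this implies, the C below keeps one set
of counters per vCPU and uses C11 atomics in place of a lock. It is
deliberately independent of qemu-plugin.h so it builds on its own; the names
on_tb_exec, on_idle, on_resume, vcpu_stalled, VcpuStats, Sampler and
MAX_VCPUS are hypothetical and not part of the plugin API. In a real plugin
the three on_* helpers would presumably be called from vcpu_tb_exec and from
the callbacks registered with qemu_plugin_register_vcpu_idle_cb and
qemu_plugin_register_vcpu_resume_cb.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: upper bound on emulated CPUs (the setup above uses four). */
#define MAX_VCPUS 8

/* Per-vCPU counters, updated by the vCPU thread, read by a sampler. */
typedef struct {
    atomic_uint_fast64_t tb_count;    /* translation blocks executed */
    atomic_uint_fast64_t insn_count;  /* guest instructions executed */
    atomic_uint_fast64_t idle_count;  /* times the vCPU reported idle */
    atomic_bool idle;                 /* true between idle and resume */
} VcpuStats;

/* Static storage, so every counter starts at zero. */
static VcpuStats stats[MAX_VCPUS];

/* Hypothetical hook: call once per executed TB, e.g. from vcpu_tb_exec. */
static void on_tb_exec(unsigned int cpu_index, uint64_t n_insns)
{
    assert(cpu_index < MAX_VCPUS);
    atomic_fetch_add_explicit(&stats[cpu_index].tb_count, 1,
                              memory_order_relaxed);
    atomic_fetch_add_explicit(&stats[cpu_index].insn_count, n_insns,
                              memory_order_relaxed);
}

/* Hypothetical hook: call from the vCPU idle callback. */
static void on_idle(unsigned int cpu_index)
{
    assert(cpu_index < MAX_VCPUS);
    atomic_fetch_add_explicit(&stats[cpu_index].idle_count, 1,
                              memory_order_relaxed);
    atomic_store(&stats[cpu_index].idle, true);
}

/* Hypothetical hook: call from the vCPU resume callback. */
static void on_resume(unsigned int cpu_index)
{
    assert(cpu_index < MAX_VCPUS);
    atomic_store(&stats[cpu_index].idle, false);
}

/* Sampler state: the TB count seen at the previous sample, per vCPU. */
typedef struct {
    uint64_t last_tb[MAX_VCPUS];
} Sampler;

/*
 * Returns true if the vCPU executed no TBs since the previous sample
 * while not marked idle. Each call records the current count as the
 * baseline for the next call.
 */
static bool vcpu_stalled(Sampler *s, unsigned int cpu_index)
{
    assert(cpu_index < MAX_VCPUS);
    uint64_t now = atomic_load(&stats[cpu_index].tb_count);
    bool idle = atomic_load(&stats[cpu_index].idle);
    bool stalled = (now == s->last_tb[cpu_index]) && !idle;
    s->last_tb[cpu_index] = now;
    return stalled;
}
```

Calling vcpu_stalled periodically from a separate host thread would then flag
a vCPU whose TB count has not moved even though it never reported going idle,
which is the case that would point at the emulation itself rather than the
guest sleeping. A guest spinning with interrupts disabled should still
execute TBs, so its count would keep climbing. Relaxed atomics are used
because each counter is only ever incremented by its own vCPU thread and the
sampler only needs an approximate snapshot.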
>
> We could maybe turn on instruction tracing, but this problem happens pretty
> rarely (<1%), we don't have a repro case yet, and we can't really afford the
> cost of slowing down every test run.
> There's a decent chance that this is caused by an overloaded host, but our
> host-side investigations haven't turned up anything concrete either.
>
> Any advice?
>
> --dbort
>
--
Alex Bennée