Re: Unpredictable performance degradation in QEMU KVMs
From: Frantisek Rysanek
Subject: Re: Unpredictable performance degradation in QEMU KVMs
Date: Wed, 06 Oct 2021 22:33:58 +0200
Hello Parnell,
I'm just a part-time Linux admin / enthusiast, by no means a DevOps
professional. I have some historical experience under the hood as a
HW/OS troubleshooter. So whatever I voice here is just my "two cents'
worth".
One last note regarding the hypothetical "architecture discrepancy":
the x86_64 instruction set is "modular". QEMU has an option ("-cpu
host" with KVM; management tools may or may not use it by default) to
"pass through" the CPU feature set from the host to the guest (see the
"flags" row in /proc/cpuinfo). Thus, especially given VT-x, no
instructions need to be "emulated": software in the guest can see what
the host CPU provides on bare metal, and the guest's instructions run
directly on the host CPU.
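One way to check this is to save /proc/cpuinfo on the host and in the guest and compare the "flags" rows. A minimal Python sketch, assuming both files have been copied to one machine (the file names in the usage comment are hypothetical, and this is not a QEMU tool):

```python
import sys


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first "flags" row of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" row found (e.g. a non-x86 cpuinfo format)


def compare_flags(host_text, guest_text):
    """Return (flags missing in the guest, flags only the guest has), both sorted."""
    host, guest = cpu_flags(host_text), cpu_flags(guest_text)
    return sorted(host - guest), sorted(guest - host)


# Usage (hypothetical file names):
#   python3 cpuflags.py host-cpuinfo.txt guest-cpuinfo.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as h, open(sys.argv[2]) as g:
        missing, extra = compare_flags(h.read(), g.read())
    print("missing in guest:", " ".join(missing) or "(none)")
    print("only in guest:   ", " ".join(extra) or "(none)")
```

A KVM guest normally shows an extra "hypervisor" flag; a long list of flags missing in the guest would suggest the guest was given a narrower CPU model than the host.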
If I understand correctly, when the "performance degradation"
happens, the affected VM guest is still basically functional,
accessible, can be inspected, given the right software tools it can
collect "metrics" and either store them locally or make the data
available over the network - correct?
When debugging intermittent phenomena, my favourite approach is to
measure, record and graph whatever interesting data I can come across,
at one-second intervals if need be.
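Inside the guest, two counters in /proc/stat speak directly to the CPU-versus-storage question: "iowait" (idle time while I/O is outstanding) and "steal" (time the vCPU was ready to run but the host did not schedule it, reported when the hypervisor exposes it, as KVM can). A minimal Python sketch that samples them once per second; the function names are illustrative, not taken from any existing tool:

```python
import time

# Leading fields of the aggregate "cpu" row in /proc/stat, in kernel order
# (see proc(5)). The later guest/guest_nice fields are already counted in
# user/nice, so they are deliberately left out of the total.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU tick counters from /proc/stat as a dict."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
                return dict(zip(FIELDS, values))
    raise RuntimeError("no aggregate 'cpu' row in " + path)


def percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1  # avoid division by zero
    return {k: 100.0 * v / total for k, v in delta.items()}


def monitor(interval=1.0, samples=10):
    """Print a timestamp plus iowait and steal percentages once per interval."""
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_times()
        p = percentages(prev, cur)
        print(f"{time.time():.0f} iowait={p['iowait']:.1f}% "
              f"steal={p['steal']:.1f}%", flush=True)
        prev = cur
```

Roughly speaking, steal rising during a slow spell hints at CPU contention on the host, while iowait rising hints at storage; neither is conclusive on its own.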
There are ready-made tools such as Nagios (and many others in that
vein) which can help you do this data collection and graphing in a
centralized fashion - provided that you can learn to work with those
behemoths...
https://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_linux_stats/details
In my daily practice, rather than install and configure Nagios (which
I'm not familiar with), I tend to cobble together simple scripts or
dedicated C proggies that produce timestamped CSV text, which can then
be graphed using e.g. Gnuplot. I use this in a different area of
interest (I've never felt a need to collect basic system stats), but
my approach should be easily applicable...
I can provide some examples if you want.
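As an illustration of that approach (a hypothetical sketch, not one of the scripts mentioned above), here is a Python logger that appends the guest's /proc/loadavg to a CSV file once per second; the output file name is an assumption:

```python
import time


def loadavg_row(text, now):
    """Convert /proc/loadavg content to 'epoch,load1,load5,load15,running,total'."""
    load1, load5, load15, tasks = text.split()[:4]
    running, total = tasks.split("/")  # e.g. "2/345": runnable / total tasks
    return f"{now:.0f},{load1},{load5},{load15},{running},{total}"


def log_loadavg(out_path="loadavg.csv", interval=1.0, samples=None):
    """Append one CSV row per interval; samples=None means run until killed."""
    done = 0
    with open(out_path, "a") as out:
        while samples is None or done < samples:
            with open("/proc/loadavg") as f:
                out.write(loadavg_row(f.read(), time.time()) + "\n")
            out.flush()  # keep the file current in case the script is killed
            done += 1
            time.sleep(interval)
```

The rows drift slightly from exact one-second spacing, which is why each row carries its own timestamp. In Gnuplot, something like `set datafile separator ","` followed by `plot "loadavg.csv" using 1:2 with lines` would chart the 1-minute load.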
Even if your goal for a particular "metric" is not to turn it into a
time-series chart (and therefore invest some effort in extracting the
data from some half-convenient "data source"), it may make good sense
just to log the available data, in whatever raw format it comes in, to
a file with timestamps for later reference...
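For raw output that carries no timestamps of its own, a small filter that prefixes each line as it arrives is enough (moreutils' `ts` does the same job). A Python sketch; the names are illustrative:

```python
import sys
from datetime import datetime


def stamp(line, now=None):
    """Prefix one line with an ISO-8601 timestamp (second resolution)."""
    if now is None:
        now = datetime.now()
    return f"{now.isoformat(timespec='seconds')} {line}"


def stamp_stream(src=sys.stdin, dst=sys.stdout):
    """Copy src to dst line by line, timestamping each line as it arrives."""
    for line in src:
        dst.write(stamp(line))
        dst.flush()  # write each line out immediately rather than buffering it
```

Saved as a hypothetical stamp.py, it could sit in a pipeline such as `iostat -x 1 | python3 -c 'import stamp; stamp.stamp_stream()' >> iostat.log`.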
Apart from top and latencytop, there are iotop and iostat (from the
sysstat package), and for network traffic there are nethogs and iftop.
My general objection to these tools is that many of them are
"interactive" / full screen, i.e. they do not produce a "scrolling
output on stdout" suitable for storing in a log for later use... For
continuous collection, either the particular tool has a cmdline option
for non-interactive streaming output (top -b and iotop -b run in batch
mode, and iostat keeps printing if given an interval, e.g. "iostat -x
1"), or you need to look for a different tool...
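When a tool lacks a streaming mode, its data source can often be read directly instead. iostat, for example, takes its disk counters from /proc/diskstats. A minimal Python sketch that turns two samples into per-device read/write IOPS (function names are illustrative):

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed) from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        # parts[2] is the device name; parts[3] and parts[7] are the cumulative
        # reads-completed and writes-completed counters (kernel iostats docs).
        stats[parts[2]] = (int(parts[3]), int(parts[7]))
    return stats


def iops(before, after, seconds):
    """Per-device (read IOPS, write IOPS) between two samples `seconds` apart."""
    return {dev: ((after[dev][0] - before[dev][0]) / seconds,
                  (after[dev][1] - before[dev][1]) / seconds)
            for dev in after if dev in before}


def sample_iops(interval=1.0):
    """Take two samples of /proc/diskstats `interval` seconds apart."""
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    return iops(first, second, interval)
```

Logged alongside CPU counters, these numbers make it easier to tell a storage-bound slowdown from a CPU-bound one.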
One possibility is to install snmpd, the SNMP agent, which comes with
built-in support for local system monitoring (in Net-SNMP's snmpd, e.g.
the HOST-RESOURCES-MIB and UCD-SNMP-MIB). I'm not sure which variables
are served, or how exhaustive or useful they are for your case.
SNMP can be polled using tools such as Nagios and friends, or using
custom/dedicated tools - here's one of my own:
http://support.fccps.cz/download/adv/frr/snmp/snmp.htm
For interactive browsing of the SNMP tree, you can use a tool called
"snmpb" or some commercial work-alikes...
And that's probably not the end of your options.
Just use tools that are familiar to you - thus saving time needed to
configure the data collection and analysis "framework"...
Frank
On 6 Oct 2021 at 10:58, Parnell Springmeyer wrote:
>
> Hi Frantisek, thanks for replying.
>
> I've not checked using `latencytop`. I will do that, thanks for the
> suggestion.
>
> The most frustrating problem is that the degradation in performance
> is so far very hard to reproduce manually so we haven't really been
> able to determine if it's a CPU performance issue, storage IO, or
> contention.
>
> Not dumb questions, you're talking to someone who doesn't work on
> this sort of technology much, so it is very helpful to get an idea of
> what I might or should look at.
>
> I know we use the same architecture so we can eliminate that as an
> issue.
>
> Thanks for the feedback, I'll see if I can discover anything
> interesting given the ideas you've suggested I poke around at.
>
>
> On Wed, Oct 6, 2021 at 4:06 AM Frantisek Rysanek
> <Frantisek.Rysanek@post.cz> wrote:
> On 5 Oct 2021 at 18:58, Parnell Springmeyer wrote:
> >
> > Hi, we use QEMU VMs for running our integration testing
> > infrastructure and have run into a very difficult-to-debug problem:
> > occasionally we will see a severe performance degradation in some of
> > our QEMU VMs.
> >
> If memory serves, QEMU guests run as ordinary processes in the Linux
> host instance (each virtual CPU being a thread of that process). I'm
> not "in the know" enough to tell you how much may be happening under
> the hood, on the kernel-support (KVM) side of things, which is
> potentially not well described by the superficial abstraction visible
> in "top".
>
> Esoteric issues aside (CPU arch incompatibilities between host and
> guest), have you tried inspecting what the load looks like in the
> guest and in the host OS instance? What does "top" show? With CPU
> cores expanded? (press "1")
> Have you tried "latencytop" by any chance?
>
> Are you sure this is a CPU performance/emulation issue?
> What storage are your VMs using? Could storage be the bottleneck?
> Isn't the observed "sluggishness" storage-I/O-bound rather than
> CPU-bound? Can you tell the difference? (Heck... apologies, that's
> probably a series of dumb questions to someone @arista.com)
>
> Stuff can get sluggish when IRQs don't work right. Any signs of that
> in the guest instance? Interesting messages in dmesg, interesting
> numbers in /proc/interrupts?
>
> CPU arch emulation issues (guest vs. host) might also be a factor. If
> you specify a different CPU core for the guest than the host actually
> has, you may get some fringe parts of the instruction set, even
> within the x86_64 family, that need to be tediously emulated for the
> guest instance... also, I'd hazard a guess that 32-bit vs. 64-bit
> *might* play a role, albeit a marginal one. I have fond memories of
> the 387 math co-processor emulation (and its effects on program
> runtime), but that's a *long* time ago :-)
>
> I've seen EXT3 and EXT4 hang for no apparent reason, on bare metal,
> under heavy IOPS stress. CPU consumption at 0%, disk IOPS at a flat 0,
> but the filesystem would block forever at a standstill. If I recall
> correctly, I used Bonnie++ to generate that kind of stress
> reproducibly, against fast block storage (HW RAID back then). There
> was no QEMU in the game.
>
> = feel free to add some juicy detail for us to ponder :-)
>
> Frank
>
>
>
> --
> Parnell Springmeyer