Re: [Qemu-devel] Using the qemu tracepoints with SystemTap
From: William Cohen
Subject: Re: [Qemu-devel] Using the qemu tracepoints with SystemTap
Date: Tue, 13 Sep 2011 12:10:47 -0400
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.22) Gecko/20110904 Red Hat/3.1.14-1.el6_1 Thunderbird/3.1.14
On 09/13/2011 06:03 AM, Stefan Hajnoczi wrote:
> On Mon, Sep 12, 2011 at 4:33 PM, William Cohen <address@hidden> wrote:
>> The RHEL-6 version of qemu-kvm makes the tracepoints available to SystemTap.
>> I have been working on useful examples for the SystemTap tracepoints in
>> qemu. There don't seem to be many examples showing the utility of the
>> tracepoints in diagnosing problems. However, I came across
>> the following blog entry that had several examples:
>>
>> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
>>
>> I reimplemented the VirtqueueRequestTracker example from the blog in
>> SystemTap (the attached virtqueueleaks.stp). I can run it on RHEL-6's
>> qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64. When the script is stopped it outputs
>> the pid and the address of each elem that leaked, like the following:
>>
>> $ stap virtqueueleaks.stp
>> ^C
>> pid elem
>> 19503 1c4af28
>> 19503 1c56f88
>> 19503 1c62fe8
>> 19503 1c6f048
>> 19503 1c7b0a8
>> 19503 1c87108
>> 19503 1c93168
>> ...
>>
>> I am not that familiar with the internals of qemu. The script seems to
>> indicate that qemu is leaking, but is that really the case? If there are
>> resource leaks, what output would help debug those leaks? What enhancements
>> can be done to this script to provide more useful information?
>
Hi Stefan,
Thanks for the comments.
> Leak tracing always has corner cases :).
>
> With virtio-blk this would indicate a leak because it uses a
> request-response model where the guest initiates I/O and the host
> responds. A guest that cleanly shuts down before you exit your
> SystemTap script should not leak requests for virtio-blk.
I stopped the SystemTap script while the guest VM was still running. So when
the guest VM cleanly shuts down, there should be a series of virtqueue_fill
operations that will remove those elements?
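For reference, the heart of the leak tracking is just pairing virtqueue_pop with
virtqueue_fill on the elem address, roughly like this (a sketch only; it assumes
the qemu.kvm tapset exposes the elem trace argument by name):

global pending

probe qemu.kvm.virtqueue_pop {
    # remember each element handed to qemu, keyed by process and elem address
    pending[pid(), elem] = 1
}

probe qemu.kvm.virtqueue_fill {
    # the element was returned to the guest, so it is no longer outstanding
    delete pending[pid(), elem]
}

probe end {
    # anything still in the array when the script stops is a suspected leak
    printf("%8s %16s\n", "pid", "elem")
    foreach ([p, e] in pending)
        printf("%8d %16x\n", p, e)
}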
Qemu uses a thread for each virtual processor, but a single thread to handle
all IO. It seems like that might be a possible bottleneck. What would be the
path of an IO event from guest to host and back to the guest? Is there something
a script could do to gauge the delay due to the qemu IO thread handling multiple
processors?
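One thing a script might be able to do is timestamp the kick-to-completion path,
something like the following (a rough sketch; it assumes both probes expose the
vq argument by name, and it only keeps the most recent kick per virtqueue, so
batched requests get lumped together):

global kick_ts, lat, nreq

probe qemu.kvm.virtio_queue_notify {
    # the guest just kicked the virtqueue: remember when
    kick_ts[pid(), vq] = gettimeofday_us()
}

probe qemu.kvm.virtio_notify {
    # qemu is about to inject the completion interrupt back into the guest
    t = kick_ts[pid(), vq]
    if (t) {
        lat <<< gettimeofday_us() - t
        nreq++
        delete kick_ts[pid(), vq]
    }
}

probe end {
    # print a log2 histogram of kick-to-notify latency in microseconds
    if (nreq)
        print(@hist_log(lat))
}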
>
> With virtio-net the guest actually hands the host receive buffers and
> they are held until we can receive packets into them and return them
> to the host. We don't have a virtio_reset trace event, and due to
> this we're not accounting for clean shutdown (the guest driver resets
> the device to clear all virtqueues).
>
> I am submitting a patch to add virtio_reset() tracing. This will
> allow the script to delete all elements belonging to this virtio
> device.
>
>> Are there other examples of qemu probing people would like to see?
>
> The malloc/realloc/memalign/vmalloc/free/vfree trace events can be
> used for a few things:
> * Finding memory leaks.
> * Finding malloc/vfree or vmalloc/free mismatches. The rules are:
> malloc/realloc need free, memalign/vmalloc need vfree. They cannot be
> mixed.
>
> Stefan
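A check for those malloc/vfree mismatch rules might look something like the
following (just a sketch; it assumes the pointer argument is exposed as ptr in
each of these probes, ignores qemu_realloc for simplicity, and would probably
need MAXMAPENTRIES raised to track this many live allocations):

global kind   # (pid, pointer) -> which allocator handed it out

probe qemu.kvm.qemu_malloc   { kind[pid(), ptr] = "malloc" }
probe qemu.kvm.qemu_memalign { kind[pid(), ptr] = "memalign" }
probe qemu.kvm.qemu_vmalloc  { kind[pid(), ptr] = "vmalloc" }

probe qemu.kvm.qemu_free {
    # free should only be used on malloc/realloc pointers
    k = kind[pid(), ptr]
    if (k == "memalign" || k == "vmalloc")
        printf("mismatch: qemu_free of %s pointer %x\n", k, ptr)
    delete kind[pid(), ptr]
}

probe qemu.kvm.qemu_vfree {
    # vfree should only be used on memalign/vmalloc pointers
    k = kind[pid(), ptr]
    if (k == "malloc")
        printf("mismatch: qemu_vfree of %s pointer %x\n", k, ptr)
    delete kind[pid(), ptr]
}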
As a quick and simple experiment to see how often various probes are getting
hit, I used the following script on RHEL-6 (the probe points are a bit different
on Fedora):
global counts
# count every firing of any qemu probe, keyed by probe name
probe qemu.*.*? { counts[pn()]++ }
# when the script stops, print the per-probe counts sorted by probe name
probe end { foreach (n+ in counts) printf("%s = %d\n", n, counts[n]) }
For starting up a Fedora 14 guest virtual machine and shutting it down, I got
the following output:
$ stap ~/research/profiling/examples/qemu_count.s
^Cqemu.kvm.balloon_event = 1
qemu.kvm.bdrv_aio_multiwrite = 155
qemu.kvm.bdrv_aio_readv = 13284
qemu.kvm.bdrv_aio_writev = 998
qemu.kvm.cpu_get_apic_base = 20
qemu.kvm.cpu_in = 94082
qemu.kvm.cpu_out = 165789
qemu.kvm.cpu_set_apic_base = 445752
qemu.kvm.multiwrite_cb = 654
qemu.kvm.paio_submit = 7141
qemu.kvm.qemu_free = 677704
qemu.kvm.qemu_malloc = 683031
qemu.kvm.qemu_memalign = 285
qemu.kvm.qemu_realloc = 47550
qemu.kvm.virtio_blk_handle_write = 504
qemu.kvm.virtio_blk_req_complete = 7146
qemu.kvm.virtio_blk_rw_complete = 7146
qemu.kvm.virtio_notify = 6574
qemu.kvm.virtio_queue_notify = 6680
qemu.kvm.virtqueue_fill = 7146
qemu.kvm.virtqueue_flush = 7146
qemu.kvm.virtqueue_pop = 7147
qemu.kvm.vm_state_notify = 1
There are a lot of qemu.kvm.qemu_malloc events. That is likely more than SystemTap
can track if there are thousands of allocations live at the same time. There are no
qemu_vmalloc events because of https://bugzilla.redhat.com/show_bug.cgi?id=714773.
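A lighter-weight way to watch for leaks, without storing every live pointer, might
be to track only the net number of outstanding allocations, for example (a sketch;
realloc is ignored for simplicity):

global outstanding

# count allocations and frees per qemu process
probe qemu.kvm.qemu_malloc, qemu.kvm.qemu_memalign, qemu.kvm.qemu_vmalloc
    { outstanding[pid()]++ }
probe qemu.kvm.qemu_free, qemu.kvm.qemu_vfree
    { outstanding[pid()]-- }

# report the net count every 10 seconds; a steadily growing number suggests a leak
probe timer.s(10) {
    foreach (p in outstanding)
        printf("pid %d: %d outstanding allocations\n", p, outstanding[p])
}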
Should the qemu.kvm.cpu_in and qemu.kvm.cpu_out counts match up? There are a lot
more qemu.kvm.cpu_out events than qemu.kvm.cpu_in events.
Note that qemu.kvm.virtio_blk_req_complete, qemu.kvm.virtio_blk_rw_complete,
qemu.kvm.virtqueue_fill, and qemu.kvm.virtqueue_flush all have the same count,
7146. The qemu.kvm.virtqueue_pop count is close, at 7147.
-Will