|
From: | Jason Wang |
Subject: | Re: [Qemu-stable] [Qemu-devel] [PATCH for 2.11] virtio-net: don't touch virtqueue if vm is stopped |
Date: | Mon, 27 Nov 2017 11:26:34 +0800 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 |
On 2017年11月24日 18:44, Stefan Hajnoczi wrote:
On Fri, Nov 24, 2017 at 10:57:11AM +0800, Jason Wang wrote:On 2017年11月23日 18:59, Stefan Hajnoczi wrote:On Thu, Nov 23, 2017 at 11:37:46AM +0800, Jason Wang wrote:Guest state should not be touched if VM is stopped, unfortunately we didn't check running state and tried to drain tx queue unconditionally in virtio_net_set_status(). A crash was then noticed as a migration destination when user type quit after virtqueue state is loaded but before region cache is initialized. In this case, virtio_net_drop_tx_queue_data() tries to access the uninitialized region cache. Fix this by only dropping tx queue data when vm is running.hw/virtio/virtio.c:virtio_load() does the following: for (i = 0; i < num; i++) { if (vdev->vq[i].vring.desc) { uint16_t nheads; /* * VIRTIO-1 devices migrate desc, used, and avail ring addresses so * only the region cache needs to be set up. Legacy devices need * to calculate used and avail ring addresses based on the desc * address. */ if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) { virtio_init_region_cache(vdev, i); } else { virtio_queue_update_rings(vdev, i); } So the region caches should be initialized after virtqueue state is loaded. It's unclear to me which code path triggers this issue. Can you add a backtrace or an explanation? Thanks, StefanMigration coroutine was yield before region cache was initialized. The backtrace looks like:[...]#16 0x0000555555b1c199 in vmstate_load_state (f=0x555556f7c010, vmsd=0x5555562b8160 <vmstate_virtio>, opaque=0x555557d68610, version_id=1) at migration/vmstate.c:160 #17 0x0000555555865cc3 in virtio_load (vdev=0x555557d68610, f=0x555556f7c010, version_id=11) at /home/devel/git/qemu/hw/virtio/virtio.c:2110Reviewed-by: Stefan Hajnoczi <address@hidden> Thanks for the backtrace! Your patch is fine but I have a larger concern: The backtrace shows that the virtio code is re-entrant during savevm load. That's probably a bad thing because set_status() and other APIs are probably not intended to run while we are half-way through savevm load. The virtqueue is only partially set up at this point :(. I wonder if a more general cleanup is necessary to avoid problems like this in the future... Stefan
Yes, this needs some thought. An idea is to guarantee the atomicity of the virtio state and don't expose partial state. But looks like this needs lots of changes.
Anyway, I will apply this patch first. Thanks
[Prev in Thread] | Current Thread | [Next in Thread] |