From: Michael Roth
Subject: [Qemu-stable] [PATCH 53/67] block: Pass unaligned discard requests to drivers
Date: Wed, 14 Dec 2016 18:44:47 -0600
From: Eric Blake <address@hidden>
Discard is advisory, so rounding the requests to alignment
boundaries is never semantically wrong from the point of view
of the data that the guest sees. But at least the Dell
Equallogic iSCSI SANs have an interesting property: their
advertised discard alignment is 15M, yet their documentation
states that discarding a sequence of 1M slices will eventually
result in the 15M page being marked as discarded, and it is
possible to observe which pages have been discarded.
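To make that coalescing property concrete, here is a hypothetical C model of such a SAN. It is not a description of the Equallogic firmware: `ModelSan` and `model_discard_slice()` are invented names, and the per-page bitmap of 1M slices is an assumption. Only the 15M page size and the 1M slice size come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: sizes from the text, bookkeeping assumed. */
#define SLICE_BYTES     (INT64_C(1) << 20)              /* 1M slice */
#define SLICES_PER_PAGE 15                               /* 15M page */
#define PAGE_BYTES      (SLICES_PER_PAGE * SLICE_BYTES)
#define NUM_PAGES       4

typedef struct {
    bool slice_seen[NUM_PAGES][SLICES_PER_PAGE];
    bool page_discarded[NUM_PAGES];
} ModelSan;

/* Record a discard of one whole 1M slice; once every slice of the
 * enclosing 15M page has been seen, mark that page discarded. */
static void model_discard_slice(ModelSan *san, int64_t offset)
{
    if (offset < 0 || offset % SLICE_BYTES ||
        offset / PAGE_BYTES >= NUM_PAGES) {
        return;   /* outside the model or not slice-aligned: ignore */
    }
    int64_t page = offset / PAGE_BYTES;
    int64_t slice = (offset % PAGE_BYTES) / SLICE_BYTES;

    san->slice_seen[page][slice] = true;
    for (int i = 0; i < SLICES_PER_PAGE; i++) {
        if (!san->slice_seen[page][i]) {
            return;   /* page still partially allocated */
        }
    }
    san->page_discarded[page] = true;
}
```

Discarding the first fourteen slices of page 0 leaves it allocated; the fifteenth marks the whole page discarded. A block layer that never forwards the 1M requests never reaches that point.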
Between commits 9f1963b and b8d0a980, we converted the block
layer to a byte-based interface that ultimately ignores any
unaligned head or tail based on the driver's advertised
discard granularity, which means that qemu 2.7 refuses to
pass any discard request smaller than 15M down to the Dell
Equallogic hardware. This is a slight regression in behavior
compared to earlier qemu, where a guest executing discards
in power-of-2 chunks used to be able to get every page
discarded, but is now left with various pages still allocated
because the guest requests do not align with the hardware's
15M pages.
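The regression can be reproduced with a small arithmetic sketch. `old_round_discard()` below is a standalone restatement of the head/tail rounding that this patch removes from bdrv_co_pdiscard() (the `-` lines in the diff); the `Range` type and the function name are hypothetical, and `int64_t` replaces the original `int` counts.

```c
#include <stdint.h>

/* Hypothetical result type: the range that would reach the driver. */
typedef struct {
    int64_t offset;
    int64_t count;   /* 0 means the request is dropped entirely */
} Range;

/* Pre-patch behavior: ignore any unaligned head or tail, keeping only
 * whole align-sized pages inside [offset, offset + count). */
static Range old_round_discard(int64_t offset, int64_t count, int64_t align)
{
    int64_t head = offset % align;

    if (head) {
        head = count < align - head ? count : align - head;   /* MIN() */
        count -= head;
        offset += head;
    }
    count -= count % align;   /* QEMU_ALIGN_DOWN(count, align) */
    return (Range){ offset, count };
}
```

With a 15M alignment, a guest discarding [0, 16M) gets [0, 15M) discarded, but the following 16M requests at 16M and at 32M both round down to a count of 0. The page [15M, 30M) is therefore never discarded, even though the guest's requests together cover it.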
Since the SCSI specification says nothing about a minimum
discard granularity, and only documents the preferred
alignment, it is best if the block layer gives the driver
every bit of information about discard requests, rather than
rounding them to alignment boundaries early.
Rework the block layer discard algorithm to mirror the write
zeroes algorithm: always peel off any unaligned head or tail
and handle it in isolation, then do the bulk of the request
on an aligned boundary. The fallback when the driver returns
-ENOTSUP for an unaligned request is to silently ignore that
portion of the discard request; but for devices that can pass
the partial request all the way down to hardware, this can
result in the hardware coalescing requests and discarding
aligned pages after all.
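A minimal sketch of the new fragmenting loop, written as standalone C so the pieces can be inspected. It restates the `+` lines of the diff under these assumptions: `Limits`, `FragmentLog` and `split_discard()` are hypothetical stand-ins for QEMU's block limits and driver hook; each fragment is appended to a log instead of being passed to the driver's bdrv_co_pdiscard callback; error handling, including the -ENOTSUP fallback, is omitted; and `max_pdiscard` is taken as already rounded down to a multiple of the alignment, as the patched code does.

```c
#include <stdint.h>

#define MAX_FRAGS 64

/* Hypothetical stand-ins for the block limits used by the patch. */
typedef struct {
    int64_t request_alignment;
    int64_t pdiscard_alignment;
    int64_t max_pdiscard;   /* assumed: already a multiple of align */
} Limits;

/* Records each fragment instead of calling a real driver. */
typedef struct {
    int64_t offset[MAX_FRAGS];
    int64_t num[MAX_FRAGS];
    int n;
} FragmentLog;

/* Split [offset, offset + count) the way the patched loop does: small
 * requests up to the first alignment boundary, aligned bulk requests
 * (capped at max_pdiscard), then the unaligned tail, which is further
 * split at request_alignment granularity when needed. */
static void split_discard(const Limits *bl, int64_t offset, int64_t count,
                          FragmentLog *frags)
{
    int64_t align = bl->pdiscard_alignment > bl->request_alignment
                    ? bl->pdiscard_alignment : bl->request_alignment;
    int64_t head = offset % align;
    int64_t tail = (offset + count) % align;

    while (count > 0) {
        int64_t num = count;

        if (head) {
            /* Make small requests to get to alignment boundaries. */
            num = count < align - head ? count : align - head;
            if (num % bl->request_alignment) {
                num %= bl->request_alignment;
            }
            head = (head + num) % align;
        } else if (tail) {
            if (num > align) {
                /* Shorten the request to the last aligned boundary. */
                num -= tail;
            } else if (tail % bl->request_alignment &&
                       tail > bl->request_alignment) {
                tail %= bl->request_alignment;
                num -= tail;
            }
        }
        if (num > bl->max_pdiscard) {   /* limit request size */
            num = bl->max_pdiscard;
        }
        if (frags->n < MAX_FRAGS) {     /* stand-in for the driver call */
            frags->offset[frags->n] = offset;
            frags->num[frags->n] = num;
            frags->n++;
        }
        offset += num;
        count -= num;
    }
}
```

For example, with a 15M discard alignment and a 512-byte request alignment, a lone 1M discard at offset 1M is now forwarded as one 1M fragment instead of being dropped, and a 32M discard at offset 1M becomes [1M, 15M), [15M, 30M) and [30M, 33M): an unaligned head, one aligned page, and an unaligned tail.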
Reported by: Peter Lieven <address@hidden>
CC: address@hidden
Signed-off-by: Eric Blake <address@hidden>
Reviewed-by: Max Reitz <address@hidden>
Signed-off-by: Kevin Wolf <address@hidden>
(cherry picked from commit 3482b9bc411a9a12b2efde1018e1ddc906cd817e)
Signed-off-by: Michael Roth <address@hidden>
---
block/io.c | 45 ++++++++++++++++++++++++++++++++-------------
1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/block/io.c b/block/io.c
index 959e140..5147080 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2437,7 +2437,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
 {
     BdrvTrackedRequest req;
     int max_pdiscard, ret;
-    int head, align;
+    int head, tail, align;
 
     if (!bs->drv) {
         return -ENOMEDIUM;
@@ -2460,19 +2460,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
         return 0;
     }
 
-    /* Discard is advisory, so ignore any unaligned head or tail */
+    /* Discard is advisory, but some devices track and coalesce
+     * unaligned requests, so we must pass everything down rather than
+     * round here. Still, most devices will just silently ignore
+     * unaligned requests (by returning -ENOTSUP), so we must fragment
+     * the request accordingly. */
     align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
     assert(align % bs->bl.request_alignment == 0);
     head = offset % align;
-    if (head) {
-        head = MIN(count, align - head);
-        count -= head;
-        offset += head;
-    }
-    count = QEMU_ALIGN_DOWN(count, align);
-    if (!count) {
-        return 0;
-    }
+    tail = (offset + count) % align;
 
     tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD);
 
@@ -2483,11 +2479,34 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
 
     max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX),
                                    align);
-    assert(max_pdiscard);
+    assert(max_pdiscard >= bs->bl.request_alignment);
 
     while (count > 0) {
         int ret;
-        int num = MIN(count, max_pdiscard);
+        int num = count;
+
+        if (head) {
+            /* Make small requests to get to alignment boundaries. */
+            num = MIN(count, align - head);
+            if (!QEMU_IS_ALIGNED(num, bs->bl.request_alignment)) {
+                num %= bs->bl.request_alignment;
+            }
+            head = (head + num) % align;
+            assert(num < max_pdiscard);
+        } else if (tail) {
+            if (num > align) {
+                /* Shorten the request to the last aligned cluster. */
+                num -= tail;
+            } else if (!QEMU_IS_ALIGNED(tail, bs->bl.request_alignment) &&
+                       tail > bs->bl.request_alignment) {
+                tail %= bs->bl.request_alignment;
+                num -= tail;
+            }
+        }
+        /* limit request size */
+        if (num > max_pdiscard) {
+            num = max_pdiscard;
+        }
 
         if (bs->drv->bdrv_co_pdiscard) {
             ret = bs->drv->bdrv_co_pdiscard(bs, offset, num);
--
1.9.1