[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] hw/nvme: fix handling of over-committed queues
From: |
Klaus Jensen |
Subject: |
Re: [PATCH v2] hw/nvme: fix handling of over-committed queues |
Date: |
Mon, 4 Nov 2024 19:15:37 +0100 |
On Oct 29 13:15, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
>
> If a host chooses to use the SQHD "hint" in the CQE to know if there is
> room in the submission queue for additional commands, it may result in a
> situation where there are not enough internal resources (struct
> NvmeRequest) available to process the command. For a lack of a better
> term, the host may "over-commit" the device (i.e., it may have more
> inflight commands than the queue size).
>
> For example, assume a queue with N entries. The host submits N commands
> and all are picked up for processing, advancing the head and emptying
> the queue. Regardless of which of these N commands complete first, the
> SQHD field of that CQE will indicate to the host that the queue is
> empty, which allows the host to issue N commands again. However, if the
> device has not posted CQEs for all the previous commands yet, the device
> will have less than N resources available to process the commands, so
> queue processing is suspended.
>
> And here lies an 11 year latent bug. In the absense of any additional
> tail updates on the submission queue, we never schedule the processing
> bottom-half again unless we observe a head update on an associated full
> completion queue. This has been sufficient to handle N-to-1 SQ/CQ setups
> (in the absense of over-commit of course). Incidentially, that "kick all
> associated SQs" mechanism can now be killed since we now just schedule
> queue processing when we return a processing resource to a non-empty
> submission queue, which happens to cover both edge cases. However, we
> must retain kicking the CQ if it was previously full.
>
> So, apparently, no previous driver tested with hw/nvme has ever used
> SQHD (e.g., neither the Linux NVMe driver or SPDK uses it). But then OSv
> shows up with the driver that actually does. I salute you.
>
> Fixes: f3c507adcd7b ("NVMe: Initial commit for new storage interface")
> Cc: qemu-stable@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2388
> Reported-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
> Changes in v2:
> - Retain cq kick on previously full queue
> - Link to v1:
> https://lore.kernel.org/r/20241025-issue-2388-v1-1-16707e0d3342@samsung.com
> ---
> hw/nvme/ctrl.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index
> f4e89203c1a6e3b051fd7185cbf01ec9bae9684a..1185455a94c4af43a39708b1b97dba9624fc7ad3
> 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -1520,9 +1520,16 @@ static void nvme_post_cqes(void *opaque)
> stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
> break;
> }
> +
> QTAILQ_REMOVE(&cq->req_list, req, entry);
> +
> nvme_inc_cq_tail(cq);
> nvme_sg_unmap(&req->sg);
> +
> + if (QTAILQ_EMPTY(&sq->req_list) && !nvme_sq_empty(sq)) {
> + qemu_bh_schedule(sq->bh);
> + }
> +
> QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
> }
> if (cq->tail != cq->head) {
> @@ -7950,7 +7957,6 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr,
> int val)
> /* Completion queue doorbell write */
>
> uint16_t new_head = val & 0xffff;
> - int start_sqs;
> NvmeCQueue *cq;
>
> qid = (addr - (0x1000 + (1 << 2))) >> 3;
> @@ -8001,19 +8007,16 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr,
> int val)
>
> trace_pci_nvme_mmio_doorbell_cq(cq->cqid, new_head);
>
> - start_sqs = nvme_cq_full(cq) ? 1 : 0;
> - cq->head = new_head;
> - if (!qid && n->dbbuf_enabled) {
> - stl_le_pci_dma(pci, cq->db_addr, cq->head,
> MEMTXATTRS_UNSPECIFIED);
> - }
> - if (start_sqs) {
> - NvmeSQueue *sq;
> - QTAILQ_FOREACH(sq, &cq->sq_list, entry) {
> - qemu_bh_schedule(sq->bh);
> - }
> + /* scheduled deferred cqe posting if queue was previously full */
> + if (nvme_cq_full(cq)) {
> qemu_bh_schedule(cq->bh);
> }
>
> + cq->head = new_head;
> + if (!qid && n->dbbuf_enabled) {
> + stl_le_pci_dma(pci, cq->db_addr, cq->head,
> MEMTXATTRS_UNSPECIFIED);
> + }
> +
> if (cq->tail == cq->head) {
> if (cq->irq_enabled) {
> n->cq_pending--;
>
> ---
> base-commit: fdf250e5a37830615e324017cb3a503e84b3712c
> change-id: 20241025-issue-2388-bd047487f74c
>
> Best regards,
> --
> Klaus Jensen <k.jensen@samsung.com>
>
Ping. Tested, but would appreciate a review ;)
signature.asc
Description: PGP signature
- Re: [PATCH v2] hw/nvme: fix handling of over-committed queues,
Klaus Jensen <=