[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
From: |
Anton Nefedov |
Subject: |
Re: [Qemu-block] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations |
Date: |
Wed, 17 Oct 2018 15:32:14 +0000 |
On 8/10/2018 7:43 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 18:04 hat Anton Nefedov geschrieben:
>>
>>
>> On 8/10/2018 6:46 PM, Kevin Wolf wrote:
>>> Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
>>>>
>>>>
>>>> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
>>>>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>>>>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>>>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>>>>>> Signed-off-by: Anton Nefedov <address@hidden>
>>>>>>>> Reviewed-by: Alberto Garcia <address@hidden>
>>>>>>>> ---
>>>>>>>> hw/ide/core.c | 12 ++++++++++++
>>>>>>>> 1 file changed, 12 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>>>>>> index 2c62efc..352429b 100644
>>>>>>>> --- a/hw/ide/core.c
>>>>>>>> +++ b/hw/ide/core.c
>>>>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int
>>>>>>>> ret)
>>>>>>>> TrimAIOCB *iocb = opaque;
>>>>>>>> IDEState *s = iocb->s;
>>>>>>>>
>>>>>>>> + if (iocb->i >= 0) {
>>>>>>>> + if (ret >= 0) {
>>>>>>>> + block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>>> + } else {
>>>>>>>> + block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> if (ret >= 0) {
>>>>>>>> while (iocb->j < iocb->qiov->niov) {
>>>>>>>> int j = iocb->j;
>>>>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int
>>>>>>>> ret)
>>>>>>>> goto done;
>>>>>>>> }
>>>>>>>>
>>>>>>>> + block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>>> + count << BDRV_SECTOR_BITS,
>>>>>>>> BLOCK_ACCT_UNMAP);
>>>>>>>> +
>>>>>>>> /* Got an entry! Submit and exit. */
>>>>>>>> iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>>>>> sector <<
>>>>>>>> BDRV_SECTOR_BITS,
>>>>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (ret == -EINVAL) {
>>>>>>>> + block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>>>>>
>>>>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>>>>>> also for reads and writes, and each of them could return -EINVAL.
>>>>>>>
>>>>>>
>>>>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>>>>>
>>>>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>>>>>> something wrong, it could also be the result of a host problem.
>>>>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>>>>>> since the request may already have been accounted for as either done or
>>>>>>> failed in ide_issue_trim_cb().
>>>>>>>
>>>>>>
>>>>>> Couldn't be accounted done with such retcode;
>>>>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>>>>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>>>>>
>>>>>> But if EINVAL (from further layers) should not be accounted as an
>>>>>> invalid op, then it should be accounted failed instead, the thing that
>>>>>> current code does not do.
>>>>>> (and which brings us back to possible double-accounting if we account
>>>>>> invalid in ide_issue_trim_cb() )
>>>>>
>>>>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
>>>>> only one possible source for -EINVAL.
>>>>>
>>>>>>> Instead, I think it would be better to immediately account for invalid
>>>>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>>>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>>>>>> -EINVAL return code.
>>>>>>>
>>>>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>>>>>> acct_failed there, and filter off TRIM commands in the common
>>>>>> accounting.
>>>>>
>>>>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
>>>>> from a TRIM command doesn't mean anything. It can still have multiple
>>>>> possible sources.
>>>>>
>>>>
>>>> I meant that common ide_dma_cb() should account EINVAL (along with other
>>>> errors) as failed, unless it's TRIM, which means it's already
>>>> accounted (either invalid or failed)
>>>
>>> Oh, you would already account for failure in ide_issue_trim_cb(), too,
>>> but only if it's EINVAL? That feels like it would complicate the code
>>> quite a bit.
>>>
>>
>> No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
>> for TRIM.
>> Then common path (ide_dma_cb()) does not account TRIM operations at all
>> regardless of the error code. No need to check for TRIM specifically if
>> we have BLOCK_ACCT_NONE.
>>
>>> And actually, didn't commit caeadbc8ba4 break werror=stop for requests
>>> returning -EINVAL because we don't call ide_handle_rw_error() any more?
>>>
>>
>> Yes.
>>
>> Read/write ignore werror=stop for invalid range case (not for EINVAL).
>> I wonder if it's crucial to ignore it for TRIM too, otherwise we could
>> just remove this chunk
>>
>> if (ret == -EINVAL) {
>> ide_dma_error(s);
>> return;
>> }
>
> Ah, right, I forgot about this.
>
> It is actually desirable to avoid stopping for invalid requests because
> we should only stop for host errors. An invalid request is a guest error
> and stopping the VM will do nothing to fix it. (Resuming the VM would
> immediately fail again, so the VM would be locked in paused state.)
>
> Kevin
>
I can't come up with a perfect solution of this.
It's hard to reach TRIM sector-ranges from common ide_dma_cb() (which
only has QEMUSGList) and it's hard to accurately report range invalidity
to ide_dma_cb() with a single retval.
I see the following options for invalid range trim:
1. distinguish range invalidity in ide_dma_cb() with some unused
errcode instead of -EINVAL
- does one exist?
2. use some new global flag on IDEState
- every callback (ide_dma_cb, pmac_ide_transfer_cb) must check
and clear that
3. parse SGList and check range invalidity separately in ide_dma_cb()
- somewhat too intrusive change
or put up with it:
4. ignore (account as invalid but do not abort)
5. treat as a host error i.e. call ide_handle_rw_error() and stop VM
if werror=stop
6. keep -EINVAL (as done now) and potentially ignore configured error
action for the host returned -EINVAL
What do you think?
/Anton