qemu-block



From: Emanuele Giuseppe Esposito
Subject: Re: [RFC PATCH v2 3/8] block: introduce a lock to protect graph operations
Date: Mon, 2 May 2022 09:54:14 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


Am 30/04/2022 um 07:48 schrieb Stefan Hajnoczi:
> On Fri, Apr 29, 2022 at 10:37:54AM +0200, Emanuele Giuseppe Esposito wrote:
>> Am 28/04/2022 um 15:45 schrieb Stefan Hajnoczi:
>>> On Tue, Apr 26, 2022 at 04:51:09AM -0400, Emanuele Giuseppe Esposito wrote:
>>>> +static int has_writer;
>>>
>>> bool?
>>
>> Yes and no. With the latest findings and current implementation we could
>> have something like:
>>
>> wrlock()
>>      has_writer = 1
>>      AIO_WAIT_WHILE(reader_count >=1) --> job_exit()
>>                                              wrlock()
>>
>> But we are planning to get rid of AIO_WAIT_WHILE and allow wrlock to
>> run only in coroutines. This requires a lot of changes, including
>> converting a lot of callbacks to coroutines, but then we would avoid
>> such problems and nested event loops.
> 
> I don't understand how this answer is related to the question about
> whether the type of has_writer should be bool?

Sorry, I did not finish the explanation. In the case above, the
`assert(!has_writer)` in bdrv_graph_wrlock() would fail. Just removing
that assertion would make the lock inconsistent, because the first
unlock() would reset the flag to zero/false and forget about the
previous wrlock().
Example:

wrlock()
        has_writer = 1
        AIO_WAIT_WHILE(reader_count >=1) --> job_exit()
                                                wrlock()
                                                        has_writer = 1
                                                /* performs a write */
                                                wrunlock()
                                                        has_writer = 0
                                        <---
        /* performs a write but has_writer = 0! */
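
To make the difference concrete, here is a minimal standalone sketch (not
the patch's code). It assumes one possible reason for keeping an int:
using it as a nesting counter. All names below are hypothetical stand-ins,
and the nested call is simulated directly instead of going through
AIO_WAIT_WHILE() and job_exit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lock state; this is not QEMU code. */
static bool has_writer_flag;   /* boolean variant */
static int has_writer_count;   /* counter variant (assumed alternative) */

static void flag_wrlock(void)   { has_writer_flag = true; }
static void flag_wrunlock(void) { has_writer_flag = false; }

static void count_wrlock(void)  { has_writer_count++; }
static void count_wrunlock(void)
{
    assert(has_writer_count > 0);
    has_writer_count--;
}

/*
 * Take the outer wrlock, then simulate the nested wrlock/wrunlock pair
 * that job_exit() would issue from the nested event loop. Return whether
 * the outer writer is still recorded after the nested unlock.
 */
static bool flag_outer_still_visible(void)
{
    flag_wrlock();                    /* outer wrlock() */
    flag_wrlock();                    /* nested wrlock() */
    flag_wrunlock();                  /* nested wrunlock() clears the flag */
    bool visible = has_writer_flag;   /* false: outer writer forgotten */
    flag_wrunlock();                  /* outer wrunlock() */
    return visible;
}

static bool count_outer_still_visible(void)
{
    count_wrlock();                        /* outer wrlock(): count = 1 */
    count_wrlock();                        /* nested wrlock(): count = 2 */
    count_wrunlock();                      /* nested wrunlock(): count = 1 */
    bool visible = has_writer_count > 0;   /* true: outer writer remembered */
    count_wrunlock();                      /* outer wrunlock(): count = 0 */
    return visible;
}
```

With the flag, the outer writer is no longer visible after the nested
unlock; with the counter it is. Whether a counter is the right fix is a
separate question, since the plan above is to remove the nested event loop
entirely.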
> 
>>> How can rd be negative, it's uint32_t? If AioContext->reader_count can
>>> be negative then please use a signed type.
>>
>> It's just "conceptually negative" while summing. The result is
>> guaranteed to be >= 0, otherwise we have a problem.
>>
>> For example, we could have the following AioContext counters:
>> A1: -5 A2: -4 A3: 10
>>
>> The rd variable below could become negative while looping, but we only
>> use it once we have finished summing all counters, so it will always
>> be >= 0.
> 
> AioContext->reader_count is uint32_t but can hold negative values. It
> should be int32_t.
> 
> IMO even rd should be int32_t so it's clear that it will hold negative
> values, even temporarily.
> 
> The return value of reader_count() should be uint32_t because it's
> never negative.
> 
> That way the types express what is going on clearly.

Makes sense
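
A rough sketch of how the suggested types could look, using the example
counters from above (-5, -4, 10). DemoContext and demo_reader_count() are
hypothetical names for illustration; the real code walks the AioContext
list under aio_context_list_lock and uses atomic loads, both of which are
omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-context state; the real AioContext has more fields. */
typedef struct {
    int32_t reader_count;   /* signed: a single context may go negative */
} DemoContext;

/*
 * Sum the per-context counters. The running total rd may dip below zero
 * in the middle of the loop, so it is signed too. Only the final total is
 * meaningful, and it must not be negative.
 */
static uint32_t demo_reader_count(const DemoContext *ctxs, size_t n)
{
    int32_t rd = 0;

    for (size_t i = 0; i < n; i++) {
        rd += ctxs[i].reader_count;
    }
    assert(rd >= 0);        /* a negative total would be a bug */
    return (uint32_t)rd;    /* unsigned: the total is never negative */
}
```

With counters {-5, -4, 10}, rd goes -5, -9, 1 and the function returns 1.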

Emanuele

> 
>>>
>>>> +            aio_wait_kick();
>>>> +            qemu_co_queue_wait(&exclusive_resume, &aio_context_list_lock);
>>>
>>> Why loop here instead of incrementing reader_count and then returning?
>>> Readers cannot starve writers but writers can starve readers?
>>
>> Not sure what you mean here. Why return?
> 
> It was a misconception on my part. Looping is necessary. Somehow I
> thought that since we have aio_context_list_lock when we awake,
> has_writer cannot be 1 but that's incorrect.
> 
>>
>>>
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Mark bs as not reading anymore, and release pending exclusive ops.  */
>>>> +void coroutine_fn bdrv_graph_co_rdunlock(void)
>>>> +{
>>>> +    AioContext *aiocontext;
>>>> +    aiocontext = qemu_get_current_aio_context();
>>>> +
>>>> +    qatomic_store_release(&aiocontext->reader_count,
>>>> +                          aiocontext->reader_count - 1);
>>>
>>> This is the point where reader_count can go negative if the coroutine
>>> was created in another thread. I think the type of reader_count should
>>> be signed.
>>
>> I think as long as we don't read a single counter on its own, there's
>> no problem.
> 
> There is no problem with the program's behavior: two's complement means
> unsigned integer operations produce the same result as signed integer
> operations.
> 
> The issue is clarity: types should communicate the nature of the values
> held in a variable. If someone takes a look at the struct definition
> they will not know that ->reader_count is used to hold negative values.
> That can lead to misunderstandings and bugs in the future.
> 
> Stefan
> 
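
For illustration, a small standalone sketch of why the existing uint32_t
counters still produce the right total. The function names and the counter
values (taken from the A1/A2/A3 example earlier in the thread) are only
for demonstration. It relies on unsigned arithmetic in C being performed
modulo 2^32 for uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* An unsigned counter decremented below zero wraps around modulo 2^32. */
static uint32_t demo_wrapped_counter(void)
{
    uint32_t a1 = 0;

    a1 -= 5;       /* "conceptually -5", stored as 2^32 - 5 */
    return a1;
}

/*
 * Three hypothetical per-context counters: ten readers started in the
 * third context, five of them finished in the first and four in the
 * second. The wrapped values cancel out when everything is summed.
 */
static uint32_t demo_unsigned_sum(void)
{
    uint32_t a1 = 0, a2 = 0, a3 = 0;

    a3 += 10;      /* ten rdlock() calls in context 3 */
    a1 -= 5;       /* five rdunlock() calls in context 1 */
    a2 -= 4;       /* four rdunlock() calls in context 2 */

    return (uint32_t)(a1 + a2 + a3);   /* 1 reader still active */
}
```

The sum is correct either way, but anyone reading a1 on its own sees
4294967291 instead of -5, which is the clarity problem described above.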



