qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-stable] [PATCH v2 1/1] qemu-img: wait for convert coroutines t


From: Peter Lieven
Subject: Re: [Qemu-stable] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete
Date: Mon, 24 Apr 2017 20:16:56 +0200


> Am 24.04.2017 um 18:27 schrieb Anton Nefedov <address@hidden>:
> 
>> On 04/21/2017 03:37 PM, Peter Lieven wrote:
>>> Am 21.04.2017 um 14:19 schrieb Anton Nefedov:
>>>> On 04/21/2017 01:44 PM, Peter Lieven wrote:
>>>>> Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
>>>>> On error path (like i/o error in one of the coroutines), it's required to
>>>>>  - wait for coroutines completion before cleaning the common structures
>>>>>  - reenter dependent coroutines so they ever finish
>>>>> 
>>>>> Introduced in 2d9187bc65.
>>>>> 
>>>>> Signed-off-by: Anton Nefedov <address@hidden>
>>>>> ---
>>>>> [..]
>>>>> 
>>>> 
>>>> 
>>>> And what if we error out in the read path? Wouldn't be something like this 
>>>> easier?
>>>> 
>>>> 
>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>> index 22f559a..4ff1085 100644
>>>> --- a/qemu-img.c
>>>> +++ b/qemu-img.c
>>>> @@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
>>>>         main_loop_wait(false);
>>>>     }
>>>> 
>>>> +    /* on error path we need to enter all coroutines that are still
>>>> +     * running before cleaning up common structures */
>>>> +    if (s->ret) {
>>>> +        for (i = 0; i < s->num_coroutines; i++) {
>>>> +             if (s->co[i]) {
>>>> +                 qemu_coroutine_enter(s->co[i]);
>>>> +             }
>>>> +        }
>>>> +    }
>>>> +
>>>>     if (s->compressed && !s->ret) {
>>>>         /* signal EOF to align */
>>>>         ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
>>>> 
>>>> 
>>>> Peter
>>>> 
>>> 
>>> seemed a bit too daring to me to re-enter every coroutine potentially 
>>> including the ones that yielded waiting for I/O completion.
>>> If that's ok - that is for sure easier :)
>> 
>> I think we should enter every coroutine that is still running and have it 
>> terminate correctly. It was my mistake that I have not
>> done this in the original patch. Can you check if the above fixes your test 
>> cases that triggered the bug?
>> 
> 
> hi, sorry I'm late with the answer
> 
> this segfaults in bdrv_close(). Looks like it tries to finish some i/o which 
> coroutine we have already entered and terminated?
> 
> (gdb) run
> Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2 
> ./harddisk.hdd.c ./harddisk.hdd
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7fffeac2d700 (LWP 436020)]
> [New Thread 0x7fffe4ed6700 (LWP 436021)]
> qemu-img: error while reading sector 20480: Input/output error
> qemu-img: error while writing sector 19712: Operation now in progress
> 
> Program received signal SIGSEGV, Segmentation fault.
> aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
> 454         ctx = atomic_read(&co->ctx);
> (gdb) bt
> #0  aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
> /* [Anton]: thread_pool_co_cb () here */
> #1  0x0000555555634629 in thread_pool_completion_bh (opaque=0x555555cfe020) 
> at /mnt/code/us-qemu/util/thread-pool.c:189
> #2  0x0000555555633b31 in aio_bh_call (bh=0x555555cfe0f0) at 
> /mnt/code/us-qemu/util/async.c:90
> #3  aio_bh_poll (address@hidden) at /mnt/code/us-qemu/util/async.c:118
> #4  0x0000555555636f14 in aio_poll (address@hidden, blocking=<optimized out>) 
> at /mnt/code/us-qemu/util/aio-posix.c:682
> #5  0x00005555555c52d4 in bdrv_drain_recurse (address@hidden) at 
> /mnt/code/us-qemu/block/io.c:164
> #6  0x00005555555c5aed in bdrv_drained_begin (address@hidden) at 
> /mnt/code/us-qemu/block/io.c:248
> #7  0x0000555555581443 in bdrv_close (bs=0x555555d22560) at 
> /mnt/code/us-qemu/block.c:2909
> #8  bdrv_delete (bs=0x555555d22560) at /mnt/code/us-qemu/block.c:3100
> #9  bdrv_unref (bs=0x555555d22560) at /mnt/code/us-qemu/block.c:4087
> #10 0x00005555555baf44 in blk_remove_bs (address@hidden) at 
> /mnt/code/us-qemu/block/block-backend.c:552
> #11 0x00005555555bb173 in blk_delete (blk=0x555555d22380) at 
> /mnt/code/us-qemu/block/block-backend.c:238
> #12 blk_unref (address@hidden) at /mnt/code/us-qemu/block/block-backend.c:282
> #13 0x000055555557a22c in img_convert (argc=<optimized out>, argv=<optimized 
> out>) at /mnt/code/us-qemu/qemu-img.c:2359
> #14 0x0000555555574189 in main (argc=5, argv=0x7fffffffe4a0) at 
> /mnt/code/us-qemu/qemu-img.c:4464
> 
> 
>> Peter
>> 
> 
> /Anton
> 

it seems that this is a bit tricky, can you share how your test case works?

thanks,
peter



reply via email to

[Prev in Thread] Current Thread [Next in Thread]