Am 24.04.2017 um 18:27 schrieb Anton Nefedov <address@hidden>:
On 04/21/2017 03:37 PM, Peter Lieven wrote:
Am 21.04.2017 um 14:19 schrieb Anton Nefedov:
On 04/21/2017 01:44 PM, Peter Lieven wrote:
Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
On error path (like i/o error in one of the coroutines), it's required to
- wait for coroutines completion before cleaning the common structures
- reenter dependent coroutines so they ever finish
Introduced in 2d9187bc65.
Signed-off-by: Anton Nefedov <address@hidden>
---
[..]
And what if we error out in the read path? Wouldn't be something like this
easier?
diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
main_loop_wait(false);
}
+ /* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+ if (s->ret) {
+ for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+ }
+ }
+
if (s->compressed && !s->ret) {
/* signal EOF to align */
ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
Peter
seemed a bit too daring to me to re-enter every coroutine potentially including
the ones that yielded waiting for I/O completion.
If that's ok - that is for sure easier :)
I think we should enter every coroutine that is still running and have it
terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes your test
cases that triggered the bug?
hi, sorry I'm late with the answer
this segfaults in bdrv_close(). Looks like it tries to finish some i/o which
coroutine we have already entered and terminated?
(gdb) run
Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2
./harddisk.hdd.c ./harddisk.hdd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeac2d700 (LWP 436020)]
[New Thread 0x7fffe4ed6700 (LWP 436021)]
qemu-img: error while reading sector 20480: Input/output error
qemu-img: error while writing sector 19712: Operation now in progress
Program received signal SIGSEGV, Segmentation fault.
aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
454 ctx = atomic_read(&co->ctx);
(gdb) bt
#0 aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
/* [Anton]: thread_pool_co_cb () here */
#1 0x0000555555634629 in thread_pool_completion_bh (opaque=0x555555cfe020) at
/mnt/code/us-qemu/util/thread-pool.c:189
#2 0x0000555555633b31 in aio_bh_call (bh=0x555555cfe0f0) at
/mnt/code/us-qemu/util/async.c:90
#3 aio_bh_poll (address@hidden) at /mnt/code/us-qemu/util/async.c:118
#4 0x0000555555636f14 in aio_poll (address@hidden, blocking=<optimized out>)
at /mnt/code/us-qemu/util/aio-posix.c:682
#5 0x00005555555c52d4 in bdrv_drain_recurse (address@hidden) at
/mnt/code/us-qemu/block/io.c:164
#6 0x00005555555c5aed in bdrv_drained_begin (address@hidden) at
/mnt/code/us-qemu/block/io.c:248
#7 0x0000555555581443 in bdrv_close (bs=0x555555d22560) at
/mnt/code/us-qemu/block.c:2909
#8 bdrv_delete (bs=0x555555d22560) at /mnt/code/us-qemu/block.c:3100
#9 bdrv_unref (bs=0x555555d22560) at /mnt/code/us-qemu/block.c:4087
#10 0x00005555555baf44 in blk_remove_bs (address@hidden) at
/mnt/code/us-qemu/block/block-backend.c:552
#11 0x00005555555bb173 in blk_delete (blk=0x555555d22380) at
/mnt/code/us-qemu/block/block-backend.c:238
#12 blk_unref (address@hidden) at /mnt/code/us-qemu/block/block-backend.c:282
#13 0x000055555557a22c in img_convert (argc=<optimized out>, argv=<optimized
out>) at /mnt/code/us-qemu/qemu-img.c:2359
#14 0x0000555555574189 in main (argc=5, argv=0x7fffffffe4a0) at
/mnt/code/us-qemu/qemu-img.c:4464
Peter
/Anton