[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Segfaults in chardev due to races
From: |
Max Reitz |
Subject: |
[Qemu-devel] Segfaults in chardev due to races |
Date: |
Fri, 21 Dec 2018 23:31:03 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1 |
Hi,
While fixing iotest 147 (which currently cannot be run in parallel),
I've noticed that after it's fixed (and does run in parallel), I see
some segfaults in chardev/char-socket.c.
They usually look like this:
#0 0x0000556f3b5765d0 in object_get_class (address@hidden) at
qom/object.c:822
#1 0x0000556f3b6162a5 in qio_channel_writev_full (ioc=0x0,
iov=0x7fffc8bc9aa0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
#2 0x0000556f3b6056b4 in io_channel_send_full (ioc=0x0,
address@hidden, address@hidden, fds=0x0, nfds=0) at
chardev/char-io.c:123
#3 0x0000556f3b60a50b in tcp_chr_write
(chr=0x556f3cfa8600, buf=0x7fd3a4006be0 "{\"timestamp\":
{\"seconds\": 1545429461, \"microseconds\": 104325}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138)
at chardev/char-socket.c:135
#4 0x0000556f3b602440 in qemu_chr_write_buffer
(address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545429461, \"microseconds\": 104325},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138, address@hidden,
write_all=false) at chardev/char.c:112
#5 0x0000556f3b6026ef in qemu_chr_write
(s=0x556f3cfa8600, address@hidden "{\"timestamp\":
{\"seconds\": 1545429461, \"microseconds\": 104325}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>,
address@hidden) at chardev/char.c:147
#6 0x0000556f3b6046af in qemu_chr_fe_write
(address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545429461, \"microseconds\": 104325},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>) at chardev/char-fe.c:42
#7 0x0000556f3b2513d8 in monitor_flush_locked
(address@hidden) at monitor.c:406
#8 0x0000556f3b2515a9 in monitor_flush_locked (mon=0x556f3cece140) at
monitor.c:449
#9 0x0000556f3b2515a9 in monitor_puts (address@hidden,
str=0x556f3de59919 "") at monitor.c:449
#10 0x0000556f3b2515ef in qmp_send_response (mon=0x556f3cece140,
rsp=<optimized out>) at monitor.c:498
#11 0x0000556f3b2517ea in monitor_qapi_event_emit
(address@hidden,
address@hidden) at monitor.c:526
#12 0x0000556f3b251b22 in monitor_qapi_event_queue_no_reenter
(qdict=0x556f3e4613e0, event=QAPI_EVENT_SHUTDOWN) at monitor.c:551
#13 0x0000556f3b251b22 in monitor_qapi_event_queue
(event=QAPI_EVENT_SHUTDOWN, qdict=0x0) at monitor.c:627
#14 0x0000556f3b251b22 in monitor_qapi_event_queue
(address@hidden,
address@hidden) at monitor.c:594
#15 0x0000556f3b6563f1 in qapi_event_send_shutdown (guest=<optimized
out>, reason=<optimized out>) at qapi/qapi-events-run-state.c:49
#16 0x0000556f3b37ac07 in main_loop_should_exit () at
include/sysemu/sysemu.h:36
#17 0x0000556f3b37ac07 in main_loop () at vl.c:1909
#18 0x0000556f3b207735 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4661
Investigating, I saw that sometimes tcp_chr_write() is invoked in
parallel with tcp_chr_free_connection() -- from two different
AioContexts (and two different pthreads). s->ioc is set to NULL before
s->connected is set to 0, and because these run concurrently,
tcp_chr_write() sees the s->connected as 1 and then fetches the
already-NULL s->ioc which causes the segfault.
The issue is more easily reproducible by inserting the following into
tcp_chr_free_connection() above "s->connected = 0;":
for (volatile int i = 0; i < 1000000; i++) {
__asm__ __volatile__ ("" ::: "memory");
}
Then, run this command line:
$ x86_64-softmmu/qemu-system-x86_64 -qmp tcp:localhost:4242,server
And this script:
#!/usr/bin/ruby
require 'socket'
c = TCPSocket.new('localhost', 4242)
c.readline
c.puts("{'execute':'qmp_capabilities'}")
c.readline
c.puts("{'execute':'quit'}")
c.close
(The above backtrace was generated this way because for some reason,
after everything, I couldn't see the segfaults in 147 any more. Huh.
But anyway, the loop I added does not yield a coroutine or anything like
that, it just enlarges the race window, so I think it still is a real
issue.)
Now I thought it would suffice to just pull "s->disconnected = 0;" up to
the start of tcp_chr_free_connection(). But first of all I wasn't very
fond of that solution, because tcp_chr_write() says with an XXX comment
that seeing !s->disconnected is actually bad.
More importantly, though, once I did that, another segfault appeared
(again from running iotest 147 in parallel):
(gdb) bt
#0 0x00005629e4c7b29f in object_dynamic_cast_assert
(obj=0x5629e6847690, address@hidden
"qio-channel-socket", address@hidden
"io/channel-socket.c", address@hidden,
address@hidden <__func__.20946>
"qio_channel_socket_writev") at qom/object.c:686
#1 0x00005629e4d1ded2 in qio_channel_socket_writev (ioc=<optimized
out>, iov=0x7ffe36e71c40, niov=1, fds=0x0, nfds=0, errp=0x0) at
io/channel-socket.c:512
#2 0x00005629e4d0a6b4 in io_channel_send_full (ioc=0x5629e6847690,
address@hidden, address@hidden, fds=0x0, nfds=0) at
chardev/char-io.c:123
#3 0x00005629e4d0f4cb in tcp_chr_write
(chr=0x5629e67dced0, buf=0x7f7ae4006a30 "{\"timestamp\":
{\"seconds\": 1545428667, \"microseconds\": 895334}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138)
at chardev/char-socket.c:135
#4 0x00005629e4d07440 in qemu_chr_write_buffer
(address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545428667, \"microseconds\": 895334},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138, address@hidden,
write_all=false) at chardev/char.c:112
#5 0x00005629e4d076ef in qemu_chr_write
(s=0x5629e67dced0, address@hidden "{\"timestamp\":
{\"seconds\": 1545428667, \"microseconds\": 895334}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>,
address@hidden) at chardev/char.c:147
#6 0x00005629e4d096af in qemu_chr_fe_write
(address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545428667, \"microseconds\": 895334},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>) at chardev/char-fe.c:42
#7 0x00005629e49563d8 in monitor_flush_locked
(address@hidden) at monitor.c:406
#8 0x00005629e49565a9 in monitor_flush_locked (mon=0x5629e681da90) at
monitor.c:449
#9 0x00005629e49565a9 in monitor_puts (address@hidden,
str=0x5629e768c7f9 "") at monitor.c:449
#10 0x00005629e49565ef in qmp_send_response (mon=0x5629e681da90,
rsp=<optimized out>) at monitor.c:498
#11 0x00005629e49567ea in monitor_qapi_event_emit
(address@hidden,
address@hidden) at monitor.c:526
#12 0x00005629e4956b22 in monitor_qapi_event_queue_no_reenter
(qdict=0x5629e79825f0, event=QAPI_EVENT_SHUTDOWN) at monitor.c:551
#13 0x00005629e4956b22 in monitor_qapi_event_queue
(event=QAPI_EVENT_SHUTDOWN, qdict=0x0) at monitor.c:627
#14 0x00005629e4956b22 in monitor_qapi_event_queue
(address@hidden,
address@hidden) at monitor.c:594
#15 0x00005629e4d5b3b1 in qapi_event_send_shutdown (guest=<optimized
out>, reason=<optimized out>) at qapi/qapi-events-run-state.c:49
#16 0x00005629e4a7fc07 in main_loop_should_exit () at
include/sysemu/sysemu.h:36
#17 0x00005629e4a7fc07 in main_loop () at vl.c:1909
#18 0x00005629e490c735 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4661
At this point, I decided that I've ventured far enough. (Especially
since it's Dec 21 and 11 pm and all that.)
I suppose the issue is that QMP events are sent by one thread, and
client disconnects are handled by a different one. So if a QMP event is
sent while a client disconnects concurrently, races may occur; and the
only protection against concurrent access appears to be the
chr_write_lock, which I don't think is enough.
Max
signature.asc
Description: OpenPGP digital signature
- [Qemu-devel] Segfaults in chardev due to races,
Max Reitz <=