[Qemu-devel] Segfaults in chardev due to races

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] Segfaults in chardev due to races

From:	Max Reitz
Subject:	[Qemu-devel] Segfaults in chardev due to races
Date:	Fri, 21 Dec 2018 23:31:03 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

Hi,

While fixing iotest 147 (which currently cannot be run in parallel),
I've noticed that after it's fixed (and does run in parallel), I see
some segfaults in chardev/char-socket.c.

They usually look like this:

#0  0x0000556f3b5765d0 in object_get_class (address@hidden) at
qom/object.c:822
#1  0x0000556f3b6162a5 in qio_channel_writev_full (ioc=0x0,
iov=0x7fffc8bc9aa0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
#2  0x0000556f3b6056b4 in io_channel_send_full (ioc=0x0,
address@hidden, address@hidden, fds=0x0, nfds=0) at
chardev/char-io.c:123
#3  0x0000556f3b60a50b in tcp_chr_write
    (chr=0x556f3cfa8600, buf=0x7fd3a4006be0 "{\"timestamp\":
{\"seconds\": 1545429461, \"microseconds\": 104325}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138)
    at chardev/char-socket.c:135
#4  0x0000556f3b602440 in qemu_chr_write_buffer
    (address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545429461, \"microseconds\": 104325},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138, address@hidden,
write_all=false) at chardev/char.c:112
#5  0x0000556f3b6026ef in qemu_chr_write
    (s=0x556f3cfa8600, address@hidden "{\"timestamp\":
{\"seconds\": 1545429461, \"microseconds\": 104325}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>,
address@hidden) at chardev/char.c:147
#6  0x0000556f3b6046af in qemu_chr_fe_write
    (address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545429461, \"microseconds\": 104325},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>) at chardev/char-fe.c:42
#7  0x0000556f3b2513d8 in monitor_flush_locked
(address@hidden) at monitor.c:406
#8  0x0000556f3b2515a9 in monitor_flush_locked (mon=0x556f3cece140) at
monitor.c:449
#9  0x0000556f3b2515a9 in monitor_puts (address@hidden,
str=0x556f3de59919 "") at monitor.c:449
#10 0x0000556f3b2515ef in qmp_send_response (mon=0x556f3cece140,
rsp=<optimized out>) at monitor.c:498
#11 0x0000556f3b2517ea in monitor_qapi_event_emit
(address@hidden,
address@hidden) at monitor.c:526
#12 0x0000556f3b251b22 in monitor_qapi_event_queue_no_reenter
(qdict=0x556f3e4613e0, event=QAPI_EVENT_SHUTDOWN) at monitor.c:551
#13 0x0000556f3b251b22 in monitor_qapi_event_queue
(event=QAPI_EVENT_SHUTDOWN, qdict=0x0) at monitor.c:627
#14 0x0000556f3b251b22 in monitor_qapi_event_queue
(address@hidden,
address@hidden) at monitor.c:594
#15 0x0000556f3b6563f1 in qapi_event_send_shutdown (guest=<optimized
out>, reason=<optimized out>) at qapi/qapi-events-run-state.c:49
#16 0x0000556f3b37ac07 in main_loop_should_exit () at
include/sysemu/sysemu.h:36
#17 0x0000556f3b37ac07 in main_loop () at vl.c:1909
#18 0x0000556f3b207735 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4661


Investigating, I saw that sometimes tcp_chr_write() is invoked in
parallel with tcp_chr_free_connection() -- from two different
AioContexts (and two different pthreads).  s->ioc is set to NULL before
s->connected is set to 0, and because these run concurrently,
tcp_chr_write() sees the s->connected as 1 and then fetches the
already-NULL s->ioc which causes the segfault.

The issue is more easily reproducible by inserting the following into
tcp_chr_free_connection() above "s->connected = 0;":

     for (volatile int i = 0; i < 1000000; i++) {
         __asm__ __volatile__ ("" ::: "memory");

     }


Then, run this command line:

$ x86_64-softmmu/qemu-system-x86_64 -qmp tcp:localhost:4242,server

And this script:

#!/usr/bin/ruby
require 'socket'
c = TCPSocket.new('localhost', 4242)
c.readline
c.puts("{'execute':'qmp_capabilities'}")
c.readline
c.puts("{'execute':'quit'}")
c.close

(The above backtrace was generated this way because for some reason,
after everything, I couldn't see the segfaults in 147 any more.  Huh.
But anyway, the loop I added does not yield a coroutine or anything like
that, it just enlarges the race window, so I think it still is a real
issue.)

Now I thought it would suffice to just pull "s->disconnected = 0;" up to
the start of tcp_chr_free_connection().  But first of all I wasn't very
fond of that solution, because tcp_chr_write() says with an XXX comment
that seeing !s->disconnected is actually bad.

More importantly, though, once I did that, another segfault appeared
(again from running iotest 147 in parallel):

(gdb) bt
#0  0x00005629e4c7b29f in object_dynamic_cast_assert
    (obj=0x5629e6847690, address@hidden
"qio-channel-socket", address@hidden
"io/channel-socket.c", address@hidden,
address@hidden <__func__.20946>
"qio_channel_socket_writev") at qom/object.c:686
#1  0x00005629e4d1ded2 in qio_channel_socket_writev (ioc=<optimized
out>, iov=0x7ffe36e71c40, niov=1, fds=0x0, nfds=0, errp=0x0) at
io/channel-socket.c:512
#2  0x00005629e4d0a6b4 in io_channel_send_full (ioc=0x5629e6847690,
address@hidden, address@hidden, fds=0x0, nfds=0) at
chardev/char-io.c:123
#3  0x00005629e4d0f4cb in tcp_chr_write
    (chr=0x5629e67dced0, buf=0x7f7ae4006a30 "{\"timestamp\":
{\"seconds\": 1545428667, \"microseconds\": 895334}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138)
    at chardev/char-socket.c:135
#4  0x00005629e4d07440 in qemu_chr_write_buffer
    (address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545428667, \"microseconds\": 895334},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138, address@hidden,
write_all=false) at chardev/char.c:112
#5  0x00005629e4d076ef in qemu_chr_write
    (s=0x5629e67dced0, address@hidden "{\"timestamp\":
{\"seconds\": 1545428667, \"microseconds\": 895334}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>,
address@hidden) at chardev/char.c:147
#6  0x00005629e4d096af in qemu_chr_fe_write
    (address@hidden, address@hidden
"{\"timestamp\": {\"seconds\": 1545428667, \"microseconds\": 895334},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>) at chardev/char-fe.c:42
#7  0x00005629e49563d8 in monitor_flush_locked
(address@hidden) at monitor.c:406
#8  0x00005629e49565a9 in monitor_flush_locked (mon=0x5629e681da90) at
monitor.c:449
#9  0x00005629e49565a9 in monitor_puts (address@hidden,
str=0x5629e768c7f9 "") at monitor.c:449
#10 0x00005629e49565ef in qmp_send_response (mon=0x5629e681da90,
rsp=<optimized out>) at monitor.c:498
#11 0x00005629e49567ea in monitor_qapi_event_emit
(address@hidden,
address@hidden) at monitor.c:526
#12 0x00005629e4956b22 in monitor_qapi_event_queue_no_reenter
(qdict=0x5629e79825f0, event=QAPI_EVENT_SHUTDOWN) at monitor.c:551
#13 0x00005629e4956b22 in monitor_qapi_event_queue
(event=QAPI_EVENT_SHUTDOWN, qdict=0x0) at monitor.c:627
#14 0x00005629e4956b22 in monitor_qapi_event_queue
(address@hidden,
address@hidden) at monitor.c:594
#15 0x00005629e4d5b3b1 in qapi_event_send_shutdown (guest=<optimized
out>, reason=<optimized out>) at qapi/qapi-events-run-state.c:49
#16 0x00005629e4a7fc07 in main_loop_should_exit () at
include/sysemu/sysemu.h:36
#17 0x00005629e4a7fc07 in main_loop () at vl.c:1909
#18 0x00005629e490c735 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4661


At this point, I decided that I've ventured far enough.  (Especially
since it's Dec 21 and 11 pm and all that.)

I suppose the issue is that QMP events are sent by one thread, and
client disconnects are handled by a different one.  So if a QMP event is
sent while a client disconnects concurrently, races may occur; and the
only protection against concurrent access appears to be the
chr_write_lock, which I don't think is enough.

Max

signature.asc
Description: OpenPGP digital signature

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] Segfaults in chardev due to races, Max Reitz <=
- Re: [Qemu-devel] Segfaults in chardev due to races, Paolo Bonzini, 2018/12/22

Prev by Date: [Qemu-devel] [RFC v2 2/4] elf-ops.h: Add get_elf_note_type()
Next by Date: [Qemu-devel] [PATCH 1/3] iotests.py: Add qemu_nbd_pipe()
Previous by thread: [Qemu-devel] [PATCH 0/3] Allow hw/audio drivers to pass raw DB values to audio/ drivers
Next by thread: Re: [Qemu-devel] Segfaults in chardev due to races
Index(es):
- Date
- Thread