From: Peter Maydell
Subject: [Qemu-commits] [qemu/qemu] d2476c: virtio-net: Ensure queue index fits with RSS
Date: Fri, 23 Aug 2024 11:34:00 -0700
Branch: refs/heads/staging-7.2
Home: https://github.com/qemu/qemu
Commit: d2476ced2e34b661dded77d8774955b5a90fbda4
https://github.com/qemu/qemu/commit/d2476ced2e34b661dded77d8774955b5a90fbda4
Author: Akihiko Odaki <akihiko.odaki@daynix.com>
Date: 2024-08-06 (Tue, 06 Aug 2024)
Changed paths:
M hw/net/virtio-net.c
Log Message:
-----------
virtio-net: Ensure queue index fits with RSS
Ensure the queue index points to a valid queue when software RSS is
enabled. The new calculation matches the behavior of Linux's TAP
device with the RSS eBPF program.
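As a minimal C sketch of the bound described (field names follow the
usual virtio-net conventions; the exact expression is an assumption,
not the literal patch):

    /* Clamp the RSS-selected index to the currently active queue pairs,
     * matching the Linux TAP RSS eBPF behavior mentioned above. */
    static int rss_select_queue(VirtIONet *n, uint32_t hash)
    {
        uint16_t idx = n->rss_data.indirections_table[
            hash % n->rss_data.indirections_len];
        return idx % n->curr_queue_pairs; /* index now names a valid queue */
    }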
Fixes: 4474e37a5b3a ("virtio-net: implement RX RSS processing")
Reported-by: Zhibin Hu <huzhibin5@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f1595ceb9aad36a6c1da95bcb77ab9509b38822d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Fixes: CVE-2024-6505
Commit: 681e338ee60f5cde11360c498e9c0cd100133672
https://github.com/qemu/qemu/commit/681e338ee60f5cde11360c498e9c0cd100133672
Author: thomas <east.moutain.yang@gmail.com>
Date: 2024-08-06 (Tue, 06 Aug 2024)
Changed paths:
M hw/net/virtio-net.c
M hw/virtio/virtio.c
M include/hw/virtio/virtio.h
Log Message:
-----------
virtio-net: Fix network stall at the host side waiting for kick
Patch 06b12970174 ("virtio-net: fix network stall under load")
added a double-check to test whether the available buffer size
can satisfy the request, in case the guest added buffers to the
avail ring after the first check. If the available buffer size
happens to become sufficient by the double-check, the host can
send the packet to the guest. But if the buffer size still cannot
satisfy the request, even though the guest has added some buffers,
virtio-net stalls at the host side forever.
When the available buffers are insufficient, this patch enables
notification and checks whether the guest has added buffers since
the last check. If no buffer was added, return false; otherwise
recheck the available buffers in the loop. If the available
buffers become sufficient, disable notification and return true.
Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int: on success it returns an opaque value representing the
   shadow_avail_idx of the virtqueue, or -1 on error.
2. Add a new API, virtio_queue_enable_notification_and_check(),
   which takes as input the opaque value returned by
   virtqueue_get_avail_bytes(). It first enables notification, then
   checks via virtio_queue_poll() whether the guest has added
   buffers since the last check of available buffers, returning
   true if so.
The patch also reverts commit 06b12970174.
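A rough C sketch of how the two changes combine (control flow only;
the surrounding variables are assumptions):

    /* Retry until the avail ring can satisfy the request, or until the
     * guest has demonstrably added nothing since our last look. */
    for (;;) {
        int opaque = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
                                               bufsize, 0);
        if (opaque < 0) {
            return false;                   /* error scanning the ring */
        }
        if (in_bytes >= bufsize) {
            virtio_queue_set_notification(q->rx_vq, 0); /* enough: no kicks */
            return true;
        }
        if (!virtio_queue_enable_notification_and_check(q->rx_vq, opaque)) {
            return false;  /* nothing added since the check; await a kick */
        }
        /* the guest added buffers in the meantime: loop and recheck */
    }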
The case below can reproduce the stall.
                          Guest 0
                        +--------+
                        | iperf  |
              +-------> | server |
              |         +--------+
 Host         |            ...
+--------+    |
| iperf  |----+            Guest n
| client |----+         +--------+
+--------+    |         | iperf  |
              +-------> | server |
                        +--------+
Boot many guests from qemu with virtio network:
qemu ... -netdev tap,id=net_x \
-device virtio-net-pci-non-transitional,\
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
Each guest acts as iperf server with commands below:
iperf3 -s -D -i 10 -p 8001
iperf3 -s -D -i 10 -p 8002
The host as iperf client:
iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
After some time, the host loses its connection to the guest:
the guest can send packets to the host, but cannot receive
packets from the host.
This is more likely to happen if SWIOTLB is enabled in the guest:
allocating and freeing a bounce buffer takes some CPU ticks, and
copying from/to the bounce buffer takes more, compared with running
without bounce buffers in the guest.
Once the rate at which the host produces packets approximates
the rate at which the guest receives them, the guest loops in
NAPI.
receive packets --------
    |                  |
    v                  |
 free buf         virtnet_poll
    |                  |
    v                  |
add buf to avail ring --
    |
    | need kick the host?
    | NAPI continues
    v
receive packets --------
    |                  |
    v                  |
 free buf         virtnet_poll
    |                  |
    v                  |
add buf to avail ring --
    |
    v
   ... ...
On the other hand, the host fetches free buffers from the avail
ring. If there are not enough buffers in the avail ring, the host
notifies the guest of the event by writing the avail idx it read
from the avail ring to the event idx of the used ring, then goes
to sleep, waiting for the kick signal from the guest.
Once the guest finds the host is waiting for a kick signal
(in virtqueue_kick_prepare_split()), it kicks the host.
The host may stall forever in the sequence below:
Host                      Guest
------------              -----------
fetch buf, send packet    receive packet ------
...                       ...                 |
fetch buf, send packet    add buf             |
...                       add buf        virtnet_poll
buf not enough avail idx->add buf             |
read avail idx            add buf             |
                          add buf        ------
                          receive packet ------
write event idx           ...                 |
wait for kick             add buf        virtnet_poll
                          ...                 |
                                         ------
                          no more packet, exit NAPI
In the first loop of NAPI above, indicated by the virtnet_poll
range in the diagram, the host is sending packets while the guest
is receiving packets and adding buffers.
step 1: The buffers are not enough; for example, a big packet
        needs 5 buffers but only 3 are available. The host reads
        the current avail idx.
step 2: The guest adds some buffers, then checks whether the host
        is waiting for a kick signal; it is not at this time. The
        used ring is not empty, so the guest continues with the
        second loop of NAPI.
step 3: The host writes the avail idx read from the avail ring to
        the used ring as the event idx via
        virtio_queue_set_notification(q->rx_vq, 1).
step 4: At the end of the second loop of NAPI, the guest rechecks
        whether a kick is needed; since the event idx in the used
        ring written by the host is beyond the range of the kick
        condition, the guest will not send a kick signal to the
        host.
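The "range of kick condition" in step 4 is the standard virtio
event-idx test, as in the spec's vring_need_event() helper:

    static inline int vring_need_event(uint16_t event_idx,
                                       uint16_t new_idx, uint16_t old_idx)
    {
        /* Kick only if event_idx lies in the window (old_idx, new_idx];
         * the stale event idx written in step 3 falls outside it. */
        return (uint16_t)(new_idx - event_idx - 1) <
               (uint16_t)(new_idx - old_idx);
    }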
Fixes: 06b12970174 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f937309fbdbb48c354220a3e7110c202ae4aa7fa)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in include/hw/virtio/virtio.h)
Commit: fc2e706f4c5f8c081812a9dde482a96485d55cb8
https://github.com/qemu/qemu/commit/fc2e706f4c5f8c081812a9dde482a96485d55cb8
Author: Philippe Mathieu-Daudé <philmd@linaro.org>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M hw/sd/sdhci.c
Log Message:
-----------
hw/sd/sdhci: Reset @data_count index on invalid ADMA transfers
We neglected to clear the @data_count index on ADMA error, allowing
an assertion to be triggered in sdhci_read_dataport() or
sdhci_write_dataport().
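A hedged sketch of the shape of the fix (the condition name is
hypothetical; only the data_count reset is the point):

    /* In the ADMA error path, reset the dataport index so a later
     * sdhci_read_dataport()/sdhci_write_dataport() cannot trip its
     * assertion on a stale in-flight count. */
    if (adma_descriptor_invalid) {      /* hypothetical condition */
        s->data_count = 0;
        /* existing error signalling continues here */
    }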
Cc: qemu-stable@nongnu.org
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2455
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240730092138.32443-4-philmd@linaro.org>
(cherry picked from commit ed5a159c3de48a581f46de4c8c02b4b295e6c52d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: 17962e1ea84529046731ad828dcf707231d97fc4
https://github.com/qemu/qemu/commit/17962e1ea84529046731ad828dcf707231d97fc4
Author: Amjad Alsharafi <amjadsharafi10@gmail.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M block/vvfat.c
Log Message:
-----------
vvfat: Fix bug in writing to middle of file
Before this commit, when calling `commit_one_file` with, for example,
`offset=0x2000` (the second cluster), we would not fetch the next
cluster from the FAT and would instead use the first cluster for the
read operation.
This is due to an off-by-one error: `i=0x2000` is not less than
`offset=0x2000`, so the next cluster is not fetched.
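A hedged sketch of the intended cluster walk (variable and helper
names are hypothetical; the point is the loop bound):

    /* To reach the cluster containing byte `offset`, advance the FAT
     * chain offset / cluster_size times. With offset=0x2000 and a
     * 0x2000 cluster size this advances once, onto the second cluster;
     * the old `i !< offset` comparison stopped one cluster short. */
    uint32_t cluster = first_cluster_of_file;      /* hypothetical */
    for (uint32_t i = 0; i < offset; i += s->cluster_size) {
        cluster = fat_next(s, cluster);            /* hypothetical helper */
    }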
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-ID:
<b97c1e1f1bc2f776061ae914f95d799d124fcd73.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b881cf00c99e03bc8a3648581f97736ff275b18b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: 7dccb4fd0b5fae51db210d91d2f1fd596f78e41a
https://github.com/qemu/qemu/commit/7dccb4fd0b5fae51db210d91d2f1fd596f78e41a
Author: Amjad Alsharafi <amjadsharafi10@gmail.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M block/vvfat.c
Log Message:
-----------
vvfat: Fix usage of `info.file.offset`
The field is marked as "the offset in the file (in clusters)", but it
was being used as `cluster_size*(nums)+mapping->info.file.offset`,
which adds a cluster count to a byte quantity and is incorrect.
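In unit terms (a sketch of the arithmetic, not the literal patch):

    /* info.file.offset is in clusters, so it needs the same cluster_size
     * scaling as the cluster count:
     *   wrong: cluster_size * (nums) + mapping->info.file.offset  (bytes + clusters)
     *   right: cluster_size * (nums + mapping->info.file.offset)  (all bytes)
     */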
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID:
<72f19a7903886dda1aa78bcae0e17702ee939262.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 21b25a0e466a5bba0a45600bb8100ab564202ed1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: e2171f24be1364ec9d2a6188fa603077cb2c6514
https://github.com/qemu/qemu/commit/e2171f24be1364ec9d2a6188fa603077cb2c6514
Author: Amjad Alsharafi <amjadsharafi10@gmail.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M block/vvfat.c
Log Message:
-----------
vvfat: Fix wrong checks for cluster mappings invariant
What this `abort` was intended to check:
- whether `mapping->first_mapping_index` differs from
  `first_mapping_index`, which **should** happen in only one case:
  when we are handling the first mapping (there
  `mapping->first_mapping_index == -1`); for all other mappings
  after the first, the condition should be `true`.
- given the above, we know this is the first mapping, so if the
  offset is not `0`, abort, since that is an invalid state.
The issue was that `first_mapping_index` is not set if we start
checking from the middle: the variable is only set if we passed
through the `cluster_was_modified` check with the first mapping, and
checked the other mappings in the same function call.
One approach is to enter the loop even if `cluster_was_modified` is
not true, so that `first_mapping_index` is set for the first mapping.
But since `first_mapping_index` is only used here, another approach
is to simply check for `mapping->first_mapping_index != -1`, since we
know that this is the value for the only entry where `offset == 0`
(i.e. the first mapping).
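Stated as a single hedged check (a sketch of the direction described,
not the literal patch):

    /* Exactly one mapping per file - the first - has offset == 0, and
     * it is also the only one with first_mapping_index == -1, so the
     * two conditions must agree: */
    assert((mapping->first_mapping_index == -1)
           == (mapping->info.file.offset == 0));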
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID:
<b0fbca3ee208c565885838f6a7deeaeb23f4f9c2.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit f60a6f7e17bf2a2a0f0a08265ac9b077fce42858)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: 3f27fd9536fc0e62fba16bb3b0f0a1f69e4ab147
https://github.com/qemu/qemu/commit/3f27fd9536fc0e62fba16bb3b0f0a1f69e4ab147
Author: Amjad Alsharafi <amjadsharafi10@gmail.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M block/vvfat.c
Log Message:
-----------
vvfat: Fix reading files with non-continuous clusters
When reading with `read_cluster` we get the `mapping` with
`find_mapping_for_cluster` and then call `open_file` for this
mapping.
The issue appears when it is the same file, but a second cluster that
is not immediately after the first. Imagine clusters `500 -> 503`:
this gives us two mappings, one with the range `500..501` and another
with `503..504`; both point to the same file, but at different
offsets.
When we don't open the file because the path is the same, we don't
assign `s->current_mapping`, and thus access way out of bounds of the
file. From the example above, after `open_file` (which didn't open
anything) we get the offset into the file with
`s->cluster_size*(cluster_num-s->current_mapping->begin)`, which
gives us `0x2000 * (504-500)`; this is out of bounds for this mapping
and will produce some issues.
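Spelling out the arithmetic from the example (numbers from the
message; the fix keeps `s->current_mapping` in sync with the mapping
that was found):

    /* With the stale mapping (begin == 500) instead of the one actually
     * containing the cluster (begin == 503): */
    offset = s->cluster_size * (cluster_num - s->current_mapping->begin);
    /*     = 0x2000 * (504 - 500) = 0x8000, out of bounds for the 503..504
     * mapping; with current_mapping updated it is 0x2000 * (504 - 503). */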
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Message-ID:
<1f3ea115779abab62ba32c788073cdc99f9ad5dd.1721470238.git.amjadsharafi10@gmail.com>
[kwolf: Simplified the patch based on Amjad's analysis and input]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 5eed3db336506b529b927ba221fe0d836e5b8819)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: c179754c3c1f6f32540a1d48718ea41272a5456f
https://github.com/qemu/qemu/commit/c179754c3c1f6f32540a1d48718ea41272a5456f
Author: Amjad Alsharafi <amjadsharafi10@gmail.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M tests/qemu-iotests/check
A tests/qemu-iotests/fat16.py
M tests/qemu-iotests/testenv.py
A tests/qemu-iotests/tests/vvfat
A tests/qemu-iotests/tests/vvfat.out
Log Message:
-----------
iotests: Add `vvfat` tests
Added several tests to verify the implementation of the vvfat driver.
We needed a way to interact with it, so we created a basic `fat16.py`
driver that handles writing correct sectors for us.
Added `vvfat` to the non-generic formats, as it's not a normal image
format.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-ID:
<bb8149c945301aefbdf470a0924c07f69f9c087d.1721470238.git.amjadsharafi10@gmail.com>
[kwolf: Made mypy and pylint happy to unbreak 297]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit c8f60bfb4345ea8343a53eaefe88d47b44c53f24)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: cb48a7089b7ba04f4bb3e0bb9642893e5cbd8b3d
https://github.com/qemu/qemu/commit/cb48a7089b7ba04f4bb3e0bb9642893e5cbd8b3d
Author: Eric Blake <eblake@redhat.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M blockdev-nbd.c
M include/block/nbd.h
M nbd/server.c
M qemu-nbd.c
Log Message:
-----------
nbd/server: Plumb in new args to nbd_client_add()
Upcoming patches to fix a CVE need to track an opaque pointer passed
in by the owner of a client object, as well as a request for a time
limit on how fast negotiation must complete. Prepare for that by
changing the signature of nbd_client_new() and adding an accessor to
get at the opaque pointer, although for now the two servers
(qemu-nbd.c and blockdev-nbd.c) do not change behavior even though
they pass in a new default timeout value.
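As a hedged C sketch, the reshaped entry points this describes might
look like the following (parameter names and ordering are
assumptions, not the upstream prototypes):

    /* Owner passes an opaque pointer and a negotiation deadline; an
     * accessor retrieves the opaque pointer later. */
    void nbd_client_new(QIOChannelSocket *sioc,
                        uint32_t handshake_max_secs,  /* new: time limit */
                        QCryptoTLSCreds *tlscreds, const char *tlsauthz,
                        void (*close_fn)(NBDClient *, bool),
                        void *owner);                 /* new: opaque pointer */
    void *nbd_client_owner(NBDClient *client);        /* new accessor */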
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-11-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: s/LIMIT/MAX_SECS/ as suggested by Dan]
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit fb1c2aaa981e0a2fa6362c9985f1296b74f055ac)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: dbf946c25112f682ff783240a0fb6130805e41c4
https://github.com/qemu/qemu/commit/dbf946c25112f682ff783240a0fb6130805e41c4
Author: Eric Blake <eblake@redhat.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M block/monitor/block-hmp-cmds.c
M blockdev-nbd.c
M include/block/nbd.h
M qapi/block-export.json
Log Message:
-----------
nbd/server: CVE-2024-7409: Cap default max-connections to 100
Allowing an unlimited number of clients to any web service is a recipe
for a rudimentary denial of service attack: the client merely needs to
open lots of sockets without closing them, until qemu no longer has
any more fds available to allocate.
For qemu-nbd, we default to allowing only 1 connection unless more are
explicitly asked for (-e or --shared); this was historically picked as
a nice default (without an explicit -t, a non-persistent qemu-nbd goes
away after a client disconnects, without needing any additional
follow-up commands), and we are not going to change that interface now
(besides, someday we want to point people towards qemu-storage-daemon
instead of qemu-nbd).
But for qemu proper, and the newer qemu-storage-daemon, the QMP
nbd-server-start command has historically had a default of unlimited
number of connections, in part because unlike qemu-nbd it is
inherently persistent until nbd-server-stop. Allowing multiple client
sockets is particularly useful for clients that can take advantage of
MULTI_CONN (creating parallel sockets to increase throughput),
although known clients that do so (such as libnbd's nbdcopy) typically
use only 8 or 16 connections (the benefits of scaling diminish once
more sockets are competing for kernel attention). Picking a number
large enough for typical use cases, but not unlimited, makes it
slightly harder for a malicious client to perform a denial of service
merely by opening lots of connections withot progressing through the
handshake.
This change does not eliminate CVE-2024-7409 on its own, but reduces
the chance for fd exhaustion or unlimited memory usage as an attack
surface. On the other hand, by itself, it makes it more obvious that
with a finite limit, we have the problem of an unauthenticated client
holding 100 fds opened as a way to block out a legitimate client from
being able to connect; thus, later patches will further add timeouts
to reject clients that are not making progress.
This is an INTENTIONAL change in behavior, and will break any client
of nbd-server-start that was not passing an explicit max-connections
parameter, yet expects more than 100 simultaneous connections. We are
not aware of any such client (as stated above, most clients aware of
MULTI_CONN get by just fine on 8 or 16 connections, and probably cope
with later connections failing by relying on the earlier connections;
libvirt has not yet been passing max-connections, but generally
creates NBD servers with the intent for a single client for the sake
of live storage migration; meanwhile, the KubeSAN project anticipates
a large cluster sharing multiple clients [up to 8 per node, and up to
100 nodes in a cluster], but it currently uses qemu-nbd with an
explicit --shared=0 rather than qemu-storage-daemon with
nbd-server-start).
We considered using a deprecation period (declare that omitting
max-connections is deprecated, and make it mandatory in 3 releases -
then we don't need to pick an arbitrary default); that has zero risk
of breaking any apps that accidentally depended on more than 100
connections, and where such breakage might not be noticed under unit
testing but only under the larger loads of production usage. But it
does not close the denial-of-service hole until far into the future,
and requires all apps to change to add the parameter even if 100 was
good enough. It also has the drawback that any app (like libvirt)
that is accidentally relying on an unlimited default should seriously
consider its own CVE now, at which point it will change to pass an
explicit max-connections sooner than waiting for 3 qemu releases.
Finally, if our changed default breaks an app, that app can always
pass in an explicit max-connections with a larger value.
It is also intentional that the HMP interface to nbd-server-start is
not changed to expose max-connections (any client needing to fine-tune
things should be using QMP).
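For reference, a management application that needs more than the new
default can pin the limit explicitly at server start; a hedged QMP
example:

    { "execute": "nbd-server-start",
      "arguments": { "addr": { "type": "inet",
                               "data": { "host": "0.0.0.0", "port": "10809" } },
                     "max-connections": 200 } }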
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-12-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[ericb: Expand commit message to summarize Dan's argument for why we
break corner-case back-compat behavior without a deprecation period]
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit c8a76dbd90c2f48df89b75bef74917f90a59b623)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: minor fixups in qapi/block-export.json)
Commit: 7d31776d7f747cbfb83ecf0366878896f659361e
https://github.com/qemu/qemu/commit/7d31776d7f747cbfb83ecf0366878896f659361e
Author: Eric Blake <eblake@redhat.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M nbd/server.c
M nbd/trace-events
Log Message:
-----------
nbd/server: CVE-2024-7409: Drop non-negotiating clients
A client that opens a socket but does not negotiate is merely hogging
qemu's resources (an open fd and a small amount of memory); and a
malicious client that can access the port where NBD is listening can
attempt a denial of service attack by intentionally opening and
abandoning lots of unfinished connections. The previous patch put a
default bound on the number of such ongoing connections, but once that
limit is hit, no more clients can connect (including legitimate ones).
The solution is to insist that clients complete the handshake within
a reasonable time limit, defaulting to 10 seconds. A client that has
not successfully completed NBD_OPT_GO by then (including the case
where the client didn't know the TLS credentials to even reach the
point of NBD_OPT_GO) is wasting our time and does not deserve to stay
connected. Later patches will allow fine-tuning the limit away from
the default value (including disabling it for doing integration
testing of the handshake process itself).
Note that this patch in isolation actually makes it more likely to see
qemu SEGV after nbd-server-stop, as any client socket still connected
when the server shuts down will now be closed after 10 seconds rather
than at the client's whims. That will be addressed in the next patch.
For a demo of this patch in action:
$ qemu-nbd -f raw -r -t -e 10 file &
$ nbdsh --opt-mode -c '
H = list()
for i in range(20):
    print(i)
    H.insert(i, nbd.NBD())
    H[i].set_opt_mode(True)
    H[i].connect_uri("nbd://localhost")
'
$ kill $!
where later connections get to start progressing once earlier ones are
forcefully dropped for taking too long, rather than hanging.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-13-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: rebase to changes earlier in series, reduce scope of timer]
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit b9b72cb3ce15b693148bd09cef7e50110566d8a0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in nbd/server.c: lack of WITH_QEMU_LOCK_GUARD in 7.2)
Commit: 9df5c133c9435e3653c6f82e34f891959089f219
https://github.com/qemu/qemu/commit/9df5c133c9435e3653c6f82e34f891959089f219
Author: Eric Blake <eblake@redhat.com>
Date: 2024-08-11 (Sun, 11 Aug 2024)
Changed paths:
M blockdev-nbd.c
Log Message:
-----------
nbd/server: CVE-2024-7409: Close stray clients at server-stop
A malicious client can attempt to connect to an NBD server, and then
intentionally delay progress in the handshake, including if it does
not know the TLS secrets. Although the previous two patches reduce
this behavior by capping the default max-connections parameter and
killing slow clients, they did not eliminate the possibility of a
client waiting to close the socket until after the QMP nbd-server-stop
command is executed, at which point qemu would SEGV when trying to
dereference the NULL nbd_server global which is no longer present.
This amounts to a denial of service attack. Worse, if another NBD
server is started before the malicious client disconnects, I cannot
rule out additional adverse effects when the old client interferes
with the connection count of the new server (although the most likely
is a crash due to an assertion failure when checking
nbd_server->connections > 0).
For environments without this patch, the CVE can be mitigated by
ensuring (such as via a firewall) that only trusted clients can
connect to an NBD server. Note that using frameworks like libvirt
that ensure that TLS is used and that nbd-server-stop is not executed
while any trusted clients are still connected will only help if there
is also no possibility for an untrusted client to open a connection
but then stall on the NBD handshake.
Given the previous patches, it would be possible to guarantee that no
clients remain connected by having nbd-server-stop sleep for longer
than the default handshake deadline before finally freeing the global
nbd_server object, but that could make QMP non-responsive for a long
time. So instead, this patch fixes the problem by tracking all client
sockets opened while the server is running, and forcefully closing any
such sockets remaining without a completed handshake at the time of
nbd-server-stop, then waiting until the coroutines servicing those
sockets notice the state change. nbd-server-stop now has a second
AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the
blk_exp_close_all_type() that disconnects all clients that completed
handshakes), but forced socket shutdown is enough to progress the
coroutines and quickly tear down all clients before the server is
freed, thus finally fixing the CVE.
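A hedged C sketch of that teardown order (the connection list and its
field names are assumptions; AIO_WAIT_WHILE_UNLOCKED is named above):

    /* Force-close any sockets still in handshake, then wait for their
     * coroutines to observe the shutdown before freeing nbd_server. */
    NBDConn *conn;                                    /* hypothetical type */
    QLIST_FOREACH(conn, &nbd_server->conns, next) {   /* hypothetical list */
        qio_channel_shutdown(QIO_CHANNEL(conn->sioc),
                             QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
    }
    AIO_WAIT_WHILE_UNLOCKED(NULL, nbd_server->connections > 0);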
This patch relies heavily on the fact that nbd/server.c guarantees
that it only calls nbd_blockdev_client_closed() from the main loop
(see the assertion in nbd_client_put() and the hoops used in
nbd_client_put_nonzero() to achieve that); if we did not have that
guarantee, we would also need a mutex protecting our accesses of the
list of connections to survive re-entrancy from independent iothreads.
Although I did not actually try to test old builds, it looks like this
problem has existed since at least commit 862172f45c (v2.12.0, 2017) -
even back when that patch started using a QIONetListener to handle
listening on multiple sockets, nbd_server_free() was already unaware
that the nbd_blockdev_client_closed callback can be reached later by a
client thread that has not completed handshakes (and therefore the
client's socket never got added to the list closed in
nbd_export_close_all), despite that patch intentionally tearing down
the QIONetListener to prevent new clients.
Reported-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Fixes: CVE-2024-7409
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-14-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 3e7ef738c8462c45043a1d39f702a0990406a3b3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: 0e46b6e20200be37dca3d73227f0c43a5885570f
https://github.com/qemu/qemu/commit/0e46b6e20200be37dca3d73227f0c43a5885570f
Author: Stefano Garzarella <sgarzare@redhat.com>
Date: 2024-08-13 (Tue, 13 Aug 2024)
Changed paths:
M block/blkio.c
M meson.build
Log Message:
-----------
block/blkio: use FUA flag on write zeroes only if supported
libblkio supports BLKIO_REQ_FUA with write zeroes requests only since
version 1.4.0, so let's inform the block layer that the blkio driver
supports it only in this case. Otherwise we can get runtime errors,
as reported in https://issues.redhat.com/browse/RHEL-32878
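The gist of the gate, as a hedged sketch (the config macro name is an
assumption, set by a meson.build probe for libblkio >= 1.4.0):

    /* Advertise FUA on write-zeroes only when the installed libblkio
     * can honor it; otherwise the generic block layer emulates FUA. */
    #ifdef CONFIG_BLKIO_WRITE_ZEROES_FUA
        bs->supported_zero_flags |= BDRV_REQ_FUA;
    #endif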
Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-32878
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240808080545.40744-1-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 547c4e50929ec6c091d9c16a7b280e829b12b463)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: meson.build fixup for the lack of
v8.0.0-1489-g98b126f5e3 "qapi: add '@fdset' feature for
BlockdevOptionsVirtioBlkVhostVdpa")
Commit: 069d1a716e12d06c00bbe219ae075644a70d32d9
https://github.com/qemu/qemu/commit/069d1a716e12d06c00bbe219ae075644a70d32d9
Author: Richard Henderson <richard.henderson@linaro.org>
Date: 2024-08-14 (Wed, 14 Aug 2024)
Changed paths:
M target/i386/tcg/decode-new.c.inc
Log Message:
-----------
target/i386: Do not apply REX to MMX operands
Cc: qemu-stable@nongnu.org
Fixes: b3e22b2318a ("target/i386: add core of new i386 decoder")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2495
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link:
https://lore.kernel.org/r/20240812025844.58956-2-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 416f2b16c02c618c0f233372ebfe343f9ee667d4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: 48ec83d2b0c65e3b21dc4318efbe46178d2e5fe1
https://github.com/qemu/qemu/commit/48ec83d2b0c65e3b21dc4318efbe46178d2e5fe1
Author: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Date: 2024-08-14 (Wed, 14 Aug 2024)
Changed paths:
M util/module.c
Log Message:
-----------
module: Prevent crash by resetting local_err in module_load_qom_all()
Set local_err to NULL after it has been freed by error_report_err().
This avoids triggering the assert(*errp == NULL) failure in
error_setv() when local_err is reused in the loop.
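The pattern being fixed, as a hedged sketch (loop shape and the load
call are hypothetical; per Paolo's note below, the actual fix moves
the declaration into the loop):

    for (i = 0; i < n_modules; i++) {            /* hypothetical loop */
        Error *local_err = NULL;   /* fresh NULL pointer every iteration */
        if (module_load_one(modules[i], &local_err) < 0) {  /* hypothetical */
            error_report_err(local_err);  /* frees local_err; reusing the
                                           * dangling pointer would trip
                                           * assert(*errp == NULL) */
        }
    }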
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Link:
https://lore.kernel.org/r/20240809121340.992049-2-alexander.ivanov@virtuozzo.com
[Do the same by moving the declaration instead. - Paolo]
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 940d802b24e63650e0eacad3714e2ce171cba17c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit: 6688f59a7f8a6efd92c854b1e1590de75be3a67b
https://github.com/qemu/qemu/commit/6688f59a7f8a6efd92c854b1e1590de75be3a67b
Author: Peter Maydell <peter.maydell@linaro.org>
Date: 2024-08-22 (Thu, 22 Aug 2024)
Changed paths:
M crypto/tlscredspsk.c
Log Message:
-----------
crypto/tlscredspsk: Free username on finalize
When the creds->username property is set we allocate memory
for it in qcrypto_tls_creds_psk_prop_set_username(), but
we never free this when the QCryptoTLSCredsPSK is destroyed.
Free the memory in finalize.
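A hedged sketch of the finalize fix (the unload call is an assumption
about the existing cleanup path):

    static void qcrypto_tls_creds_psk_finalize(Object *obj)
    {
        QCryptoTLSCredsPSK *creds = QCRYPTO_TLS_CREDS_PSK(obj);

        qcrypto_tls_creds_psk_unload(creds);  /* assumed existing cleanup */
        g_free(creds->username);              /* the previously leaked string */
    }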
This fixes a LeakSanitizer complaint in migration-test:
$ (cd build/asan; ASAN_OPTIONS="fast_unwind_on_malloc=0"
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --tap -k -p
/x86_64/migration/precopy/unix/tls/psk)
=================================================================
==3867512==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x5624e5c99dee in malloc
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x218edee)
(BuildId: a9e623fa1009a9435c0142c037cd7b8c1ad04ce3)
#1 0x7fb199ae9738 in g_malloc debian/build/deb/../../../glib/gmem.c:128:13
#2 0x7fb199afe583 in g_strdup
debian/build/deb/../../../glib/gstrfuncs.c:361:17
#3 0x5624e82ea919 in qcrypto_tls_creds_psk_prop_set_username
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../crypto/tlscredspsk.c:255:23
#4 0x5624e812c6b5 in property_set_str
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qom/object.c:2277:5
#5 0x5624e8125ce5 in object_property_set
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qom/object.c:1463:5
#6 0x5624e8136e7c in object_set_properties_from_qdict
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qom/object_interfaces.c:55:14
#7 0x5624e81372d2 in user_creatable_add_type
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qom/object_interfaces.c:112:5
#8 0x5624e8137964 in user_creatable_add_qapi
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qom/object_interfaces.c:157:11
#9 0x5624e891ba3c in qmp_object_add
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qom/qom-qmp-cmds.c:227:5
#10 0x5624e8af9118 in qmp_marshal_object_add
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qapi/qapi-commands-qom.c:337:5
#11 0x5624e8bd1d49 in do_qmp_dispatch_bh
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../qapi/qmp-dispatch.c:128:5
#12 0x5624e8cb2531 in aio_bh_call
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/async.c:171:5
#13 0x5624e8cb340c in aio_bh_poll
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/async.c:218:13
#14 0x5624e8c0be98 in aio_dispatch
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/aio-posix.c:423:5
#15 0x5624e8cba3ce in aio_ctx_dispatch
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/async.c:360:5
#16 0x7fb199ae0d3a in g_main_dispatch
debian/build/deb/../../../glib/gmain.c:3419:28
#17 0x7fb199ae0d3a in g_main_context_dispatch
debian/build/deb/../../../glib/gmain.c:4137:7
#18 0x5624e8cbe1d9 in glib_pollfds_poll
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/main-loop.c:287:9
#19 0x5624e8cbcb13 in os_host_main_loop_wait
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/main-loop.c:310:5
#20 0x5624e8cbc6dc in main_loop_wait
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../util/main-loop.c:589:11
#21 0x5624e6f3f917 in qemu_main_loop
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../system/runstate.c:801:9
#22 0x5624e893379c in qemu_default_main
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../system/main.c:37:14
#23 0x5624e89337e7 in main
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../system/main.c:48:12
#24 0x7fb197972d8f in __libc_start_call_main
csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#25 0x7fb197972e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#26 0x5624e5c16fa4 in _start
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x210bfa4)
(BuildId: a9e623fa1009a9435c0142c037cd7b8c1ad04ce3)
SUMMARY: AddressSanitizer: 5 byte(s) leaked in 1 allocation(s).
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20240819145021.38524-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 87e012f29f2e47dcd8c385ff8bb8188f9e06d4ea)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Compare: https://github.com/qemu/qemu/compare/76382d7bb0d2...6688f59a7f8a