qemu-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-commits] [qemu/qemu] 0ab8c0: virtio: Fix virtio_mmio_read()/virtio


From: Peter Maydell
Subject: [Qemu-commits] [qemu/qemu] 0ab8c0: virtio: Fix virtio_mmio_read()/virtio_mmio_write()
Date: Tue, 23 Mar 2021 03:56:34 -0700

  Branch: refs/heads/staging
  Home:   https://github.com/qemu/qemu
  Commit: 0ab8c021c6c594846915cbeb501fa87ab8780949
      
https://github.com/qemu/qemu/commit/0ab8c021c6c594846915cbeb501fa87ab8780949
  Author: Laurent Vivier <laurent@vivier.eu>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/virtio-mmio.c

  Log Message:
  -----------
  virtio: Fix virtio_mmio_read()/virtio_mmio_write()

Both functions don't check the personality of the interface (legacy or
modern) before accessing the configuration memory and always use
virtio_config_readX()/virtio_config_writeX().

With this patch, they now check the personality and in legacy mode
call virtio_config_readX()/virtio_config_writeX(), otherwise call
virtio_config_modern_readX()/virtio_config_modern_writeX().

This change has been tested with virtio-mmio guests (virt stretch/armhf and
virt sid/m68k) and virtio-pci guests (pseries RHEL-7.3/ppc64 and /ppc64le).

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210314200300.3259170-1-laurent@vivier.eu>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: a890557d5a90b2c99988bc478bfd7f77392cfd8d
      
https://github.com/qemu/qemu/commit/a890557d5a90b2c99988bc478bfd7f77392cfd8d
  Author: Greg Kurz <groug@kaod.org>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/vhost-user.c

  Log Message:
  -----------
  vhost-user: Drop misleading EAGAIN checks in slave_read()

slave_read() checks EAGAIN when reading or writing to the socket
fails. This gives the impression that the slave channel is in
non-blocking mode, which is certainly not the case with the current
code base. And the rest of the code isn't actually ready to cope
with non-blocking I/O.

Just drop the checks everywhere in this function for the sake of
clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


  Commit: 9e06080bed293d94ec1f874e62f25f147b20bc6c
      
https://github.com/qemu/qemu/commit/9e06080bed293d94ec1f874e62f25f147b20bc6c
  Author: Greg Kurz <groug@kaod.org>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/vhost-user.c

  Log Message:
  -----------
  vhost-user: Fix double-close on slave_read() error path

Some message types, e.g. VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
can convey file descriptors. These must be closed before returning
from slave_read() to avoid being leaked. This can currently be done
in two different places:

[1] just after the request has been processed

[2] on the error path, under the goto label err:

These path are supposed to be mutually exclusive but they are not
actually. If the VHOST_USER_NEED_REPLY_MASK flag was passed and the
sending of the reply fails, both [1] and [2] are performed with the
same descriptor values. This can potentially cause subtle bugs if one
of the descriptor was recycled by some other thread in the meantime.

This code duplication complicates rollback for no real good benefit.
Do the closing in a unique place, under a new fdcleanup: goto label
at the end of the function.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


  Commit: de62e4946052076186428900a85d6547627e84c6
      
https://github.com/qemu/qemu/commit/de62e4946052076186428900a85d6547627e84c6
  Author: Greg Kurz <groug@kaod.org>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/vhost-user.c

  Log Message:
  -----------
  vhost-user: Factor out duplicated slave_fd teardown code

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-4-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


  Commit: 57dc02173cb089c11d3c84a0570cb60fe7d7f0d5
      
https://github.com/qemu/qemu/commit/57dc02173cb089c11d3c84a0570cb60fe7d7f0d5
  Author: Greg Kurz <groug@kaod.org>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/vhost-user.c

  Log Message:
  -----------
  vhost-user: Convert slave channel to QIOChannelSocket

The slave channel is implemented with socketpair() : QEMU creates
the pair, passes one of the socket to virtiofsd and monitors the
other one with the main event loop using qemu_set_fd_handler().

In order to fix a potential deadlock between QEMU and a vhost-user
external process (e.g. virtiofsd with DAX), we want to be able to
monitor and service the slave channel while handling vhost-user
requests.

Prepare ground for this by converting the slave channel to be a
QIOChannelSocket. This will make monitoring of the slave channel
as simple as calling qio_channel_add_watch_source(). Since the
connection is already established between the two sockets, only
incoming I/O (G_IO_IN) and disconnect (G_IO_HUP) need to be
serviced.

This also allows to get rid of the ancillary data parsing since
QIOChannelSocket can do this for us. Note that the MSG_CTRUNC
check is dropped on the way because QIOChannelSocket ignores this
case. This isn't a problem since slave_read() provisions space for
8 file descriptors, but affected vhost-user slave protocol messages
generally only convey one. If for some reason a buggy implementation
passes more file descriptors, no need to break the connection, just
like we don't break it if some other type of ancillary data is
received : this isn't explicitely violating the protocol per-se so
it seems better to ignore it.

The current code errors out on short reads and writes. Use the
qio_channel_*_all() variants to address this on the way.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-5-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


  Commit: a7f523c7d114d445c5d83aecdba3efc038e5a692
      
https://github.com/qemu/qemu/commit/a7f523c7d114d445c5d83aecdba3efc038e5a692
  Author: Greg Kurz <groug@kaod.org>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/vhost-user.c

  Log Message:
  -----------
  vhost-user: Introduce nested event loop in vhost_user_read()

A deadlock condition potentially exists if a vhost-user process needs
to request something to QEMU on the slave channel while processing a
vhost-user message.

This doesn't seem to affect any vhost-user implementation so far, but
this is currently biting the upcoming enablement of DAX with virtio-fs.
The issue is being observed when the guest does an emergency reboot while
a mapping still exits in the DAX window, which is very easy to get with
a busy enough workload (e.g. as simulated by blogbench [1]) :

- QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd.

- In order to complete the request, virtiofsd then asks QEMU to remove
  the mapping on the slave channel.

All these dialogs are synchronous, hence the deadlock.

As pointed out by Stefan Hajnoczi:

When QEMU's vhost-user master implementation sends a vhost-user protocol
message, vhost_user_read() does a "blocking" read during which slave_fd
is not monitored by QEMU.

The natural solution for this issue is an event loop. The main event
loop cannot be nested though since we have no guarantees that its
fd handlers are prepared for re-entrancy.

Introduce a new event loop that only monitors the chardev I/O for now
in vhost_user_read() and push the actual reading to a one-shot handler.
A subsequent patch will teach the loop to monitor and process messages
from the slave channel as well.

[1] https://github.com/jedisct1/Blogbench

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-6-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


  Commit: db8a3772e300c1a656331a92da0785d81667dc81
      
https://github.com/qemu/qemu/commit/db8a3772e300c1a656331a92da0785d81667dc81
  Author: Greg Kurz <groug@kaod.org>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/vhost-user.c

  Log Message:
  -----------
  vhost-user: Monitor slave channel in vhost_user_read()

Now that everything is in place, have the nested event loop to monitor
the slave channel. The source in the main event loop is destroyed and
recreated to ensure any pending even for the slave channel that was
previously detected is purged. This guarantees that the main loop
wont invoke slave_read() based on an event that was already handled
by the nested loop.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-7-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


  Commit: d2adda34a9989404a4fc86cb4127a3ea103a7938
      
https://github.com/qemu/qemu/commit/d2adda34a9989404a4fc86cb4127a3ea103a7938
  Author: Wang Liang <wangliangzz@inspur.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/virtio/virtio-pmem.c

  Log Message:
  -----------
  virtio-pmem: fix virtio_pmem_resp assign problem

ret in virtio_pmem_resp is a uint32_t variable, which should be assigned
using virtio_stl_p.

The kernel side driver does not guarantee virtio_pmem_resp to be initialized
to zero in advance, So sometimes the flush operation will fail.

Signed-off-by: Wang Liang <wangliangzz@inspur.com>
Message-Id: <20210317024145.271212-1-wangliangzz@126.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 79a2aca20cb4601e6a30bbe8620ee1ef9b960ae1
      
https://github.com/qemu/qemu/commit/79a2aca20cb4601e6a30bbe8620ee1ef9b960ae1
  Author: Igor Mammedov <imammedo@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M tests/qtest/bios-tables-test-allowed-diff.h

  Log Message:
  -----------
  tests: acpi: temporary whitelist DSDT changes

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: b32bd763a1ca929677e22ae1c51cb3920921bdce
      
https://github.com/qemu/qemu/commit/b32bd763a1ca929677e22ae1c51cb3920921bdce
  Author: Igor Mammedov <imammedo@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/acpi/pci.c
    M hw/acpi/pcihp.c
    M hw/acpi/piix4.c
    M hw/acpi/trace-events
    M hw/i386/acpi-build.c
    M hw/pci/pci.c
    M include/hw/acpi/pcihp.h
    M include/hw/pci/pci.h

  Log Message:
  -----------
  pci: introduce acpi-index property for PCI device

In x86/ACPI world, linux distros are using predictable
network interface naming since systemd v197. Which on
QEMU based VMs results into path based naming scheme,
that names network interfaces based on PCI topology.

With itm on has to plug NIC in exactly the same bus/slot,
which was used when disk image was first provisioned/configured
or one risks to loose network configuration due to NIC being
renamed to actually used topology.
That also restricts freedom to reshape PCI configuration of
VM without need to reconfigure used guest image.

systemd also offers "onboard" naming scheme which is
preferred over PCI slot/topology one, provided that
firmware implements:
    "
    PCI Firmware Specification 3.1
    4.6.7.  DSM for Naming a PCI or PCI Express Device Under
            Operating Systems
    "
that allows to assign user defined index to PCI device,
which systemd will use to name NIC. For example, using
  -device e1000,acpi-index=100
guest will rename NIC to 'eno100', where 'eno' is default
prefix for "onboard" naming scheme. This doesn't require
any advance configuration on guest side to com in effect
at 'onboard' scheme takes priority over path based naming.

Hope is that 'acpi-index' it will be easier to consume by
management layer, compared to forcing specific PCI topology
and/or having several disk image templates for different
topologies and will help to simplify process of spawning
VM from the same template without need to reconfigure
guest NIC.

This patch adds, 'acpi-index'* property and wires up
a 32bit register on top of pci hotplug register block
to pass index value to AML code at runtime.
Following patch will add corresponding _DSM code and
wire it up to PCI devices described in ACPI.

*) name comes from linux kernel terminology

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 4fd7da4c0336c8fd822cd808d62f7ff8c9936aef
      
https://github.com/qemu/qemu/commit/4fd7da4c0336c8fd822cd808d62f7ff8c9936aef
  Author: Igor Mammedov <imammedo@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/acpi/pcihp.c

  Log Message:
  -----------
  pci: acpi: ensure that acpi-index is unique

it helps to avoid device naming conflicts when guest OS is
configured to use acpi-index for naming.
Spec ialso says so:

PCI Firmware Specification Revision 3.2
4.6.7.  _DSM for Naming a PCI or PCI Express Device Under Operating Systems
"
Instance number must be unique under \_SB scope. This instance number does not 
have to
be sequential in a given system configuration.
"

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 910e4069710d854757c8fe8921dcff5b62dcd960
      
https://github.com/qemu/qemu/commit/910e4069710d854757c8fe8921dcff5b62dcd960
  Author: Igor Mammedov <imammedo@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/acpi/aml-build.c
    M include/hw/acpi/aml-build.h

  Log Message:
  -----------
  acpi: add aml_to_decimalstring() and aml_call6() helpers

it will be used by follow up patches

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-5-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: b7f23f62e40bb7bc87fe170471a31ab1fb8a0784
      
https://github.com/qemu/qemu/commit/b7f23f62e40bb7bc87fe170471a31ab1fb8a0784
  Author: Igor Mammedov <imammedo@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/i386/acpi-build.c
    M include/hw/acpi/pci.h

  Log Message:
  -----------
  pci: acpi: add _DSM method to PCI devices

Implement _DSM according to:
    PCI Firmware Specification 3.1
    4.6.7.  DSM for Naming a PCI or PCI Express Device Under
            Operating Systems
and wire it up to cold and hot-plugged PCI devices.
Feature depends on ACPI hotplug being enabled (as that provides
PCI devices descriptions in ACPI and MMIO registers that are
reused to fetch acpi-index).

acpi-index should work for
  - cold plugged NICs:
      $QEMU -device e1000,acpi-index=100
         => 'eno100'
  - hot-plugged
      (monitor) device_add e1000,acpi-index=200,id=remove_me
         => 'eno200'
  - re-plugged
      (monitor) device_del remove_me
      (monitor) device_add e1000,acpi-index=1
         => 'eno1'

Windows also sees index under "PCI Label Id" field in properties
dialog but otherwise it doesn't seem to have any effect.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-6-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 835fde4a781f8abf230d567f759647c403944b57
      
https://github.com/qemu/qemu/commit/835fde4a781f8abf230d567f759647c403944b57
  Author: Igor Mammedov <imammedo@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M tests/data/acpi/pc/DSDT
    M tests/data/acpi/pc/DSDT.acpihmat
    M tests/data/acpi/pc/DSDT.bridge
    M tests/data/acpi/pc/DSDT.cphp
    M tests/data/acpi/pc/DSDT.dimmpxm
    M tests/data/acpi/pc/DSDT.hpbridge
    M tests/data/acpi/pc/DSDT.ipmikcs
    M tests/data/acpi/pc/DSDT.memhp
    M tests/data/acpi/pc/DSDT.nohpet
    M tests/data/acpi/pc/DSDT.numamem
    M tests/data/acpi/pc/DSDT.roothp
    M tests/qtest/bios-tables-test-allowed-diff.h

  Log Message:
  -----------
  tests: acpi: update expected blobs

expected changes are:
 * larger BNMR operation region
 * new PIDX field and method to fetch acpi-index
 * PDSM method that implements PCI device _DSM +
   per device _DSM that calls PDSM

@@ -221,10 +221,11 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC    ", 
0x00000001)
             B0EJ,   32
         }

-        OperationRegion (BNMR, SystemIO, 0xAE10, 0x04)
+        OperationRegion (BNMR, SystemIO, 0xAE10, 0x08)
         Field (BNMR, DWordAcc, NoLock, WriteAsZeros)
         {
-            BNUM,   32
+            BNUM,   32,
+            PIDX,   32
         }

         Mutex (BLCK, 0x00)
@@ -236,6 +237,52 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC    ", 
0x00000001)
             Release (BLCK)
             Return (Zero)
         }
+
+        Method (AIDX, 2, NotSerialized)
+        {
+            Acquire (BLCK, 0xFFFF)
+            BNUM = Arg0
+            PIDX = (One << Arg1)
+            Local0 = PIDX /* \_SB_.PCI0.PIDX */
+            Release (BLCK)
+            Return (Local0)
+        }
+
+        Method (PDSM, 6, Serialized)
+        {
+            If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") /* 
Device Labeling Interface */))
+            {
+                Local0 = AIDX (Arg4, Arg5)
+                If ((Arg2 == Zero))
+                {
+                    If ((Arg1 == 0x02))
+                    {
+                        If (!((Local0 == Zero) | (Local0 == 0xFFFFFFFF)))
+                        {
+                            Return (Buffer (One)
+                            {
+                                 0x81                                          
   // .
+                            })
+                        }
+                    }
+
+                    Return (Buffer (One)
+                    {
+                         0x00                                             // .
+                    })
+                }
+                ElseIf ((Arg2 == 0x07))
+                {
+                    Local1 = Package (0x02)
+                        {
+                            Zero,
+                            ""
+                        }
+                    Local1 [Zero] = Local0
+                    Return (Local1)
+                }
+            }
+        }
     }

     Scope (_SB)
@@ -785,7 +832,7 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC    ", 
0x00000001)
                     0xAE00,             // Range Minimum
                     0xAE00,             // Range Maximum
                     0x01,               // Alignment
-                    0x14,               // Length
+                    0x18,               // Length
                     )
             })
         }
@@ -842,11 +889,22 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC    ", 
0x00000001)
             Device (S00)
             {
                 Name (_ADR, Zero)  // _ADR: Address
+                Name (_SUN, Zero)  // _SUN: Slot User Number
+                Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
+                {
+                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+                }
             }

             Device (S10)
             {
                 Name (_ADR, 0x00020000)  // _ADR: Address
+                Name (_SUN, 0x02)  // _SUN: Slot User Number
+                Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
+                {
+                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+                }
+
                 Method (_S1D, 0, NotSerialized)  // _S1D: S1 Device State
                 {
                     Return (Zero)
[...]

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-7-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 6c2b24d1d2b19cd330d971cdbc8e6b115dc97ca4
      
https://github.com/qemu/qemu/commit/6c2b24d1d2b19cd330d971cdbc8e6b115dc97ca4
  Author: David Hildenbrand <david@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/arm/virt-acpi-build.c
    M hw/i386/acpi-build.c
    M hw/i386/acpi-microvm.c
    M include/hw/acpi/aml-build.h

  Log Message:
  -----------
  acpi: Set proper maximum size for "etc/table-loader" blob

The resizeable memory region / RAMBlock that is created for the cmd blob
has a maximum size of whole host pages (e.g., 4k), because RAMBlocks
work on full host pages. In addition, in i386 ACPI code:
  acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
makes sure to align to multiples of 4k, padding with 0.

For example, if our cmd_blob is created with a size of 2k, the maximum
size is 4k - we cannot grow beyond that. Growing might be required
due to guest action when rebuilding the tables, but also on incoming
migration.

This automatic generation of the maximum size used to be sufficient,
however, there are cases where we cross host pages now when growing at
runtime: we exceed the maximum size of the RAMBlock and can crash QEMU when
trying to resize the resizeable memory region / RAMBlock:
  $ build/qemu-system-x86_64 --enable-kvm \
      -machine q35,nvdimm=on \
      -smp 1 \
      -cpu host \
      -m size=2G,slots=8,maxmem=4G \
      -object memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \
      -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \
      -nodefaults \
      -device vmgenid \
      -device intel-iommu

Results in:
  Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850:
  qemu-system-x86_64: Size too large: /rom@etc/table-loader:
    0x2000 > 0x1000: Invalid argument

In this configuration, we consume exactly 4k (32 entries, 128 bytes each)
when creating the VM. However, once the guest boots up and maps the MCFG,
we also create the MCFG table and end up consuming 2 additional entries
(pointer + checksum) -- which is where we try resizing the memory region
/ RAMBlock, however, the maximum size does not allow for it.

Currently, we get the following maximum sizes for our different
mutable tables based on behavior of resizeable RAMBlock:

  hw       table                max_size
  -------  ---------------------------------------------------------

  virt     "etc/acpi/tables"    ACPI_BUILD_TABLE_MAX_SIZE (0x200000)
  virt     "etc/table-loader"   HOST_PAGE_ALIGN(initial_size)
  virt     "etc/acpi/rsdp"      HOST_PAGE_ALIGN(initial_size)

  i386     "etc/acpi/tables"    ACPI_BUILD_TABLE_MAX_SIZE (0x200000)
  i386     "etc/table-loader"   HOST_PAGE_ALIGN(initial_size)
  i386     "etc/acpi/rsdp"      HOST_PAGE_ALIGN(initial_size)

  microvm  "etc/acpi/tables"    ACPI_BUILD_TABLE_MAX_SIZE (0x200000)
  microvm  "etc/table-loader"   HOST_PAGE_ALIGN(initial_size)
  microvm  "etc/acpi/rsdp"      HOST_PAGE_ALIGN(initial_size)

Let's set the maximum table size for "etc/table-loader" to 64k, so we
can properly grow at runtime, which should be good enough for the future.

Migration is not concerned with the maximum size of a RAMBlock, only
with the used size - so existing setups are not affected. Of course, we
cannot migrate a VM that would have crash when started on older QEMU from
new QEMU to older QEMU without failing early on the destination when
synchronizing the RAM state:
    qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: 
Invalid argument
    qemu-system-x86_64: error while loading state for instance 0x0 of device 
'ram'
    qemu-system-x86_64: load of migration failed: Invalid argument

We'll refactor the code next, to make sure we get rid of this implicit
behavior for "etc/acpi/rsdp" as well and to make the code easier to
grasp.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-2-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 2a3bdc5cec1a16fb731661d2eac896284f691e1f
      
https://github.com/qemu/qemu/commit/2a3bdc5cec1a16fb731661d2eac896284f691e1f
  Author: David Hildenbrand <david@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/i386/acpi-microvm.c

  Log Message:
  -----------
  microvm: Don't open-code "etc/table-loader"

Let's just reuse ACPI_BUILD_LOADER_FILE.

Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-3-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 6930ba0d44b2f4420948aec5209f665385411f7f
      
https://github.com/qemu/qemu/commit/6930ba0d44b2f4420948aec5209f665385411f7f
  Author: David Hildenbrand <david@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/acpi/utils.c
    M hw/arm/virt-acpi-build.c
    M hw/i386/acpi-build.c
    M hw/i386/acpi-microvm.c
    M include/hw/acpi/aml-build.h
    M include/hw/acpi/utils.h

  Log Message:
  -----------
  acpi: Move maximum size logic into acpi_add_rom_blob()

We want to have safety margins for all tables based on the table type.
Let's move the maximum size logic into acpi_add_rom_blob() and make it
dependent on the table name, so we don't have to replicate for each and
every instance that creates such tables.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-4-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 50337286b796f3866d1880f6e8a895fa5853b543
      
https://github.com/qemu/qemu/commit/50337286b796f3866d1880f6e8a895fa5853b543
  Author: David Hildenbrand <david@redhat.com>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/acpi/utils.c

  Log Message:
  -----------
  acpi: Set proper maximum size for "etc/acpi/rsdp" blob

Let's also set a maximum size for "etc/acpi/rsdp", so the maximum
size doesn't get implicitly set based on the initial table size. In my
experiments, the table size was in the range of 22 bytes, so a single
page (== what we used until now) seems to be good enough.

Now that we have defined maximum sizes for all currently used table types,
let's assert that we catch usage with new tables that need a proper maximum
size definition.

Also assert that our initial size does not exceed the maximum size; while
qemu_ram_alloc_internal() properly asserts that the initial RAMBlock size
is <= its maximum size, the result might differ when the host page size
is bigger than 4k.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-5-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: d07b22863b8e0981bdc9384a787a703f1fd4ba42
      
https://github.com/qemu/qemu/commit/d07b22863b8e0981bdc9384a787a703f1fd4ba42
  Author: Marian Postevca <posteuca@mutex.one>
  Date:   2021-03-22 (Mon, 22 Mar 2021)

  Changed paths:
    M hw/i386/acpi-build.c
    M hw/i386/acpi-microvm.c
    M hw/i386/microvm.c
    M hw/i386/pc.c
    M hw/i386/x86.c
    M include/hw/i386/microvm.h
    M include/hw/i386/pc.h
    M include/hw/i386/x86.h

  Log Message:
  -----------
  acpi: Move setters/getters of oem fields to X86MachineState

The code that sets/gets oem fields is duplicated in both PC and MICROVM
variants. This commit moves it to X86MachineState so that all x86
variants can use it and duplication is removed.

Signed-off-by: Marian Postevca <posteuca@mutex.one>
Message-Id: <20210221001737.24499-2-posteuca@mutex.one>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


  Commit: 97414988490de91673c51e6aa88a9f507e6a1edc
      
https://github.com/qemu/qemu/commit/97414988490de91673c51e6aa88a9f507e6a1edc
  Author: Peter Maydell <peter.maydell@linaro.org>
  Date:   2021-03-23 (Tue, 23 Mar 2021)

  Changed paths:
    M hw/acpi/aml-build.c
    M hw/acpi/pci.c
    M hw/acpi/pcihp.c
    M hw/acpi/piix4.c
    M hw/acpi/trace-events
    M hw/acpi/utils.c
    M hw/arm/virt-acpi-build.c
    M hw/i386/acpi-build.c
    M hw/i386/acpi-microvm.c
    M hw/i386/microvm.c
    M hw/i386/pc.c
    M hw/i386/x86.c
    M hw/pci/pci.c
    M hw/virtio/vhost-user.c
    M hw/virtio/virtio-mmio.c
    M hw/virtio/virtio-pmem.c
    M include/hw/acpi/aml-build.h
    M include/hw/acpi/pci.h
    M include/hw/acpi/pcihp.h
    M include/hw/acpi/utils.h
    M include/hw/i386/microvm.h
    M include/hw/i386/pc.h
    M include/hw/i386/x86.h
    M include/hw/pci/pci.h
    M tests/data/acpi/pc/DSDT
    M tests/data/acpi/pc/DSDT.acpihmat
    M tests/data/acpi/pc/DSDT.bridge
    M tests/data/acpi/pc/DSDT.cphp
    M tests/data/acpi/pc/DSDT.dimmpxm
    M tests/data/acpi/pc/DSDT.hpbridge
    M tests/data/acpi/pc/DSDT.ipmikcs
    M tests/data/acpi/pc/DSDT.memhp
    M tests/data/acpi/pc/DSDT.nohpet
    M tests/data/acpi/pc/DSDT.numamem
    M tests/data/acpi/pc/DSDT.roothp

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,virtio,pci: fixes, features

Fixes all over the place.
ACPI index support.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Mon 22 Mar 2021 22:58:45 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  acpi: Move setters/getters of oem fields to X86MachineState
  acpi: Set proper maximum size for "etc/acpi/rsdp" blob
  acpi: Move maximum size logic into acpi_add_rom_blob()
  microvm: Don't open-code "etc/table-loader"
  acpi: Set proper maximum size for "etc/table-loader" blob
  tests: acpi: update expected blobs
  pci: acpi: add _DSM method to PCI devices
  acpi: add aml_to_decimalstring() and aml_call6() helpers
  pci: acpi: ensure that acpi-index is unique
  pci: introduce acpi-index property for PCI device
  tests: acpi: temporary whitelist DSDT changes
  virtio-pmem: fix virtio_pmem_resp assign problem
  vhost-user: Monitor slave channel in vhost_user_read()
  vhost-user: Introduce nested event loop in vhost_user_read()
  vhost-user: Convert slave channel to QIOChannelSocket
  vhost-user: Factor out duplicated slave_fd teardown code
  vhost-user: Fix double-close on slave_read() error path
  vhost-user: Drop misleading EAGAIN checks in slave_read()
  virtio: Fix virtio_mmio_read()/virtio_mmio_write()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


Compare: https://github.com/qemu/qemu/compare/5ca634afcf83...97414988490d



reply via email to

[Prev in Thread] Current Thread [Next in Thread]