Re: GICv3 for MTTCG
From: Alex Bennée
Subject: Re: GICv3 for MTTCG
Date: Fri, 18 Jun 2021 14:15:19 +0100
User-agent: mu4e 1.5.13; emacs 28.0.50
Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
> Dear Shashi,
>
> I have applied version 4 of the series "GICv3 LPI and ITS feature
> implementation" right after commit 3e9f48b, as before (because
> GCC 7.5 is unavailable in the YUM repository for CentOS 7.9).
>
> The guest OS still hangs at startup when QEMU is configured with 4 or
> more vCPUs (with 1 to 3 vCPUs the guest starts and runs fine and MTTCG
> works properly):
Does QEMU itself hang? If you attach gdb to QEMU and do:
thread apply all bt
that should dump the backtrace for all threads. Could you post the backtrace?
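For a hung guest it is often easiest to capture this non-interactively; a minimal sketch, assuming the binary is named qemu-system-aarch64 and gdb is installed on the host:

```shell
# Attach to the running QEMU, dump every thread's backtrace, then detach.
# The process name and output file are assumptions; adjust as needed.
pid=$(pidof qemu-system-aarch64)
gdb --batch -p "$pid" \
    -ex 'set pagination off' \
    -ex 'thread apply all bt' \
    > qemu-backtrace.txt 2>&1
```

`set pagination off` avoids the "---Type <return> to continue---" prompts so the whole trace lands in the file in one go.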
>
> Welcome to EulerOS 2.0 ... (Initramfs)!
>
> …
>
> [ OK ] Mounted Kernel Configuration File System.
>
> [ OK ] Started udev Coldplug all Devices.
>
> [ OK ] Reached target System Initialization.
>
> [ OK ] Reached target Basic System.
>
>
>
> IT HANGS HERE
> (with 4 or more vCPUs)!!!
>
>
> [ OK ] Found device /dev/mapper/euleros-root.
>
> [ OK ] Reached target Initrd Root Device.
>
> [ OK ] Started dracut initqueue hook.
>
> Starting File System Check on /dev/mapper/euleros-root...
>
> [ OK ] Reached target Remote File Systems (Pre).
>
> [ OK ] Reached target Remote File Systems.
>
> [ OK ] Started File System Check on /dev/mapper/euleros-root.
>
> Mounting /sysroot...
>
> [ OK ] Mounted /sysroot.
>
> …
>
>
> The backtrace of the QEMU threads looks like a deadlock in MTTCG,
> doesn't it?
>
> Thread 7 (Thread 0x7f476e489700 (LWP 24967)):
>
> #0 0x00007f477c2bbd19 in syscall () at /lib64/libc.so.6
>
> #1 0x000055747d41a270 in qemu_event_wait (val=<optimized out>,
> f=<optimized out>) at /home/andy/git/qemu/include/qemu/futex.h:29
>
> #2 0x000055747d41a270 in qemu_event_wait (ev=ev@entry=0x55747e051c28
> <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
>
> #3 0x000055747d444d78 in call_rcu_thread (opaque=opaque@entry=0x0) at
> ../util/rcu.c:258
>
> #4 0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:521
>
> #5 0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6 0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 6 (Thread 0x7f472ce42700 (LWP 24970)):
>
> #0 0x00007f477c2b6ccd in poll () at /lib64/libc.so.6
>
> #1 0x00007f47805c137c in g_main_context_iterate.isra.19 () at
> /lib64/libglib-2.0.so.0
>
> #2 0x00007f47805c16ca in g_main_loop_run () at /lib64/libglib-2.0.so.0
>
> #3 0x000055747d29b071 in iothread_run
> (opaque=opaque@entry=0x55747f85f280) at ../iothread.c:80
>
> #4 0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:521
>
> #5 0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6 0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):
>
> #0 0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
> /lib64/libpthread.so.0
>
> #1 0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f916670,
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2 0x000055747d20ae36 in qemu_wait_io_event
> (cpu=cpu@entry=0x55747f8b7920) at ../softmmu/cpus.c:417
>
> #3 0x000055747d18d6a1 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x55747f8b7920) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4 0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:521
>
> #5 0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6 0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 4 (Thread 0x7f461f1fe700 (LWP 24972)):
>
> #0 0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
> /lib64/libpthread.so.0
>
> #1 0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9897e0,
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2 0x000055747d20ae36 in qemu_wait_io_event
> (cpu=cpu@entry=0x55747f924bc0) at ../softmmu/cpus.c:417
>
> #3 0x000055747d18d6a1 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x55747f924bc0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4 0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:521
>
> #5 0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6 0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 3 (Thread 0x7f461e9fd700 (LWP 24973)):
>
> #0 0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
> /lib64/libpthread.so.0
>
> #1 0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9f5b40,
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2 0x000055747d20ae36 in qemu_wait_io_event
> (cpu=cpu@entry=0x55747f990ba0) at ../softmmu/cpus.c:417
>
> #3 0x000055747d18d6a1 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x55747f990ba0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4 0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:521
>
> #5 0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6 0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 2 (Thread 0x7f461e1fc700 (LWP 24974)):
>
> #0 0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
> /lib64/libpthread.so.0
>
> #1 0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747fa626c0,
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2 0x000055747d20ae36 in qemu_wait_io_event
> (cpu=cpu@entry=0x55747f9fcf00) at ../softmmu/cpus.c:417
>
> #3 0x000055747d18d6a1 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x55747f9fcf00) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4 0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:521
>
> #5 0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6 0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):
>
> #0 0x00007f477c2b6d8f in ppoll () at /lib64/libc.so.6
>
> #1 0x000055747d431439 in qemu_poll_ns (__ss=0x0,
> __timeout=0x7ffcc3188330, __nfds=<optimized out>, __fds=<optimized out>)
> at /usr/include/bits/poll2.h:77
>
> #2 0x000055747d431439 in qemu_poll_ns (fds=<optimized out>,
> nfds=<optimized out>, timeout=timeout@entry=3792947) at
> ../util/qemu-timer.c:348
>
> #3 0x000055747d4466ce in main_loop_wait (timeout=<optimized out>) at
> ../util/main-loop.c:249
>
> #4 0x000055747d4466ce in main_loop_wait
> (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
>
> #5 0x000055747d2695c7 in qemu_main_loop () at ../softmmu/runstate.c:725
>
> #6 0x000055747ccc1bde in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at ../softmmu/main.c:50
>
> (gdb)
>
>
> I run QEMU with virt-manager like this:
>
> qemu 7311 1 70 19:15 ? 00:00:05 /usr/local/bin/qemu-system-aarch64
>   -name guest=EulerOS-2.8-Rich,debug-threads=on
>   -S
>   -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes
>   -machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3
>   -cpu max
>   -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
>   -drive file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1
>   -m 4096
>   -smp 4,sockets=4,cores=1,threads=1
>   -uuid c95e0e92-011b-449a-8e3f-b5f0938aaaa7
>   -display none -no-user-config -nodefaults
>   -chardev socket,id=charmonitor,fd=26,server,nowait
>   -mon chardev=charmonitor,id=monitor,mode=control
>   -rtc base=utc -no-shutdown -boot strict=on
>   -device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
>   -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
>   -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
>   -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
>   -device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0
>   -device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0
>   -drive file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0
>   -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>   -drive if=none,id=drive-scsi0-0-0-1,readonly=on
>   -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
>   -netdev tap,fd=28,id=hostnet0
>   -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0
>   -chardev pty,id=charserial0
>   -serial chardev:charserial0
>   -msg timestamp=on
>
> The issue is reproducible and persists.
> 1. Do you think that applying the series results in a deadlock in
> MTTCG, or could there be another reason?
> 2. Which part of the QEMU source code should I investigate to locate the issue?
>
> Best regards,
> Andrey Shinkevich
>
>
> On 5/13/21 7:45 PM, Shashi Mallela wrote:
>> Hi Andrey,
>>
>> To clarify, the patch series
>>
>> https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>> "GICv3 LPI and ITS feature implementation"
>>
>> is applicable to the virt machine from version 6.1 onwards, i.e. the
>> ITS TCG functionality is not available for version 6.0, which is being
>> tried here.
>>
>> Thanks
>> Shashi
>>
>> On May 13 2021, at 12:35 pm, Andrey Shinkevich
>> <andrey.shinkevich@huawei.com> wrote:
>>
>> Dear colleagues,
>>
>> Thank you all very much for your responses. Let me reply with one
>> message.
>>
>> I configured QEMU for AARCH64 guest:
>> $ ./configure --target-list=aarch64-softmmu
>>
>> When I start QEMU with GICv3 on an x86 host:
>> qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
>>
>> QEMU reports this error from hw/pci/msix.c:
>> error_setg(errp, "MSI-X is not supported by interrupt controller");
>>
>> Presumably, the variable 'msi_nonbroken' should be initialized in
>> hw/intc/arm_gicv3_its_common.c:
>> gicv3_its_init_mmio(..)
>>
>> I guess that it works with KVM acceleration only, not with TCG.
>>
>> The error persists after applying the series:
>> https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>> "GICv3 LPI and ITS feature implementation"
>> (special thanks for referring me to that)
>>
>> Could you please clarify and suggest how that error can be fixed?
>> Should MSI-X support be implemented separately on top of GICv3?
>>
>> Once that works, I would like to test QEMU with the maximum number of
>> cores to get the best MTTCG performance.
>> We will probably get only a modest performance improvement with the
>> BQL series applied, won't we? I will test that as well.
>>
>> Best regards,
>> Andrey Shinkevich
>>
>>
>> On 5/12/21 6:43 PM, Alex Bennée wrote:
>> >
>> > Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
>> >
>> >> Dear colleagues,
>> >>
>> >> I am looking for ways to accelerate MTTCG for an ARM guest on an
>> >> x86-64 host.
>> >> The maximum number of CPUs for MTTCG with GICv2 is limited to 8:
>> >>
>> >> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>> >>
>> >> Version 3 of the Generic Interrupt Controller (GICv3) is not
>> >> supported in QEMU for some reason unknown to me. It would allow us
>> >> to increase the CPU limit and improve MTTCG performance on a
>> >> multi-core hypervisor.
>> >
>> > It is supported, you just need to select it.
>> >
>> >> I have an idea to implement the Interrupt Translation Service (ITS)
>> >> for use by MTTCG for the ARM architecture.
>> >
>> > There is some work to support ITS under TCG already posted:
>> >
>> > Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>> > Date: Thu, 29 Apr 2021 19:41:53 -0400
>> > Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>
>> >
>> > please do review and test.
>> >
>> >> Do you find that idea useful and feasible?
>> >> If yes, how much time do you estimate such a project would take one
>> >> developer to complete?
>> >> If no, what are the reasons for not implementing GICv3 for MTTCG in
>> >> QEMU?
>> >
>> > As far as MTTCG performance is concerned, there is a degree of
>> > diminishing returns to be expected, as the synchronisation cost
>> > between threads will eventually outweigh the gains of additional
>> > threads.
>> >
>> > There are a number of parts that could improve this performance. The
>> > first would be picking up the BQL reduction series from your
>> > FutureWei colleagues who worked on the problem when they were Linaro
>> > assignees:
>> >
>> > Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from
>> > cpu_handle_interrupt/exception path
>> > Date: Wed, 19 Aug 2020 14:28:49 -0400
>> > Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
>> >
>> > There was also a longer series moving towards per-CPU locks:
>> >
>> > Subject: [PATCH v10 00/73] per-CPU locks
>> > Date: Wed, 17 Jun 2020 17:01:18 -0400
>> > Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
>> >
>> > I believe the initial measurements showed that the BQL cost started
>> > to edge up with GIC interactions. We did discuss approaches for
>> > this, and I think one idea was to use non-BQL locking for the GIC.
>> > You would need to revert:
>> >
>> > Subject: [PATCH-for-5.2] exec: Remove MemoryRegion::global_locking
>> > field
>> > Date: Thu, 6 Aug 2020 17:07:26 +0200
>> > Message-Id: <20200806150726.962-1-philmd@redhat.com>
>> >
>> > and then implement more finely tuned locking in the GIC emulation
>> > itself. However, I think the BQL and per-CPU lock series are
>> > lower-hanging fruit to tackle first.
>> >
>> >>
>> >> Best regards,
>> >> Andrey Shinkevich
>> >
>> >
>>
--
Alex Bennée