
Re: GICv3 for MTTCG


From: Alex Bennée
Subject: Re: GICv3 for MTTCG
Date: Fri, 18 Jun 2021 14:15:19 +0100
User-agent: mu4e 1.5.13; emacs 28.0.50

Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:

> Dear Shashi,
>
> I have applied version 4 of the series "GICv3 LPI and ITS feature
> implementation" right after commit 3e9f48b, as before (because GCC 7.5
> is unavailable in the YUM repository for CentOS 7.9).
>
> The guest OS still hangs at startup when QEMU is configured with 4 or
> more vCPUs (with 1 to 3 vCPUs the guest starts and runs fine, and MTTCG
> works properly):

Does QEMU itself hang? If you attach gdb to QEMU and do:

  thread apply all bt

that should dump the backtrace for all threads. Could you post the backtrace?
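For example, to attach to the running instance and capture everything
(assuming a single QEMU process; substitute the PID if pidof is
ambiguous):

  $ gdb -p $(pidof qemu-system-aarch64)
  (gdb) thread apply all bt

If it does look like a lock-up, it can also help to see which thread
currently owns the BQL; this peeks at glibc internals, so treat it as a
best-effort trick:

  (gdb) p qemu_global_mutex.lock.__data.__owner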

>
> Welcome to EulerOS 2.0 ... (Initramfs)!
>
> [  OK  ] Mounted Kernel Configuration File System.
> [  OK  ] Started udev Coldplug all Devices.
> [  OK  ] Reached target System Initialization.
> [  OK  ] Reached target Basic System.
>
> IT HANGS HERE (with 4 or more vCPUs)!!!
>
> [  OK  ] Found device /dev/mapper/euleros-root.
> [  OK  ] Reached target Initrd Root Device.
> [  OK  ] Started dracut initqueue hook.
>           Starting File System Check on /dev/mapper/euleros-root...
> [  OK  ] Reached target Remote File Systems (Pre).
> [  OK  ] Reached target Remote File Systems.
> [  OK  ] Started File System Check on /dev/mapper/euleros-root.
>           Mounting /sysroot...
> [  OK  ] Mounted /sysroot.
>
> The backtrace of the QEMU threads looks like a deadlock in MTTCG,
> doesn't it?
>
> Thread 7 (Thread 0x7f476e489700 (LWP 24967)):
> #0  0x00007f477c2bbd19 in syscall () at /lib64/libc.so.6
> #1  0x000055747d41a270 in qemu_event_wait (val=<optimized out>, f=<optimized out>) at /home/andy/git/qemu/include/qemu/futex.h:29
> #2  0x000055747d41a270 in qemu_event_wait (ev=ev@entry=0x55747e051c28 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
> #3  0x000055747d444d78 in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:258
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
> Thread 6 (Thread 0x7f472ce42700 (LWP 24970)):
> #0  0x00007f477c2b6ccd in poll () at /lib64/libc.so.6
> #1  0x00007f47805c137c in g_main_context_iterate.isra.19 () at /lib64/libglib-2.0.so.0
> #2  0x00007f47805c16ca in g_main_loop_run () at /lib64/libglib-2.0.so.0
> #3  0x000055747d29b071 in iothread_run (opaque=opaque@entry=0x55747f85f280) at ../iothread.c:80
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
> Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f916670, mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
> #2  0x000055747d20ae36 in qemu_wait_io_event (cpu=cpu@entry=0x55747f8b7920) at ../softmmu/cpus.c:417
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn (arg=arg@entry=0x55747f8b7920) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
> Thread 4 (Thread 0x7f461f1fe700 (LWP 24972)):
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9897e0, mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
> #2  0x000055747d20ae36 in qemu_wait_io_event (cpu=cpu@entry=0x55747f924bc0) at ../softmmu/cpus.c:417
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn (arg=arg@entry=0x55747f924bc0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
> Thread 3 (Thread 0x7f461e9fd700 (LWP 24973)):
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9f5b40, mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
> #2  0x000055747d20ae36 in qemu_wait_io_event (cpu=cpu@entry=0x55747f990ba0) at ../softmmu/cpus.c:417
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn (arg=arg@entry=0x55747f990ba0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
> Thread 2 (Thread 0x7f461e1fc700 (LWP 24974)):
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747fa626c0, mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
> #2  0x000055747d20ae36 in qemu_wait_io_event (cpu=cpu@entry=0x55747f9fcf00) at ../softmmu/cpus.c:417
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn (arg=arg@entry=0x55747f9fcf00) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
> Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):
> #0  0x00007f477c2b6d8f in ppoll () at /lib64/libc.so.6
> #1  0x000055747d431439 in qemu_poll_ns (__ss=0x0, __timeout=0x7ffcc3188330, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  0x000055747d431439 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=3792947) at ../util/qemu-timer.c:348
> #3  0x000055747d4466ce in main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:249
> #4  0x000055747d4466ce in main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
> #5  0x000055747d2695c7 in qemu_main_loop () at ../softmmu/runstate.c:725
> #6  0x000055747ccc1bde in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50
>
> (gdb)
>
>
> I run QEMU with virt-manager like this:
>
> qemu      7311     1 70 19:15 ?        00:00:05 /usr/local/bin/qemu-system-aarch64
>   -name guest=EulerOS-2.8-Rich,debug-threads=on -S
>   -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes
>   -machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3
>   -cpu max
>   -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
>   -drive file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1
>   -m 4096 -smp 4,sockets=4,cores=1,threads=1
>   -uuid c95e0e92-011b-449a-8e3f-b5f0938aaaa7
>   -display none -no-user-config -nodefaults
>   -chardev socket,id=charmonitor,fd=26,server,nowait
>   -mon chardev=charmonitor,id=monitor,mode=control
>   -rtc base=utc -no-shutdown -boot strict=on
>   -device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
>   -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
>   -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
>   -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
>   -device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0
>   -device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0
>   -drive file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0
>   -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>   -drive if=none,id=drive-scsi0-0-0-1,readonly=on
>   -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
>   -netdev tap,fd=28,id=hostnet0
>   -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0
>   -chardev pty,id=charserial0 -serial chardev:charserial0 -msg timestamp=on
>
> The issue is reproducible and persists.
> 1. Do you think that applying the series results in a deadlock in
> MTTCG, or could there be another reason?
> 2. Which part of the QEMU source code should I investigate to locate
> the issue?
>
> Best regards,
> Andrey Shinkevich
>
>
> On 5/13/21 7:45 PM, Shashi Mallela wrote:
>> Hi Andrey,
>> 
>> To clarify, the patch series
>> 
>>     https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>>     "GICv3 LPI and ITS feature implementation"
>> 
>> applies to the virt machine from version 6.1 onwards, i.e. the ITS TCG
>> functionality is not available for the version 6.0 machine being tried
>> here.
>> 
>> Thanks
>> Shashi
>> 
>> On May 13 2021, at 12:35 pm, Andrey Shinkevich 
>> <andrey.shinkevich@huawei.com> wrote:
>> 
>>     Dear colleagues,
>> 
>>     Thank you all very much for your responses. Let me reply with one
>>     message.
>> 
>>     I configured QEMU for AARCH64 guest:
>>     $ ./configure --target-list=aarch64-softmmu
>> 
>>     When I start QEMU with GICv3 on an x86 host:
>>     qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
>> 
>>     QEMU reports this error from hw/pci/msix.c:
>>     error_setg(errp, "MSI-X is not supported by interrupt controller");
>> 
>>     Presumably, the variable 'msi_nonbroken' should be initialized in
>>     hw/intc/arm_gicv3_its_common.c:
>>     gicv3_its_init_mmio(..)
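>>
>>     A minimal sketch of what I mean (the placement and the signature of
>>     gicv3_its_init_mmio() are my assumptions, but msi_nonbroken itself
>>     is the global declared in hw/pci/msi.h):
>>
>>         #include "hw/pci/msi.h"  /* declares: extern bool msi_nonbroken; */
>>
>>         void gicv3_its_init_mmio(GICv3ITSState *s, const MemoryRegionOps *ops)
>>         {
>>             /* ... existing MMIO region setup ... */
>>
>>             /*
>>              * Tell the PCI core that MSI/MSI-X delivery works, so that
>>              * hw/pci/msix.c stops failing with "MSI-X is not supported
>>              * by interrupt controller".
>>              */
>>             msi_nonbroken = true;
>>         }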
>> 
>>     I guess it works only with KVM acceleration, not with TCG.
>> 
>>     The error persists after applying the series:
>>     https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>>     "GICv3 LPI and ITS feature implementation"
>>     (special thanks for referring me to that)
>> 
>>     Could you please clarify and suggest how that error can be fixed?
>>     Should MSI-X support be implemented separately on top of GICv3?
>> 
>>     Once that succeeds, I would like to test QEMU with the maximum
>>     number of cores to get the best MTTCG performance.
>>     We will probably gain only a modest percentage of performance with
>>     the BQL series applied, won't we? I will test that as well.
>> 
>>     Best regards,
>>     Andrey Shinkevich
>> 
>> 
>>     On 5/12/21 6:43 PM, Alex Bennée wrote:
>>      >
>>      > Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
>>      >
>>      >> Dear colleagues,
>>      >>
>>      >> I am looking for ways to accelerate MTTCG for an ARM guest on
>>      >> an x86-64 host.
>>      >> The maximum number of CPUs for MTTCG with GICv2 is limited to 8:
>>      >>
>>      >> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>>      >>
>>      >> Version 3 of the Generic Interrupt Controller (GICv3) is not
>>      >> supported in QEMU for some reason unknown to me. It would allow
>>      >> increasing the CPU limit and improving MTTCG performance on a
>>      >> multi-core host.
>>      >
>>      > It is supported, you just need to select it.
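>>      > For example:
>>      >
>>      >   qemu-system-aarch64 -M virt,accel=tcg,gic-version=3 -cpu max ...
>>      >
>>      > (gic-version=max also works and picks the newest version the
>>      > configuration supports.)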
>>      >
>>      >> I have an idea to implement the Interrupt Translation Service
>>      >> (ITS) for use by MTTCG for the ARM architecture.
>>      >
>>      > There is some work to support ITS under TCG already posted:
>>      >
>>      > Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>>      > Date: Thu, 29 Apr 2021 19:41:53 -0400
>>      > Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>
>>      >
>>      > please do review and test.
>>      >
>>      >> Do you find that idea useful and feasible?
>>      >> If so, how much time do you estimate such a project would take
>>      >> one developer to complete?
>>      >> If not, what are the reasons for not implementing GICv3 for
>>      >> MTTCG in QEMU?
>>      >
>>      > As far as MTTCG performance is concerned there is a degree of
>>      > diminishing returns to be expected, as the synchronisation cost
>>      > between threads will eventually outweigh the gains of additional
>>      > threads.
>>      >
>>      > There are a number of parts that could improve this performance.
>>      > The first would be picking up the BQL reduction series from your
>>      > FutureWei colleagues who worked on the problem when they were
>>      > Linaro assignees:
>>      >
>>      > Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from
>>      >   cpu_handle_interrupt/exception path
>>      > Date: Wed, 19 Aug 2020 14:28:49 -0400
>>      > Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
>>      >
>>      > There was also a longer series moving towards per-CPU locks:
>>      >
>>      > Subject: [PATCH v10 00/73] per-CPU locks
>>      > Date: Wed, 17 Jun 2020 17:01:18 -0400
>>      > Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
>>      >
>>      > I believe the initial measurements showed that the BQL cost
>>      > started to edge up with GIC interactions. We did discuss
>>      > approaches for this, and I think one idea was to use non-BQL
>>      > locking for the GIC. You would need to revert:
>>      >
>>      > Subject: [PATCH-for-5.2] exec: Remove MemoryRegion::global_locking field
>>      > Date: Thu, 6 Aug 2020 17:07:26 +0200
>>      > Message-Id: <20200806150726.962-1-philmd@redhat.com>
>>      >
>>      > and then implement finer-grained locking in the GIC emulation
>>      > itself. However, I think the BQL and per-CPU lock series are the
>>      > lower-hanging fruit to tackle first.
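>>      >
>>      > As a very rough sketch, assuming the revert restores
>>      > memory_region_clear_global_locking() and that we add a
>>      > hypothetical per-device QemuMutex "lock" to GICv3State:
>>      >
>>      >     /* at realize time: run distributor MMIO outside the BQL */
>>      >     qemu_mutex_init(&s->lock);
>>      >     memory_region_clear_global_locking(&s->iomem_dist);
>>      >
>>      >     /* MMIO handler: serialise on the device lock, not the BQL */
>>      >     static MemTxResult gicv3_dist_read(void *opaque, hwaddr offset,
>>      >                                        uint64_t *data, unsigned size,
>>      >                                        MemTxAttrs attrs)
>>      >     {
>>      >         GICv3State *s = opaque;
>>      >         MemTxResult r;
>>      >
>>      >         qemu_mutex_lock(&s->lock);
>>      >         r = gicv3_dist_read_locked(s, offset, data, size, attrs);
>>      >         qemu_mutex_unlock(&s->lock);
>>      >         return r;
>>      >     }
>>      >
>>      > (gicv3_dist_read_locked is a stand-in for the existing read
>>      > logic, not a real function in the tree.)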
>>      >
>>      >>
>>      >> Best regards,
>>      >> Andrey Shinkevich
>>      >
>>      >


-- 
Alex Bennée


