Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
From: Fabiano Rosas
Subject: Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64
Date: Wed, 31 Jan 2024 10:09:16 -0300
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jan 30, 2024 at 06:23:10PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote:
>> >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote:
>> >> >
>> >> > Fabiano Rosas <farosas@suse.de> writes:
>> >> >
>> >> > > Peter Xu <peterx@redhat.com> writes:
>> >> > >
>> >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
>> >> > > The issue that occurs to me now is that 'cpu host' will not work with
>> >> > > TCG. We might actually need to go poking /dev/kvm for this to work.
>> >> >
>> >> > Nevermind this last part. There's not going to be a scenario where we
>> >> > build with CONFIG_KVM, but run in an environment that does not support
>> >> > KVM.
>> >>
>> >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
>> >> but that doesn't imply that the user running the build and
>> >> test has permissions for /dev/kvm.
>> >
>> > I'm actually pretty confused on why this would be a problem even for
>> > neoverse-n1: can we just try to use KVM, if it fails then use TCG?
>> > Something like:
>> >
>> > (construct qemu cmdline)
>> > ..
>> > #ifdef CONFIG_KVM
>> > "-accel kvm "
>> > #endif
>> > "-accel tcg "
>> > ..
>> >
>> > ?
>> > IIUC if we specify two "-accel", we'll try the first, then if failed then
>> > the 2nd?
>>
>> Aside from '-cpu max', there's no -accel and -cpu combination that works
>> on all of:
>>
>> x86_64 host - TCG-only
>> aarch64 host - KVM & TCG
>> aarch64 host with --disable-tcg - KVM-only
>> aarch64 host without access to /dev/kvm - TCG-only
>>
>> And the cpus are:
>> host - KVM-only
>> neoverse-n1 - TCG-only
>>
>> We'll need something like:
>>
>> /* covers aarch64 host with --disable-tcg */
>> if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) {
>> if (open("/dev/kvm", O_RDONLY) < 0) {
>> g_test_skip()
>> } else {
>> "-accel kvm -cpu host"
>> }
>> }
>>
>> /* covers x86_64 host */
>> if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
>> "-accel tcg -cpu neoverse-n1"
>> }
>>
>> /* covers aarch64 host */
>> if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
>> if (open("/dev/kvm", O_RDONLY) < 0) {
>> "-accel tcg -cpu neoverse-n1"
>> } else {
>> "-accel kvm -cpu host"
>> }
>> }
>
> The open("/dev/kvm") logic more or less duplicates what QEMU already does
> when init accelerators:
>
> if (!qemu_opts_foreach(qemu_find_opts("accel"),
> do_configure_accelerator, &init_failed,
> &error_fatal)) {
> if (!init_failed) {
> error_report("no accelerator found");
> }
> exit(1);
> }
>
> If /dev/kvm not accessible I think it'll already fallback to tcg here, as
> do_configure_accelerator() for kvm will just silently fail for qtest. I
> hope we can still rely on that for /dev/kvm access issues.
If we ask for KVM and it falls back to TCG, we need a cpu that supports
both. We don't have that. I've put some command-line combinations at the
end of the email[1], take a look.
If we ask for KVM only and /dev/kvm is not accessible, the test will
fail and we can prevent that by checking beforehand. It's much simpler
to check first and do the right thing than to run the QEMU binary and
somehow work around the test failure in migration-test.
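
To make the "check first" idea concrete, here is a minimal sketch of the selection logic from the three cases I listed earlier, collapsed into one function. The helper names kvm_dev_usable() and pick_accel_cpu() are made up for illustration and are not existing libqtest functions. I also open /dev/kvm with O_RDWR here rather than O_RDONLY, on the assumption that read-write access is what matters for actually using KVM; treat that as an assumption, not a verified requirement.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if the current user can open
 * /dev/kvm. O_RDWR is an assumption; the pseudocode earlier in the
 * thread used O_RDONLY.
 */
static bool kvm_dev_usable(void)
{
    int fd = open("/dev/kvm", O_RDWR);

    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
}

/*
 * Hypothetical helper: pick the -accel/-cpu pair for the aarch64
 * migration test, or return NULL when the test should be skipped.
 *
 *   has_kvm    - the QEMU binary was built with KVM support
 *   has_tcg    - the QEMU binary was built with TCG support
 *   kvm_usable - /dev/kvm is accessible to the user running the test
 */
static const char *pick_accel_cpu(bool has_kvm, bool has_tcg,
                                  bool kvm_usable)
{
    /* KVM built in and usable: 'host' only works with KVM. */
    if (has_kvm && kvm_usable) {
        return "-accel kvm -cpu host";
    }
    /* Otherwise use TCG if available: neoverse-n1 only works with TCG. */
    if (has_tcg) {
        return "-accel tcg -cpu neoverse-n1";
    }
    /* KVM-only build without access to /dev/kvm: nothing can run. */
    return NULL;
}
```

In migration-test this would presumably be called as pick_accel_cpu(qtest_has_accel("kvm"), qtest_has_accel("tcg"), kvm_dev_usable()), skipping the test when it returns NULL.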
>
> Hmm, I just notice that test_migrate_start() already has this later:
>
> "-accel kvm%s -accel tcg "
>
> So we're actually good from that regard, AFAIU.
>
> Then did I understand it right that in the failure case KVM is properly
> initialized, however it crashed later in neoverse-n1 asking for TCG? So
It didn't crash. It simply does not accept the neoverse-n1 with KVM
because it's unsupported:
qemu-system-aarch64: KVM is not supported for this guest CPU type
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
> the logic in the accel code above didn't really work to do a real fallback?
Yep, it didn't.
> A backtrace of such crash would help, maybe; I tried to find it in the
> pipeline log but I can only see:
>
> ----------------------------------- stderr
> -----------------------------------
> Broken pipe
> ../tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process
> but encountered exit status 1 (expected 0)
We need to fix the QTEST_LOG logic someday. It currently hides QEMU
stderr, but when we enable logging it logs every single serial read
and write and every query-migrate on the face of the earth, which
floods the logs.
>
> Or, is there some aarch64 cpu that will have a stable CPU ABI (not like
> -max, which is unstable), meanwhile supports both TCG + KVM?
Not as far as I know.
>
> Another thing I noticed that we may need to be caution is that currently
> gic is also using max version:
>
> machine_opts = "gic-version=max";
>
> We may want to choose a sane version too, probably altogether with the
> patch?
Good point.
====================
[1]
On x86_64:
==========
-cpu host
---------
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: unable to find CPU model 'host'
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
qemu-system-aarch64: falling back to tcg
qemu-system-aarch64: unable to find CPU model 'host'
$ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: unable to find CPU model 'host'
-cpu neoverse-n1
----------------
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg
works
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
qemu-system-aarch64: falling back to tcg
works
$ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm
works
On aarch64:
===========
-cpu host
---------
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
-cpu neoverse-n1
----------------
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm
qemu-system-aarch64: KVM is not supported for this guest CPU type
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg
qemu-system-aarch64: KVM is not supported for this guest CPU type
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm
works
On aarch64 --disable-tcg:
=========================
-cpu host
---------
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
works
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
qemu-system-aarch64: falling back to KVM
works
-cpu neoverse-n1
----------------
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm
qemu-system-aarch64: unable to find CPU model 'neoverse-n1'
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg
qemu-system-aarch64: unable to find CPU model 'neoverse-n1'
# ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm
qemu-system-aarch64: -accel tcg: invalid accelerator tcg
qemu-system-aarch64: falling back to KVM
qemu-system-aarch64: unable to find CPU model 'neoverse-n1'
On aarch64 without access to /dev/kvm:
======================================
-cpu host
---------
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm
Could not access KVM kernel module: No such file or directory
qemu-system-aarch64: -accel kvm: failed to initialize kvm: No such file or directory
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg
Could not access KVM kernel module: No such file or directory
qemu-system-aarch64: -accel kvm: failed to initialize kvm: No such file or directory
qemu-system-aarch64: falling back to tcg
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
# ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm
qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF