
Re: regression in TCG emulation of VTBL neon instruction


From: Ard Biesheuvel
Subject: Re: regression in TCG emulation of VTBL neon instruction
Date: Wed, 4 Nov 2020 20:22:22 +0100

On Wed, 4 Nov 2020 at 19:01, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 4 Nov 2020 at 18:50, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Wed, 4 Nov 2020 at 17:44, Alex Bennée <alex.bennee@linaro.org> wrote:
> > > Just checking - what host are you on?
> >
>
> model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti
> ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad
> fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
> rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves
> dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear
> flush_l1d
>
>
> > Oh, good question -- what the TCG backend emits as vector
> > operations or not will depend on the host CPU (eg whether
> > it supports AVX1/AVX2/etc).
> >
> > If the test case can be cut down to a Linux userspace
> > program that can be run under the qemu-arm single-binary
> > emulator that will probably also be easier to debug than
> > "boot whole guest kernel and wait for it to get to a selftest".
> >
>
> Sure. The code can be found at [0]
>
> The sequence in question is
>
> # r4 between -31 and 0
> # q4-q5 holding 32 bytes of cipher stream
>
> adr lr, .Lpermute + 32
> add lr, lr, r4
> vld1.8 {q2-q3}, [lr]
>
> vtbl.8 d4, {q4-q5}, d4
> vtbl.8 d5, {q4-q5}, d5
> vtbl.8 d6, {q4-q5}, d6
> vtbl.8 d7, {q4-q5}, d7
>
> .Lpermute:
>  .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
>  .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
>  .byte 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
>  .byte 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
>  .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
>  .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
>  .byte 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
>  .byte 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
>
> This is essentially a bytewise rotate function operating on a 32 byte
> vector (the patch explains the purpose)
>
> Using GDB to single step through the code, I noticed that d6 and d7
> turn up as all zeroes.
>
>
> [0] https://lore.kernel.org/linux-arm-kernel/20201103162809.28167-1-ardb@kernel.org/

OK, I could not reproduce this with qemu-arm. However, I did find out
that the issue only occurs when using qemu-system-aarch64, not when
using qemu-system-arm.
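
For reference, here is a rough Python model of what the quoted sequence
is expected to compute, which may help when comparing register dumps
between the two binaries. It is only a sketch: the names vtbl8 and
rotate32 are made up for illustration, and it assumes q4-q5 hold the 32
stream bytes in memory order. Since every index loaded from .Lpermute is
below 32, VTBL should never hit its "out of range, write zero" case
here, so d6 and d7 should only read as all zeroes if the corresponding
stream bytes are zero.

```python
# Illustrative model of the quoted NEON sequence; not QEMU or kernel code.

# The .Lpermute table: bytes 0x00..0x1f, repeated twice.
PERMUTE = list(range(32)) * 2


def vtbl8(table, indices):
    """VTBL.8 semantics: each index selects a table byte; an index past
    the end of the table produces 0."""
    return [table[i] if i < len(table) else 0 for i in indices]


def rotate32(stream, r4):
    """Model of: vld1.8 {q2-q3}, [.Lpermute + 32 + r4], then four
    vtbl.8 lookups into {q4-q5} producing d4..d7."""
    assert -31 <= r4 <= 0 and len(stream) == 32
    idx = PERMUTE[32 + r4:64 + r4]      # 32 index bytes, i.e. d4..d7
    out = []
    for d in range(4):                  # d4, d5, d6, d7 in turn
        out += vtbl8(stream, idx[8 * d:8 * d + 8])
    return out


if __name__ == "__main__":
    stream = list(range(100, 132))      # stand-in for the cipher stream
    for r4 in (0, -5, -31):
        print(r4, rotate32(stream, r4))
```

With r4 = 0 the result is the input unchanged; for any other value in
the range it is the same 32 bytes rotated, so a correct emulation of the
last two vtbl.8 instructions has to yield a permutation of the input.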


