qemu-arm

Re: regression in TCG emulation of VTBL neon instruction


From: Ard Biesheuvel
Subject: Re: regression in TCG emulation of VTBL neon instruction
Date: Wed, 4 Nov 2020 19:01:30 +0100

On Wed, 4 Nov 2020 at 18:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 4 Nov 2020 at 17:44, Alex Bennée <alex.bennee@linaro.org> wrote:
> > Just checking - what host are you on?
>

model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti
ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear
flush_l1d


> Oh, good question -- what the TCG backend emits as vector
> operations or not will depend on the host CPU (eg whether
> it supports AVX1/AVX2/etc).
>
> If the test case can be cut down to a Linux userspace
> program that can be run under the qemu-arm single-binary
> emulator that will probably also be easier to debug than
> "boot whole guest kernel and wait for it to get to a selftest".
>

Sure. The code can be found at [0].

The sequence in question is

# r4 between -31 and 0
# q4-q5 holding 32 bytes of cipher stream

adr lr, .Lpermute + 32
add lr, lr, r4
vld1.8 {q2-q3}, [lr]

vtbl.8 d4, {q4-q5}, d4
vtbl.8 d5, {q4-q5}, d5
vtbl.8 d6, {q4-q5}, d6
vtbl.8 d7, {q4-q5}, d7

.Lpermute:
 .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
 .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
 .byte 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
 .byte 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
 .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
 .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
 .byte 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
 .byte 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f

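This is essentially a bytewise rotate operating on a 32-byte vector
(the patch explains the purpose). All index bytes loaded from .Lpermute
are in the range 0..31, so every lookup falls inside the 32-byte table.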
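
For comparison against what the emulator produces, here is a minimal reference model of the sequence above, written in Python. It is only a sketch: the names vtbl8, rotate32 and PERMUTE are made up for illustration and do not come from the patch, and it assumes the architectural VTBL behaviour that an index outside the table yields a zero byte.

```python
# Reference model of the VTBL-based rotate (illustrative names, not from the patch).

# Mirror of the .Lpermute table: the bytes 0x00..0x1f, repeated twice.
PERMUTE = bytes(range(32)) * 2


def vtbl8(table: bytes, indices: bytes) -> bytes:
    """Model VTBL.8: each result byte is table[index], or 0 if index is out of range."""
    return bytes(table[i] if i < len(table) else 0 for i in indices)


def rotate32(stream: bytes, r4: int) -> bytes:
    """Model the four vtbl.8 instructions on d4..d7 for a given r4 in [-31, 0]."""
    if len(stream) != 32:
        raise ValueError("stream must be 32 bytes (q4-q5)")
    if not -31 <= r4 <= 0:
        raise ValueError("r4 must be between -31 and 0")
    # vld1.8 {q2-q3}, [lr] with lr = .Lpermute + 32 + r4
    idx = PERMUTE[32 + r4 : 64 + r4]
    # One 8-byte lookup per destination D register (d4, d5, d6, d7).
    return b"".join(vtbl8(stream, idx[k : k + 8]) for k in range(0, 32, 8))


if __name__ == "__main__":
    data = bytes(range(0x40, 0x60))
    for r4 in (0, -1, -31):
        print(r4, rotate32(data, r4).hex())
```

Since every index is below 32, the model never takes the zero path, so the last 16 bytes of the result (d6 and d7) should only be zero if the corresponding input bytes are zero.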

Using GDB to single-step through the code, I noticed that d6 and d7
end up as all zeroes.


[0] 
https://lore.kernel.org/linux-arm-kernel/20201103162809.28167-1-ardb@kernel.org/


