[PATCH v2 0/2] Implement AES on ARM using x86 instructions and vv
From: Ard Biesheuvel
Subject: [PATCH v2 0/2] Implement AES on ARM using x86 instructions and vv
Date: Wed, 31 May 2023 13:22:37 +0200
Use the host native instructions to implement the AES instructions
exposed by the emulated target. The mapping is not 1:1, so it requires a
bit of fiddling to get the right result.
This is still RFC material - the current approach feels too ad-hoc, but
given the non-1:1 correspondence, doing a proper abstraction is rather
difficult.
Changes since v1/RFC:
- add second patch to implement x86 AES instructions on ARM hosts - this
helps illustrate what an abstraction should cover.
- use cpuinfo framework to detect host support for AES instructions.
- implement ARM aesimc using x86 aesimc directly
Patch #1 produces a 1.5-2x speedup in tests using the Linux kernel's
tcrypt benchmark (mode=500).
Patch #2 produces a 2-3x speedup. The discrepancy is most likely due to
the fact that ARM uses two instructions to implement a single AES round,
whereas x86 only uses one.
Note that using the ARM intrinsics is fiddly with Clang, as it does not
declare the prototypes unless some builtin CPP macro (__ARM_FEATURE_AES)
is defined, which will be set by the compiler based on the command line
arch/cpu options. However, setting this globally for a compilation unit
is dubious, given that we test cpuinfo for AES support, and only emit
the instructions conditionally. So I used inline asm() instead.
As for the design of an abstraction: I imagine we could introduce a
host/aes.h API that implements some building blocks that the TCG helper
implementation could use.
Quoting from my reply to Richard:
Using the primitive operations defined in the AES paper, we basically
perform the following transformation for n rounds of AES (for n in {10,
12, 14})
  for (n-1 rounds) {
    AddRoundKey
    ShiftRows
    SubBytes
    MixColumns
  }
  AddRoundKey
  ShiftRows
  SubBytes
  AddRoundKey
AddRoundKey is just XOR, but it is incorporated into the instructions
that combine a couple of these steps.
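As a rough illustration of the round structure quoted above, here is a minimal portable C sketch. It is not code from the patches: all function names are made up for this example, the S-box is computed on the fly rather than taken from a table (slow, but short), and the AES-128 key schedule is included only so the loop can be checked against a known test vector.

```c
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial (0x11b). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (; b; b >>= 1) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
    }
    return r;
}

/* S-box: multiplicative inverse, then affine transform (slow). */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0, s;
    for (int y = 1; y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1)
            inv = (uint8_t)y;
    s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return s ^ 0x63;
}

static void add_round_key(uint8_t st[16], const uint8_t rk[16])
{
    for (int i = 0; i < 16; i++)
        st[i] ^= rk[i];
}

/* The state is column-major: byte 4 * c + r is row r, column c. */
static void shift_rows(uint8_t st[16])
{
    uint8_t t[16];
    memcpy(t, st, 16);
    for (int i = 0; i < 16; i++)
        st[i] = t[(i + 4 * (i % 4)) % 16];
}

static void sub_bytes(uint8_t st[16])
{
    for (int i = 0; i < 16; i++)
        st[i] = sbox(st[i]);
}

static void mix_columns(uint8_t st[16])
{
    for (uint8_t *a = st; a < st + 16; a += 4) {
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3;
        a[1] = a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3;
        a[2] = a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3);
        a[3] = gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2);
    }
}

/* n rounds in the order quoted above; rk holds n + 1 round keys. */
static void aes_encrypt(uint8_t st[16], const uint8_t *rk, int n)
{
    for (int i = 0; i < n - 1; i++) {
        add_round_key(st, rk + 16 * i);
        shift_rows(st);
        sub_bytes(st);
        mix_columns(st);
    }
    add_round_key(st, rk + 16 * (n - 1));
    shift_rows(st);
    sub_bytes(st);
    add_round_key(st, rk + 16 * n);
}

/* AES-128 key schedule, only needed to exercise the loop above. */
static void aes128_expand_key(const uint8_t key[16], uint8_t rk[176])
{
    uint8_t rcon = 1;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4] = { rk[i - 4], rk[i - 3], rk[i - 2], rk[i - 1] };
        if (i % 16 == 0) {
            /* RotWord, SubWord, then XOR in the round constant. */
            uint8_t t0 = t[0];
            t[0] = sbox(t[1]) ^ rcon;
            t[1] = sbox(t[2]);
            t[2] = sbox(t[3]);
            t[3] = sbox(t0);
            rcon = gf_mul(rcon, 2);
        }
        for (int j = 0; j < 4; j++)
            rk[i + j] = rk[i - 16 + j] ^ t[j];
    }
}
```

Expanding a 16-byte key with aes128_expand_key() and calling aes_encrypt(state, rk, 10) encrypts one block in place. ShiftRows and SubBytes commute, so this ordering gives the same result as the usual SubBytes-first description.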
So on x86, we have
  aesenc:
    ShiftRows
    SubBytes
    MixColumns
    AddRoundKey
  aesenclast:
    ShiftRows
    SubBytes
    AddRoundKey
and on ARM we have
  aese:
    AddRoundKey
    ShiftRows
    SubBytes
  aesmc:
    MixColumns
So a generic routine that does only ShiftRows+SubBytes could be backed by
x86's aesenclast and ARM's aese, using a NULL round key argument in each
case. Then, it would be up to the TCG helper code for either ARM or x86
to incorporate those routines in the right way.
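To make the idea concrete, here is a small software model of that generic routine. It is a sketch only, not the proposed host/aes.h API: x86_aesenclast() and arm_aese() are plain C stand-ins for the native instructions, the other names are hypothetical, and an all-zero round key plays the role of the "NULL" key so that the XOR becomes a no-op.

```c
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial (0x11b). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (; b; b >>= 1) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
    }
    return r;
}

/* S-box: multiplicative inverse, then affine transform (slow). */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0, s;
    for (int y = 1; y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1)
            inv = (uint8_t)y;
    s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return s ^ 0x63;
}

static void xor_block(uint8_t st[16], const uint8_t rk[16])
{
    for (int i = 0; i < 16; i++)
        st[i] ^= rk[i];
}

/* Stand-in for x86 aesenclast: ShiftRows, SubBytes, AddRoundKey. */
static void x86_aesenclast(uint8_t st[16], const uint8_t rk[16])
{
    uint8_t t[16];
    memcpy(t, st, 16);
    for (int i = 0; i < 16; i++)
        st[i] = sbox(t[(i + 4 * (i % 4)) % 16]) ^ rk[i];
}

/* Stand-in for ARM aese: AddRoundKey, ShiftRows, SubBytes. */
static void arm_aese(uint8_t st[16], const uint8_t rk[16])
{
    uint8_t t[16];
    memcpy(t, st, 16);
    xor_block(t, rk);
    for (int i = 0; i < 16; i++)
        st[i] = sbox(t[(i + 4 * (i % 4)) % 16]);
}

static const uint8_t zero_key[16] = { 0 };

/* Hypothetical generic ShiftRows+SubBytes step, as backed by each host. */
static void aes_sr_sb_x86_host(uint8_t st[16])
{
    x86_aesenclast(st, zero_key);
}

static void aes_sr_sb_arm_host(uint8_t st[16])
{
    arm_aese(st, zero_key);
}

/* Emulated ARM aese on an x86 host: XOR first, then the generic step. */
static void helper_arm_aese(uint8_t st[16], const uint8_t rk[16])
{
    xor_block(st, rk);
    aes_sr_sb_x86_host(st);
}

/* Emulated x86 aesenclast on an ARM host: generic step, then XOR. */
static void helper_x86_aesenclast(uint8_t st[16], const uint8_t rk[16])
{
    aes_sr_sb_arm_host(st);
    xor_block(st, rk);
}
```

The only difference between the two helpers is where the round key XOR goes relative to the shared step, which is the part the target-specific TCG helper code would have to handle.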
I suppose it really depends on whether there is a third host
architecture that could make use of this, and how its AES instructions
map onto the primitive AES ops above.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Ard Biesheuvel (2):
  target/arm: use x86 intrinsics to implement AES instructions
  target/i386: Implement AES instructions using AArch64 counterparts

 host/include/aarch64/host/cpuinfo.h |  1 +
 host/include/i386/host/cpuinfo.h    |  1 +
 target/arm/tcg/crypto_helper.c      | 37 ++++++++++-
 target/i386/ops_sse.h               | 69 ++++++++++++++++++++
 util/cpuinfo-aarch64.c              |  1 +
 util/cpuinfo-i386.c                 |  1 +
 6 files changed, 107 insertions(+), 3 deletions(-)
--
2.39.2