The implementation was validated with OpenSSL and with the test vectors in
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs.
The instructions provide a ~25% improvement when hashing a 64 MiB file:
runtime drops from 1.8 seconds to 1.4 seconds, and the instruction count
on the host drops from 5.8 billion to 4.8 billion, with slightly better
IPC as well. Good job Intel. ;)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/cpu.c | 2 +-
target/i386/ops_sse.h | 128 +++++++++++++++++++++++++++
target/i386/tcg/decode-new.c.inc | 11 +++
target/i386/tcg/decode-new.h | 1 +
target/i386/tcg/emit.c.inc | 54 +++++++++++
target/i386/tcg/ops_sse_header.h.inc | 14 +++
6 files changed, 209 insertions(+), 1 deletion(-)