qemu-s390x
Re: [PATCH v4 2/2] target/s390x: support SHA-512 extensions


From: David Hildenbrand
Subject: Re: [PATCH v4 2/2] target/s390x: support SHA-512 extensions
Date: Wed, 3 Aug 2022 13:55:21 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 02.08.22 21:00, Jason A. Donenfeld wrote:
> In order to fully support MSA_EXT_5, we have to also support the SHA-512
> special instructions. So implement those.
> 
> The implementation began as something TweetNacl-like, and then was
> adjusted to be useful here. It's not very beautiful, but it is quite
> short and compact, which is what we're going for.
> 

Do we have to worry about copyright/authorship of the original code or
did you write that from scratch?

[...]

I cannot really comment on the actual math, so I'll point out some code
style thingies.

> +static void kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_block,
> +                        uint64_t *message_reg, uint64_t *len_reg, uint8_t *stack_buffer)
> +{
> +    uint64_t z[8], b[8], a[8], w[16], t;
> +    int i, j;
> +
> +    for (i = 0; i < 8; ++i)
> +        z[i] = a[i] = cpu_ldq_be_data_ra(env, wrap_address(env, parameter_block + 8 * i), ra);

Please always use curly braces for code blocks in QEMU; they are
mandatory.

> +
> +    while (*len_reg >= 128) {
> +        for (i = 0; i < 16; ++i) {

Please use i++ instead of ++i, here and in all the cases below.
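
To illustrate both style points, here is a minimal sketch of a loop
written with mandatory braces and a post-increment. The function name
sketch_copy_state is hypothetical and only stands in for the state-copy
loops in the patch; it is not part of QEMU.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the state-copy loops in the patch,
 * written with braces around the single-statement body and i++.
 */
static void sketch_copy_state(uint64_t dst[8], const uint64_t src[8])
{
    int i;

    for (i = 0; i < 8; i++) {
        dst[i] = src[i];
    }
}
```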

> +            if (message_reg)
> +                w[i] = cpu_ldq_be_data_ra(env, wrap_address(env, *message_reg + 8 * i), ra);
> +            else
> +                w[i] = be64_to_cpu(((uint64_t *)stack_buffer)[i]);
> +        }
> +
> +        for (i = 0; i < 80; ++i) {
> +            for (j = 0; j < 8; ++j)
> +                b[j] = a[j];
> +            t = a[7] + Sigma1(a[4]) + Ch(a[4], a[5], a[6]) + K[i] + w[i % 16];
> +            b[7] = t + Sigma0(a[0]) + Maj(a[0], a[1], a[2]);
> +            b[3] += t;
> +            for (j = 0; j < 8; ++j)
> +                a[(j + 1) % 8] = b[j];
> +            if (i % 16 == 15) {
> +                for (j = 0; j < 16; ++j)
> +                    w[j] += w[(j + 9) % 16] + sigma0(w[(j + 1) % 16]) +
> +                            sigma1(w[(j + 14) % 16]);
> +            }
> +        }
> +
> +        for (i = 0; i < 8; ++i) {
> +            a[i] += z[i];
> +            z[i] = a[i];
> +        }
> +
> +        if (message_reg)
> +            *message_reg += 128;
> +        else
> +            stack_buffer += 128;
> +        *len_reg -= 128;
> +    }
> +
> +    for (i = 0; i < 8; ++i)
> +        cpu_stq_be_data_ra(env, wrap_address(env, parameter_block + 8 * i), z[i], ra);
> +}
> +
> +static void klmd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_block,
> +                        uint64_t *message_reg, uint64_t *len_reg)
> +{
> +    uint8_t x[256];
> +    uint64_t i;
> +    int j;
> +
> +    kimd_sha512(env, ra, parameter_block, message_reg, len_reg, NULL);
> +    for (i = 0; i < *len_reg; ++i)
> +        x[i] = cpu_ldub_data_ra(env, wrap_address(env, *message_reg + i), ra);
> +    *message_reg += *len_reg;
> +    *len_reg = 0;
> +    memset(x + i, 0, sizeof(x) - i);
> +    x[i] = 128;
> +    i = i < 112 ? 128 : 256;
> +    for (j = 0; j < 16; ++j)
> +        x[i - 16 + j] = cpu_ldub_data_ra(env, wrap_address(env, parameter_block + 64 + j), ra);
> +    kimd_sha512(env, ra, parameter_block, NULL, &i, x);
> +}

Are we properly handling the length register (r2 + 1) in the
24-bit/31-bit addressing mode?

Similarly, are we properly handling updates to the message register (r2)
depending on the addressing mode?
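
To make the two questions concrete, here is a rough, self-contained
sketch of what mode-aware register handling could look like. All names
(enum addr_mode, sketch_wrap_address, sketch_set_address,
sketch_get_length, sketch_set_length) are hypothetical and are not
QEMU's helpers. The sketch assumes that in 24-bit and 31-bit mode the
length is the low 32 bits of the register and that the high 32 bits
stay unchanged on write-back. Whether the bits between the address and
the 32-bit boundary are zeroed or preserved is also an assumption here
and should be checked against the Principles of Operation.

```c
#include <stdint.h>

/* Hypothetical addressing modes; not QEMU's own representation. */
enum addr_mode { MODE_24, MODE_31, MODE_64 };

/* Effective address formed from a general register value. */
static uint64_t sketch_wrap_address(enum addr_mode m, uint64_t reg)
{
    switch (m) {
    case MODE_24:
        return reg & 0x00ffffffULL;
    case MODE_31:
        return reg & 0x7fffffffULL;
    default:
        return reg;
    }
}

/* New register value after the address advanced to 'addr'. */
static uint64_t sketch_set_address(enum addr_mode m, uint64_t old,
                                   uint64_t addr)
{
    switch (m) {
    case MODE_24:
        /* Assumption: everything above the low 24 bits is preserved. */
        return (old & ~0x00ffffffULL) | (addr & 0x00ffffffULL);
    case MODE_31:
        /* Assumption: the bit just above the 31-bit address is zeroed. */
        return (old & ~0xffffffffULL) | (addr & 0x7fffffffULL);
    default:
        return addr;
    }
}

/* Operand length: low 32 bits unless in 64-bit mode. */
static uint64_t sketch_get_length(enum addr_mode m, uint64_t reg)
{
    return m == MODE_64 ? reg : (uint32_t)reg;
}

/* New length register value; high 32 bits kept outside 64-bit mode. */
static uint64_t sketch_set_length(enum addr_mode m, uint64_t old,
                                  uint64_t len)
{
    if (m == MODE_64) {
        return len;
    }
    return (old & ~0xffffffffULL) | (uint32_t)len;
}
```

With helpers along these lines, the kimd/klmd loops would read the
length through the getter once and write both registers back through
the setters, instead of modifying *message_reg and *len_reg as full
64-bit values.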


It's worth noting that we might want to implement (also for PRNO-TRNG):

"The operation is ended when all source bytes in the second operand
have been processed (called normal completion), or when a
CPU-determined number of blocks that is less than the length of the
second operand have been processed (called partial completion). The
CPU-determined number of blocks depends on the model, and may be a
different number each time the instruction is executed. The
CPU-determined number of blocks is usually nonzero. In certain unusual
situations, this number may be zero, and condition code 3 may be set
with no progress."

Otherwise, a large length can make us loop quite a while in QEMU,
without the chance to deliver any other interrupts.
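
As a rough illustration of partial completion, the block loop could be
capped and report condition code 3 when data remains, so that the guest
re-executes the instruction and interrupts get a chance in between.
Everything below is a hypothetical sketch: the cap of 64 blocks is
arbitrary, and sketch_kimd_bounded, sketch_block_fn and
sketch_count_block are made-up names, with the callback standing in for
the real compression function.

```c
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 128          /* SHA-512 block size in bytes */
#define SKETCH_MAX_BLOCKS_PER_CALL 64  /* arbitrary cap, an assumption */

/* Hypothetical per-block callback (stand-in for the compression step). */
typedef void (*sketch_block_fn)(void *state, uint64_t addr);

/* Trivial callback that only counts how many blocks were processed. */
static void sketch_count_block(void *state, uint64_t addr)
{
    (void)addr;
    (*(uint64_t *)state)++;
}

/*
 * Process at most SKETCH_MAX_BLOCKS_PER_CALL full blocks.
 * Returns 3 (partial completion) if full blocks remain, 0 otherwise.
 * *addr and *len are updated to reflect the progress made.
 */
static int sketch_kimd_bounded(uint64_t *addr, uint64_t *len,
                               sketch_block_fn fn, void *state)
{
    unsigned int blocks = 0;

    while (*len >= SKETCH_BLOCK_SIZE) {
        if (blocks == SKETCH_MAX_BLOCKS_PER_CALL) {
            return 3;
        }
        fn(state, *addr);
        *addr += SKETCH_BLOCK_SIZE;
        *len -= SKETCH_BLOCK_SIZE;
        blocks++;
    }
    return 0;
}
```

Because the registers already reflect the blocks that were consumed,
re-executing the instruction after condition code 3 simply continues
where the previous invocation stopped.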

-- 
Thanks,

David / dhildenb



