> +void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
> +{
> + int i;
> + uint8_t s = 0;
> + for (i = 0; i < 16; i++) {
> + s ^= (b->u8[i] & 1);
> + }
> + r->u64[LO_IDX] = (!s) ? 0 : 1;
> + r->u64[HI_IDX] = 0;
> +}
> +
I think you can implement these more efficiently. First mask with
0x0101...01 (of the appropriate width) to extract the LSB of each byte.
Then XOR the two halves together, then the quarters, and so forth,
log2(size) times, where size is the number of bytes, so that the parity
ends up in bit 0. This is similar to the usual Hamming weight
(population count) implementation.
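
Something along these lines is what I have in mind. This is only a
rough sketch on plain uint64_t values: the names lsb_parity64 and
lsb_parity128 are made up for illustration and are not existing
helpers, and mapping the two halves onto r->u64[HI_IDX] and
r->u64[LO_IDX] is left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): parity of the
 * least-significant bits of the 8 bytes packed in x.
 */
static inline uint64_t lsb_parity64(uint64_t x)
{
    x &= 0x0101010101010101ULL; /* keep only bit 0 of each byte */
    x ^= x >> 32;               /* fold the upper half onto the lower half */
    x ^= x >> 16;               /* fold quarters */
    x ^= x >> 8;                /* fold eighths: bit 0 now holds the parity */
    return x & 1;
}

/*
 * Hypothetical 128-bit variant: XOR the two doublewords first, which is
 * the "halves" fold, then reuse the 64-bit routine for the rest.
 */
static inline uint64_t lsb_parity128(uint64_t hi, uint64_t lo)
{
    return lsb_parity64(hi ^ lo);
}
```

For the quadword case that is four folds (log2(16)) in place of the
16-iteration byte loop, and the word and doubleword variants can use
the same pattern with fewer folds.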