[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
From: |
Nikunj A Dadhania |
Subject: |
Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation |
Date: |
Wed, 21 Sep 2016 09:14:42 +0530 |
User-agent: |
Notmuch/0.21 (https://notmuchmail.org) Emacs/25.0.94.1 (x86_64-redhat-linux-gnu) |
David Gibson <address@hidden> writes:
> [ Unknown signature status ]
> On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
>> David Gibson <address@hidden> writes:
>>
>> > [ Unknown signature status ]
>> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson <address@hidden> writes:
>> >> > [ Unknown signature status ]
>> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c
>> >> >> > b/target-ppc/translate/vsx-impl.inc.c
>> >> >> > index eee6052..df278df 100644
>> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
>> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >> >> > static void gen_lxvw4x(DisasContext *ctx)
>> >> >> > {
>> >> >> > TCGv EA;
>> >> >> > - TCGv_i64 tmp;
>> >> >> > TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >> >> > TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >> >> > if (unlikely(!ctx->vsx_enabled)) {
>> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >> >> > }
>> >> >> > gen_set_access_type(ctx, ACCESS_INT);
>> >> >> > EA = tcg_temp_new();
>> >> >> > - tmp = tcg_temp_new_i64();
>> >> >> >
>> >> >> > gen_addr_reg_index(ctx, EA);
>> >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> >> > - tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > - gen_qemu_ld32u_i64(ctx, xth, EA);
>> >> >> > - tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> >> >> > -
>> >> >> > - tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> >> > - tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > - gen_qemu_ld32u_i64(ctx, xtl, EA);
>> >> >> > - tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> >> >> > -
>> >> >> > + tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> >> >> > + gen_helper_deposit32x2(xth, xth);
>> >> >> > + tcg_gen_addi_tl(EA, EA, 8);
>> >> >> > + tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> >> >> > + gen_helper_deposit32x2(xtl, xtl);
>> >> >
>> >> > ..and I think this is wrong for BE mode. The deposit32x2 will get the
>> >> > words in the right order, but the bytes within each word will be wrong
>> >> > because of the LE mode load on a BE setup.
>> >>
>> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
>> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
>> >> The order within the word is not changed. Snippet of the test code at
>> >> the end of email. Can share full code if needed (maybe will do it in
>> >> kvm-unit-test)
>> >
>> > Ugh.. now I'm confused. I would not have expected the results you've
>> > seen from these tests. But I still can't understand *how* the
>> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
>> > bytes as a single 64-bit LE integer.
>>
>> For both the case LE/BE we do a LE read ...
>
> .. and I can't see how that can be right for the BE case.
>
>> > Which should be the same as
>> > loading one 32-bit LE integer into the low half of the target
>> > register, then a 32-bit LE integer into the high half ot the target
>> > register.
>>
>> .. The 64-bit integer read is not same in these cases. The input itself
>> would be in the order of the format.
>>
>> Input rb32[]: 00010203 20212223 30313233 40414243
>>
>> LE:
>> helper_deposit32x2: 2021222300010203
>> helper_deposit32x2: 4041424330313233
>>
>> BE
>> helper_deposit32x2: 2322212003020100
>> helper_deposit32x2: 4342414033323130
>
>
> Sorry.. I can't really follow the above, because I'm not sure if
> you're displaying the bytes within each word in significance order, or
> increasing-address order.
Ah, thats just a print inside the helper just to check what MO_LEQ returned:
uint64_t helper_deposit32x2(uint64_t x)
{
fprintf(stderr, "%s: %016lx\n", __func__, x);
return deposit64((x >> 32), 32, 32, (x));
}
>
>> >
>> > As I said above, the deposit32x2 will swap the order of the two ints,
>> > but it won't byteswap the individual int32s which should have been BE
>> > in memory.
>> >
>> > Can you find the flaw in my reasoning?
>>
>> One anomaly that I see in BE code generation: it also generates a
>> stxvw4x after lxvw4x. I am not sure why.
>
> Ah... see I'm wondering if it's using the stxvw4x to store back to the
> union which you then get the results from. If that's so it could
> explain the results, since the bug I suspect is in lxvw4x would be
> cancelled out by the corresponding bug in stxv4wx, which is exactly
> why I'd prefer the approach to testing mentioned below.
Yes, I am investigating it.
>> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
>> Input rb32[]: 00010203 20212223 30313233 40414243
>>
>> gen_lxvw4x: called
>> helper_deposit32x2: 2322212003020100
>> helper_deposit32x2: 4342414033323130
>> gen_stxvw4x: called
>> helper_deposit32x2: 0302010023222120
>> helper_deposit32x2: 3332313043424140
>> Output VRT32: 00010203 20212223 30313233 40414243
>>
>> >> vsx.h:
>> >> ======
>> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
>> >>
>> >> typedef union {
>> >> __vector uint32_t v;
>> >> uint32_t a[U32_SIZE];
>> >> } vuint32_t;
>> >
>> > I am a little suspicious that whatever the compiler does to convert
>> > the vector to an array via this union might be undoing a byte reverse.
>> >
>> > I'd be more confident if you used VSX instructions to extract and
>> > store separately one of the 32-bit subwords of the vector.
>>
>> I will try to figure those instructions.
>
> Ok, thanks.
>
Regards,
Nikunj
- [Qemu-ppc] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending), Nikunj A Dadhania, 2016/09/16
- [Qemu-ppc] [PATCH v3 1/5] target-ppc: implement darn instruction, Nikunj A Dadhania, 2016/09/16
- [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/16
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/19
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/19
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/19
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/20
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/20
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, David Gibson, 2016/09/20
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation,
Nikunj A Dadhania <=
- Re: [Qemu-ppc] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation, Nikunj A Dadhania, 2016/09/19
[Qemu-ppc] [PATCH v3 4/5] target-ppc: add lxvh8x and stxvh8x, Nikunj A Dadhania, 2016/09/16
[Qemu-ppc] [PATCH v3 3/5] target-ppc: improve stxvw4x implementation, Nikunj A Dadhania, 2016/09/16
[Qemu-ppc] [PATCH v3 5/5] target-ppc: add lxvb16x and stxvb16x, Nikunj A Dadhania, 2016/09/16
Re: [Qemu-ppc] [PATCH v3 0/5] POWER9 TCG enablements - part4(pending), David Gibson, 2016/09/19