qemu-devel



From: Richard Henderson
Subject: Re: [RFC PATCH v3 26/34] Hexagon (target/hexagon) macros referenced in instruction semantics
Date: Thu, 8 Oct 2020 15:02:19 -0500

On 10/8/20 1:51 PM, Taylor Simpson wrote:
>> How can it have done so without doing the merging by hand?  This
>> operation ordering is intended to make that unnecessary.
>>
>> I think you're missing the point.
> 
> Sorry I wasn't clear.  We have done the load from memory as it was at the 
> beginning of the packet.  The result of the store is in mem_log_stores in 
> CPUHexagonState.  This code updates the bytes that were loaded with any 
> overlapping bytes from the store that hasn't been committed yet.

Right, so you *are* missing the point.

The point is to *not* do the load earlier, but only probe the memory for
readability so that any exception is recognized before the store commits.

Then, after the store, actually perform the load.  Thus any overlapping bytes
get the values that they should.

Voila, no by-hand merging.
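
A minimal sketch of that ordering, for illustration only.  The toy_* names and
the flat toy_mem array are hypothetical stand-ins, not code from the patch or
from QEMU's memory API; the packet is reduced to one byte store plus one byte
load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy guest memory (hypothetical); stands in for the real memory system. */
enum { TOY_MEM_SIZE = 64 };
static uint8_t toy_mem[TOY_MEM_SIZE];

/* Step 1: check readability only.  No data is transferred. */
static bool toy_probe_read(size_t addr, size_t len)
{
    return len <= TOY_MEM_SIZE && addr <= TOY_MEM_SIZE - len;
}

static void toy_store8(size_t addr, uint8_t val)
{
    assert(addr < TOY_MEM_SIZE);
    toy_mem[addr] = val;
}

static uint8_t toy_load8(size_t addr)
{
    assert(addr < TOY_MEM_SIZE);
    return toy_mem[addr];
}

/*
 * Packet { memb(st_addr) = st_val; *ld_out = memb(ld_addr) }.
 * Returns false (a "fault") with memory untouched if either access is bad.
 */
static bool toy_exec_packet(size_t st_addr, uint8_t st_val,
                            size_t ld_addr, uint8_t *ld_out)
{
    if (!toy_probe_read(ld_addr, 1) || st_addr >= TOY_MEM_SIZE) {
        return false;             /* fault recognized before the store commits */
    }
    toy_store8(st_addr, st_val);  /* step 2: commit the store */
    *ld_out = toy_load8(ld_addr); /* step 3: the load sees the stored byte */
    return true;
}
```

Because the load runs after the store has committed, an overlapping address
simply reads back the stored byte, and a faulting load leaves memory unchanged.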

>     67f8:       c0 40 21 1f     1f2140c0 {      v0.uh = vsat(v0.uw,v1.uw)
>     67fc:       00 45 02 a1     a1024500        memb(r2+#0) = r5
>     6800:       02 c0 04 91     9104c002        r2 = memb(r4+#0) }
> 
> The vsat instruction requires a vector context.  If the thread doesn't have a 
> vector context, an exception will be raised.  The OS can provide a context 
> and replay the packet.

Sure.

Is there any per-packet exception priority that would require a fault from the
store to be provided in preference to the fault for the vector context?

Anyway, I'm suggesting that we order the operations within the packet in
whatever way is most convenient for us.

>>> store_val = le32_to_cpu(store_val);
>>> load_val = le32_to_cpu(load_val);
>>> <merge bytes>
>>> /* store_val is dead so no need to convert back */
>>> load_val = cpu_to_le32(load_val)
>>
>> There's some misuse of cpu_to_le32 vs le32_to_cpu there (I've never liked
>> those names, but it helps to think about what form the data is already in).
> 
> So, what is the right sequence?

Well, <merge bytes> wants to operate on little-endian data, so the final
load_val assignment should use le32_to_cpu.  Think about this in terms of
units, like Fahrenheit vs Celsius: the name says which form the value is in
before and after the conversion.

As for the other two, it depends on where the values come from.  Probably they
should be cpu_to_le32, but I can't tell without extra context.
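
A sketch of that sequence, under the assumption (not confirmed above) that
both values arrive in host order.  The toy_* helpers are hypothetical portable
stand-ins for cpu_to_le32/le32_to_cpu, and toy_merge with its address
parameters is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host order -> little-endian memory image (stand-in for cpu_to_le32). */
static uint32_t toy_cpu_to_le32(uint32_t v)
{
    uint8_t b[4] = { (uint8_t)v, (uint8_t)(v >> 8),
                     (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
    uint32_t r;
    memcpy(&r, b, sizeof(r));
    return r;
}

/* Little-endian memory image -> host order (stand-in for le32_to_cpu). */
static uint32_t toy_le32_to_cpu(uint32_t v)
{
    uint8_t b[4];
    memcpy(b, &v, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Overlay the bytes of a pending 4-byte store onto a 4-byte loaded value. */
static uint32_t toy_merge(uint32_t load_val, size_t load_addr,
                          uint32_t store_val, size_t store_addr)
{
    uint32_t load_le = toy_cpu_to_le32(load_val);   /* cpu -> le */
    uint32_t store_le = toy_cpu_to_le32(store_val); /* cpu -> le */
    uint8_t lb[4], sb[4];

    memcpy(lb, &load_le, sizeof(lb));
    memcpy(sb, &store_le, sizeof(sb));
    /* In the le form, byte i of each image lives at guest address base + i. */
    for (size_t i = 0; i < 4; i++) {
        size_t addr = load_addr + i;
        if (addr >= store_addr && addr - store_addr < 4) {
            lb[i] = sb[addr - store_addr];
        }
    }
    memcpy(&load_le, lb, sizeof(load_le));
    return toy_le32_to_cpu(load_le);                /* le -> cpu */
}
```

The conversions bracket the merge the same way unit conversions bracket a
calculation: into the form the merge works on, then back out at the end.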


r~


