qemu-discuss

From: Peter Maydell
Subject: Re: [Qemu-discuss] Getting qemu-system-i386 to use more than one core on Cortex A7 host
Date: Tue, 5 Jan 2016 17:35:30 +0000

On 4 January 2016 at 22:00, Jakob Bohm <address@hidden> wrote:
> On 04/01/2016 22:29, Peter Maydell wrote:
>>
>> On 4 January 2016 at 13:24, Jakob Bohm <address@hidden> wrote:
>> https://en.wikipedia.org/wiki/Memory_ordering#In_symmetric_multiprocessing_.28SMP.29_microprocessor_systems
>> lists several cases like load-after-load that ARM might
>> reorder but x86 forbids reordering for.)
>>
>> But I haven't looked into the details beyond mentally
>> tagging the situation as "here be dragons" for if/when
>> I ever need to review any code dealing with it.
>>
>
> Looking briefly at that table, I am unsure which items are covered by
> those first 3 lines they say are not permitted on x86, but are
> permitted on ARMv7.

For instance, x86 forbids reordering of writes with other writes
(excluding a few special cases like the non-temporal move instructions),
and in an MP system requires that writes by one processor are
observed in the same order by other processors. ARM doesn't require
this. So for this sequence of operations:

 (initial state: both locations X and Y contain 0)

 P1:   store 1 to address X
       store 1 to address Y

 P2:   load register R1 from address Y
       load register R2 from address X

on ARM it is possible for P2 to finish with R1 == 1 and R2 == 0
(ie for P2 to observe P1's store to Y before it observes P1's
store to X). On x86 this is not permitted.
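
To make the difference concrete, here is a rough sketch that enumerates
the possible results of the sequence above under a deliberately crude toy
model. The assumptions are mine, not either architecture's formal model:
"strong" keeps each processor's accesses in program order, and "weak"
lets each processor's two accesses become visible in either order. The
names (outcomes, allow_reorder and so on) are purely illustrative.

```python
from itertools import permutations

# Program-order accesses from the example: (kind, address).
P1 = [("store", "X"), ("store", "Y")]
P2 = [("load", "Y"), ("load", "X")]


def thread_orders(ops, allow_reorder):
    """Orders in which one processor's accesses may take effect (toy model)."""
    if allow_reorder:
        return [list(p) for p in permutations(ops)]
    return [list(ops)]


def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(allow_reorder):
    """Return the set of (R1, R2) results reachable in the toy model."""
    results = set()
    for p1 in thread_orders(P1, allow_reorder):
        for p2 in thread_orders(P2, allow_reorder):
            for schedule in interleavings(p1, p2):
                memory = {"X": 0, "Y": 0}  # initial state
                loaded = {}
                for kind, addr in schedule:
                    if kind == "store":
                        memory[addr] = 1
                    else:
                        loaded[addr] = memory[addr]
                # R1 is the value loaded from Y, R2 the value loaded from X.
                results.add((loaded["Y"], loaded["X"]))
    return results


print(sorted(outcomes(allow_reorder=False)))  # [(0, 0), (0, 1), (1, 1)]
print(sorted(outcomes(allow_reorder=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The only result that appears once reordering is allowed is (1, 0), i.e.
R1 == 1 and R2 == 0. The toy model ignores things like x86 store
buffering and ARM dependency ordering, which don't matter for this
particular sequence.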

(Compare the ARMv8 ARM ARM rev A.h appendix K10.6 section
K10.6.1, and the Intel architecture reference volume 3
section 8.2.2 "Memory ordering in P6 and more recent processor
families"; in particular this is the code sequence "weakly
ordered message passing problem" in the ARM ARM and the
example 8-1 "stores not reordered with other stores" in the
x86 documentation.)

This means that if we want to emulate the x86 architecture's
memory ordering guarantees on an ARM host, we need to add
extra barriers after emulated loads and stores to enforce that
P2 does not see the stores P1 makes in an order that the
x86 architecture doesn't permit. (It would also be possible
to use the v8 ARM load-acquire and store-release instructions
rather than full barriers, but on v7 I think barriers are
the only answer.)
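
As a sketch of why the barriers are needed on both the store side and
the load side, the same kind of toy enumeration can be extended with a
barrier marker. The assumptions here are again mine: accesses may be
reordered freely within one processor except across a BARRIER, which is
a much cruder rule than the real ARM barrier semantics, and none of
this is QEMU code.

```python
from itertools import permutations, product

BARRIER = ("barrier", None)
STORE_X, STORE_Y = ("store", "X"), ("store", "Y")
LOAD_Y, LOAD_X = ("load", "Y"), ("load", "X")


def segment_orders(ops):
    """Allowed orders for one processor: permute only between barriers."""
    segments, current = [], []
    for op in ops:
        if op == BARRIER:
            segments.append(current)
            current = []
        else:
            current.append(op)
    segments.append(current)
    per_segment = [list(permutations(seg)) for seg in segments]
    return [[op for seg in choice for op in seg]
            for choice in product(*per_segment)]


def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(p1_ops, p2_ops):
    """Return the set of (R1, R2) results reachable in the toy model."""
    results = set()
    for p1 in segment_orders(p1_ops):
        for p2 in segment_orders(p2_ops):
            for schedule in interleavings(p1, p2):
                memory = {"X": 0, "Y": 0}  # initial state
                loaded = {}
                for kind, addr in schedule:
                    if kind == "store":
                        memory[addr] = 1
                    else:
                        loaded[addr] = memory[addr]
                results.add((loaded["Y"], loaded["X"]))  # (R1, R2)
    return results


no_barriers = outcomes([STORE_X, STORE_Y], [LOAD_Y, LOAD_X])
writer_only = outcomes([STORE_X, BARRIER, STORE_Y], [LOAD_Y, LOAD_X])
both_sides = outcomes([STORE_X, BARRIER, STORE_Y], [LOAD_Y, BARRIER, LOAD_X])

print((1, 0) in no_barriers)  # True
print((1, 0) in writer_only)  # True
print((1, 0) in both_sides)   # False
```

In this model a barrier between P1's stores alone is not enough, because
P2's loads can still be satisfied out of order; the forbidden result
only disappears once both sequences are fenced.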

thanks
-- PMM
