Re: [Qemu-discuss] Getting qemu-system-i386 to use more than one core on

qemu-discuss

From:	Jakob Bohm
Subject:	Re: [Qemu-discuss] Getting qemu-system-i386 to use more than one core on Cortex A7 host
Date:	Wed, 6 Jan 2016 00:10:14 +0100
User-agent:	Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 05/01/2016 18:35, Peter Maydell wrote:

On 4 January 2016 at 22:00, Jakob Bohm <address@hidden> wrote:

On 04/01/2016 22:29, Peter Maydell wrote:


On 4 January 2016 at 13:24, Jakob Bohm <address@hidden> wrote:
https://en.wikipedia.org/wiki/Memory_ordering#In_symmetric_multiprocessing_.28SMP.29_microprocessor_systems
lists several cases like load-after-load that ARM might
reorder but x86 forbids reordering for.)

But I haven't looked into the details beyond mentally
tagging the situation as "here be dragons" for if/when
I ever need to review any code dealing with it.


Looking briefly at that table, I am unsure which items are covered by
those first 3 lines they say are not permitted on x86, but are
permitted on ARMv7.


For instance, x86 forbids reordering of writes with other writes
(excluding a few special cases like the non-temporal move instructions),
and in an MP system requires that writes by one processor are
observed in the same order by other processors. ARM doesn't require
this. So for this sequence of operations:

  (initial state: both locations X and Y contain 0)

  P1:   store 1 to address X
        store 1 to address Y

  P2:   load register R1 from address Y
        load register R2 from address X

on ARM it is possible for P2 to finish with R1 == 1 and R2 == 0
(ie for P2 to observe P1's store to Y before it observes P1's
store to X). On x86 this is not permitted.

(Compare the ARMv8 ARM ARM rev A.h appendix K10.6 section
K10.6.1, and the Intel architecture reference volume 3
section 8.2.2 "Memory ordering in P6 and more recent processor
families"; in particular this is the code sequence "weakly
ordered message passing problem" in the ARM ARM and the
example 8-1 "stores not reordered with other stores" in the
x86 documentation.)
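
As a rough illustration of the sequence above, here is a toy model in C.
It is only a sketch: it treats the four operations as steps in one global
order, which is a simplification and not the real x86 or ARM memory
model. The names (stale_x_observable and so on) are made up for this
example. With program order kept on both processors, the outcome
R1 == 1, R2 == 0 cannot be reached; once the steps may be reordered, it
can.

```c
#include <assert.h>
#include <stdbool.h>

/* The four operations of the message-passing test. */
enum { ST_X, ST_Y, LD_Y, LD_X, N_OPS };

/* Values P2 ends up with after one execution. */
struct outcome { int r1, r2; };

/* Perform the four operations in the given global order. */
static struct outcome run_order(const int order[N_OPS])
{
    int x = 0, y = 0;                 /* initial state: X == Y == 0 */
    struct outcome o = { 0, 0 };
    for (int i = 0; i < N_OPS; i++) {
        switch (order[i]) {
        case ST_X: x = 1;    break;   /* P1: store 1 to X */
        case ST_Y: y = 1;    break;   /* P1: store 1 to Y */
        case LD_Y: o.r1 = y; break;   /* P2: load R1 from Y */
        case LD_X: o.r2 = x; break;   /* P2: load R2 from X */
        default: assert(!"unknown operation");
        }
    }
    return o;
}

/* Index of operation op within order[]. */
static int pos(const int order[N_OPS], int op)
{
    for (int i = 0; i < N_OPS; i++)
        if (order[i] == op)
            return i;
    assert(!"operation missing from order");
    return -1;
}

/*
 * Try every permutation of the four operations. If keep_program_order
 * is set, skip orders where P1's stores or P2's loads are swapped.
 * Returns true if some remaining order ends with R1 == 1 and R2 == 0.
 */
bool stale_x_observable(bool keep_program_order)
{
    int order[N_OPS];
    for (int a = 0; a < N_OPS; a++)
    for (int b = 0; b < N_OPS; b++)
    for (int c = 0; c < N_OPS; c++)
    for (int d = 0; d < N_OPS; d++) {
        if (a == b || a == c || a == d || b == c || b == d || c == d)
            continue;                 /* not a permutation */
        order[0] = a; order[1] = b; order[2] = c; order[3] = d;
        if (keep_program_order &&
            (pos(order, ST_X) > pos(order, ST_Y) ||
             pos(order, LD_Y) > pos(order, LD_X)))
            continue;
        struct outcome o = run_order(order);
        if (o.r1 == 1 && o.r2 == 0)
            return true;
    }
    return false;
}
```

Calling stale_x_observable(true) models the ordered case and
stale_x_observable(false) the case where reordering is allowed.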

This means that if we want to emulate the x86 architecture's
memory ordering guarantees on an ARM host, we need to add
extra barriers after emulated loads and stores to enforce that
P2 does not see the stores P1 makes in an order that the
x86 architecture doesn't permit. (It would also be possible
to use the v8 ARM load-acquire and store-release instructions
rather than full on barriers, but on v7 I think barriers are
the only answer.)
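
A minimal sketch of both options, written with C11 atomics rather than
host assembly. The helper names are hypothetical and are not QEMU
functions. The first pair asks the compiler for acquire/release
accesses, which on an AArch64 host typically become load-acquire and
store-release instructions. The second pair uses explicit fences, which
on an ARMv7 host typically become barrier instructions. This only
covers the message-passing case discussed here; it is not a claim that
it reproduces every x86 ordering rule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulated guest store with release ordering, so
 * earlier stores become visible no later than this one. */
void guest_store32(_Atomic uint32_t *addr, uint32_t val)
{
    assert(addr != NULL);
    atomic_store_explicit(addr, val, memory_order_release);
}

/* Hypothetical helper: emulated guest load with acquire ordering, so
 * later loads cannot be satisfied before this one. */
uint32_t guest_load32(_Atomic uint32_t *addr)
{
    assert(addr != NULL);
    return atomic_load_explicit(addr, memory_order_acquire);
}

/* Barrier variant: a release fence ahead of a relaxed store. */
void guest_store32_fenced(_Atomic uint32_t *addr, uint32_t val)
{
    assert(addr != NULL);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(addr, val, memory_order_relaxed);
}

/* Barrier variant: a relaxed load followed by an acquire fence. */
uint32_t guest_load32_fenced(_Atomic uint32_t *addr)
{
    assert(addr != NULL);
    uint32_t val = atomic_load_explicit(addr, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return val;
}
```

In the example sequence, P1 would use the store helper for both X and Y
and P2 the load helper for both Y and X.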


The load-acquire/store-if-no-conflict instruction pair was introduced
halfway through the ARMv6 architecture, though it may be missing on
some non-A ARMv7 cores, since it is not required for that processor
class.  Additionally, I think some ARM MMUs have page or region level
memory ordering flags, including some flag combinations that break
the normal ARM synchronization instructions.

But anyway, it might be worth allowing the P5 reordering rules on x86
if that improves the situation.  It might also be worth adding some "is
the host CPU reordering too aggressively" conditionals, both at compile
time and at runtime, switching between different TCG multi-core
strategies depending on the exact host CPU.
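
To make that concrete, here is a sketch of what such a selection could
look like. Everything in it is hypothetical (the strategy names, the
host_info struct, and the idea that something fills it in at startup);
it is not how QEMU is structured. The compile-time part relies only on
the usual GCC/Clang target macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategies an emulator could choose between. */
enum smp_strategy {
    SMP_HOST_ORDER_SUFFICES,  /* host is at least as strict as the guest */
    SMP_ACQUIRE_RELEASE,      /* use load-acquire/store-release accesses */
    SMP_FULL_BARRIERS,        /* add explicit barriers around accesses */
    SMP_SINGLE_THREAD         /* fall back to one host thread */
};

/* Hypothetical result of a runtime probe of the host CPU. */
struct host_info {
    bool strongly_ordered;
    bool has_acquire_release;
    bool has_barriers;
};

/* Compile-time default, based on the target architecture macros. */
enum smp_strategy smp_compile_time_default(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return SMP_HOST_ORDER_SUFFICES;
#elif defined(__aarch64__)
    return SMP_ACQUIRE_RELEASE;
#elif defined(__arm__)
    return SMP_FULL_BARRIERS;
#else
    return SMP_SINGLE_THREAD;
#endif
}

/* Runtime choice; with no probe result, use the compile-time default. */
enum smp_strategy smp_choose(const struct host_info *h)
{
    if (h == NULL)
        return smp_compile_time_default();
    if (h->strongly_ordered)
        return SMP_HOST_ORDER_SUFFICES;
    if (h->has_acquire_release)
        return SMP_ACQUIRE_RELEASE;
    if (h->has_barriers)
        return SMP_FULL_BARRIERS;
    return SMP_SINGLE_THREAD;
}

/* Human-readable name, e.g. for a startup log line. */
const char *smp_strategy_name(enum smp_strategy s)
{
    switch (s) {
    case SMP_HOST_ORDER_SUFFICES: return "host-order";
    case SMP_ACQUIRE_RELEASE:     return "acquire-release";
    case SMP_FULL_BARRIERS:       return "full-barriers";
    case SMP_SINGLE_THREAD:       return "single-thread";
    }
    assert(!"invalid strategy");
    return "?";
}
```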

Another tactic could be to not let more than one virtual core have
actual access to the same page if at least one of them has write
access.  So the minority of code that actually does do multi-core data
updates to the same virtualized memory page and might thus be affected
by ordering rules would cause the emulator to constantly switch the
shared page back and forth, while most other code will just run along
nicely using shared read or exclusive write page accesses.
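
A small sketch of the bookkeeping this would need per guest page,
assuming at most 32 virtual cores. The struct and function names are
invented for illustration. In a real emulator each "transfer" would
presumably mean changing host page protections and taking a fault,
which is the expensive part; here it is only counted.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VCPUS 32
#define NO_WRITER (-1)

/* Hypothetical per-page state: many readers or one exclusive writer. */
struct page_state {
    uint32_t readers;    /* bitmask of vCPUs holding read access */
    int writer;          /* vCPU holding write access, or NO_WRITER */
    unsigned transfers;  /* times access had to be taken from another vCPU */
};

void page_init(struct page_state *p)
{
    p->readers = 0;
    p->writer = NO_WRITER;
    p->transfers = 0;
}

/* vCPU wants to read: demote a foreign writer to a reader first. */
void page_acquire_read(struct page_state *p, int vcpu)
{
    assert(vcpu >= 0 && vcpu < MAX_VCPUS);
    if (p->writer == vcpu)
        return;                          /* exclusive owner may read */
    if (p->writer != NO_WRITER) {
        p->readers |= UINT32_C(1) << p->writer;
        p->writer = NO_WRITER;
        p->transfers++;
    }
    p->readers |= UINT32_C(1) << vcpu;
}

/* vCPU wants to write: revoke every other vCPU's access. */
void page_acquire_write(struct page_state *p, int vcpu)
{
    assert(vcpu >= 0 && vcpu < MAX_VCPUS);
    if (p->writer == vcpu)
        return;                          /* already exclusive */
    if (p->writer != NO_WRITER ||
        (p->readers & ~(UINT32_C(1) << vcpu)) != 0)
        p->transfers++;                  /* someone else loses access */
    p->readers = 0;
    p->writer = vcpu;
}
```

Pages that are only read, or only written by one core, never increment
the transfer count; only pages written by one core and touched by
another bounce back and forth.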

But in the end, if x86 really makes these guarantees even in
multi-socket setups (more than one physical x86 CPU in a suitable
motherboard), despite the normal effects of caching, while ARM doesn't,
that kind of sucks.  Though we shouldn't forget that those are not the
only two architectures involved.



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
