From: Jakob Bohm
Subject: Re: [Qemu-discuss] Getting qemu-system-i386 to use more than one core on Cortex A7 host
Date: Mon, 4 Jan 2016 14:24:20 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
On 04/01/2016 13:21, Peter Maydell wrote:
> On 3 January 2016 at 20:57, David Durham <address@hidden> wrote:
>> Any suggestions or comments on how to do this are very welcome...
>> I built qemu with --target-list i386-softmmu and when I run qemu,
>> top only shows one qemu-system-i386 using 100% of one core
>
> This is expected. Our current emulation is single threaded even when
> emulating multiple target CPUs, so we'll only use one host core.
> (We do have some helper threads for a few IO tasks etc but those are
> not cpu-bound.) There is some development work in progress to try to
> make better use of multi-core hosts but it's not very far advanced yet.
> (Also emulating x86 guests on arm hosts with multiple cpus might not
> ever be supported because the x86 memory model would require barriers
> everywhere and it's not clear it would overall improve performance.
> ARM-on-x86 is the primary initial usecase.)
>
> thanks
> -- PMM

For your information, the x86 memory model only requires barriers in the following cases (this is somewhat implemented on modern machines with multiple physical x86 CPU sockets, as opposed to multicore chips; it may also be observed when using any kind of DMA/bus-master hardware such as GPUs):

1. Instructions with the explicit "LOCK" prefix. These require a memory barrier, then a locked read-modify-write on a single address, then another memory barrier.

2. Explicit memory barrier instructions (there have been a few over the years).

3. Some of the XCHG-family instructions implicitly behave as though there were a LOCK prefix in front.

4. On modern CPUs, the floating point ("ESC") instructions are treated as normal instructions, and the related historic "WAIT" opcode is now a NOP (optionally throwing an "FPU disabled" exception). On the 386 and older, floating point instructions might postpone their memory writes to any point up to and including the next WAIT on the same CPU, but this was never a multi-CPU barrier, just synchronization between the CPU and FPU chips within each two-chip CPU.

5. Some specific operations (see the architecture manuals), typically associated with cache management, system calls and/or thread switching, also act as barriers.

6. Only a minority of instructions flush the instruction decode (and hence TCG translation) buffers, though for highest consistency any actual write to a memory page containing code should cause the cached translation of that code to be discarded.

7. If doing cycle-accurate, bug-for-bug emulation of specific CPU models, it might be necessary to exactly model the implicit size limitations of their various caches, such as how many page table entries are cached by the on-CPU TLB or how many bytes ahead the instruction decoder may look. But I don't think that is a qemu feature anyway.

This still leaves the majority of code not doing memory barriers.

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
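A minimal C11 sketch of how an emulator might map the barrier-carrying cases from items 1 to 3 (LOCK-prefixed read-modify-write, an explicit fence such as MFENCE, and the implicitly locked XCHG) onto a weakly ordered host such as ARM. This is a hypothetical illustration, not QEMU's TCG code: the helper names (`emu_lock_add32`, `emu_lock_cmpxchg32`, `emu_xchg32`, `emu_mfence`) are invented, guest memory is modeled as a naturally aligned `_Atomic uint32_t`, and misaligned ("split") locked accesses, which x86 permits, are not handled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch assumption: operands are naturally aligned 32-bit words.
 * Real x86 also allows misaligned (split) locked accesses; not modeled. */
static void check_aligned(const _Atomic uint32_t *p)
{
    assert(((uintptr_t)p & 3u) == 0);
}

/* Item 1, e.g. LOCK XADD: barrier, locked read-modify-write, barrier.
 * Returns the old memory value, as XADD leaves it in its source register. */
static uint32_t emu_lock_add32(_Atomic uint32_t *mem, uint32_t val)
{
    check_aligned(mem);
    atomic_thread_fence(memory_order_seq_cst);      /* barrier before */
    uint32_t old = atomic_fetch_add_explicit(mem, val, memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);      /* barrier after */
    return old;
}

/* Item 1, LOCK CMPXCHG. Returns the old memory value, which is what
 * x86 leaves in EAX whether or not the exchange succeeded. */
static uint32_t emu_lock_cmpxchg32(_Atomic uint32_t *mem,
                                   uint32_t expected, uint32_t desired)
{
    check_aligned(mem);
    atomic_thread_fence(memory_order_seq_cst);
    /* On failure 'expected' is overwritten with the current value, so in
     * both outcomes it ends up holding the value memory had before. */
    (void)atomic_compare_exchange_strong_explicit(mem, &expected, desired,
                                                  memory_order_seq_cst,
                                                  memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);
    return expected;
}

/* Item 3: XCHG with a memory operand behaves as if LOCK-prefixed. */
static uint32_t emu_xchg32(_Atomic uint32_t *mem, uint32_t newval)
{
    check_aligned(mem);
    atomic_thread_fence(memory_order_seq_cst);
    uint32_t old = atomic_exchange_explicit(mem, newval, memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);
    return old;
}

/* Item 2: an explicit full fence such as MFENCE. */
static void emu_mfence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}
```

The explicit fences around each read-modify-write mirror item 1 literally (barrier, locked RMW, barrier). On many hosts a sequentially consistent atomic RMW alone may already give sufficient ordering, so the extra fences are a conservative choice for the sketch rather than a stated requirement.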