Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
Date: Wed, 28 Sep 2011 09:40:53 +0200
On 27.09.2011 at 21:05, Blue Swirl <address@hidden> wrote:
> On Tue, Sep 27, 2011 at 5:23 PM, Alexander Graf <address@hidden> wrote:
>>
>> On 27.09.2011, at 19:20, Blue Swirl wrote:
>>
>>> On Tue, Sep 27, 2011 at 5:03 PM, Alexander Graf <address@hidden> wrote:
>>>>
>>>> On 27.09.2011, at 18:53, Blue Swirl wrote:
>>>>
>>>>> On Tue, Sep 27, 2011 at 3:59 PM, Alexander Graf <address@hidden> wrote:
>>>>>>
>>>>>> On 27.09.2011, at 17:50, Blue Swirl wrote:
>>>>>>
>>>>>>> On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood <address@hidden> wrote:
>>>>>>>> On 09/24/2011 05:00 AM, Alexander Graf wrote:
>>>>>>>>> On 24.09.2011, at 10:44, Blue Swirl wrote:
>>>>>>>>>> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <address@hidden>
>>>>>>>>>> wrote:
>>>>>>>>>>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>>>>>>>>>>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <address@hidden> wrote:
>>>>>>>>>>>>> The goal with the spin table stuff, suboptimal as it is, was
>>>>>>>>>>>>> something
>>>>>>>>>>>>> that would work on any powerpc implementation. Other
>>>>>>>>>>>>> implementation-specific release mechanisms are allowed, and are
>>>>>>>>>>>>> indicated by a property in the cpu node, but only if the loader
>>>>>>>>>>>>> knows
>>>>>>>>>>>>> that the OS supports it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> IIUC the spec that includes these bits is not finalized yet. It
>>>>>>>>>>>>>> is however in use on all u-boot versions for e500 that I'm aware
>>>>>>>>>>>>>> of, and it is the method Linux uses to bring up secondary CPUs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's in ePAPR 1.0, which has been out for a while now. ePAPR 1.1
>>>>>>>>>>>>> was
>>>>>>>>>>>>> just released, which clarifies some things such as WIMG.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stuart / Scott, do you have any pointers to documentation where
>>>>>>>>>>>>>> the spinning is explained?
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>>>>>>>>>>>
>>>>>>>>>>>> Chapter 5.5.2 describes the table. This is actually an interface
>>>>>>>>>>>> between OS and Open Firmware; obviously there can't be a real
>>>>>>>>>>>> hardware
>>>>>>>>>>>> device that magically loads r3 etc.
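[For reference, a sketch of what the ePAPR spin table and its use look like, following the field layout Linux uses on e500. This is an illustration, not code from the patch; real code also needs memory barriers and cache maintenance.]

    #include <stdint.h>

    /* One entry per secondary CPU. The entry address is initialized
     * to 1, so bit 0 of the low word doubles as the "keep spinning"
     * flag (release addresses must be aligned). */
    struct epapr_spin_table {
        uint32_t addr_h;    /* release (entry) address, high word */
        uint32_t addr_l;    /* low word; bit 0 set = keep spinning */
        uint32_t r3_h;      /* value the released CPU gets in r3 */
        uint32_t r3_l;
        uint32_t reserved;
        uint32_t pir;       /* processor ID this entry belongs to */
    };

    /* Secondary CPU: busy-wait, with no IPI and no halt state. */
    static void secondary_spin(volatile struct epapr_spin_table *t)
    {
        while (t->addr_l & 1) {
            /* spin; this is the inefficiency discussed above */
        }
        /* now branch to (addr_h << 32 | addr_l) with r3 loaded from
         * r3_h/r3_l; that branch is the "magic" loading of r3 */
    }

    /* Boot CPU (or loader): write r3 first, then the entry address;
     * clearing bit 0 is what releases the secondary. */
    static void release_secondary(volatile struct epapr_spin_table *t,
                                  uint64_t entry, uint64_t r3)
    {
        t->r3_h   = (uint32_t)(r3 >> 32);
        t->r3_l   = (uint32_t)r3;
        t->addr_h = (uint32_t)(entry >> 32);
        t->addr_l = (uint32_t)entry;
    }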
>>>>>>>>
>>>>>>>> Not Open Firmware, but rather an ePAPR-compliant loader.
>>>>>>>
>>>>>>> 'boot program to client program interface definition'.
>>>>>>>
>>>>>>>>>>>> The device method would break abstraction layers,
>>>>>>>>
>>>>>>>> Which abstraction layers?
>>>>>>>
>>>>>>> QEMU system emulation emulates hardware, not software. Hardware
>>>>>>> devices don't touch CPU registers.
>>>>>>
>>>>>> The great part about this emulated device is that it's basically guest
>>>>>> software running in host context. To the guest, it's not a device in the
>>>>>> ordinary sense, such as vmport, but rather the same as software running
>>>>>> on another core, except that the other core isn't running any software.
>>>>>>
>>>>>> Sure, if you consider this a device, it does break abstraction layers.
>>>>>> Just consider it as host running guest code, then it makes sense :).
>>>>>>
>>>>>>>
>>>>>>>>>>>> it's much like
>>>>>>>>>>>> vmport stuff in x86. Using a hypercall would be a small
>>>>>>>>>>>> improvement.
>>>>>>>>>>>> Instead it should be possible to implement a small boot ROM which
>>>>>>>>>>>> puts
>>>>>>>>>>>> the secondary CPUs into managed halt state without spinning, then
>>>>>>>>>>>> the
>>>>>>>>>>>> boot CPU could send an IPI to a halted CPU to wake it up based on
>>>>>>>>>>>> the spin table, just like real HW would do.
>>>>>>>>
>>>>>>>> The spin table, with no IPI or halt state, is what real HW does (or
>>>>>>>> rather, what software does on real HW) today. It's ugly and
>>>>>>>> inefficient
>>>>>>>> but it should work everywhere. Anything else would be dependent on a
>>>>>>>> specific HW implementation.
>>>>>>>
>>>>>>> Yes. Hardware doesn't ever implement the spin table.
>>>>>>>
>>>>>>>>>>>> On Sparc32 OpenBIOS this
>>>>>>>>>>>> is something like a few lines of ASM on both sides.
>>>>>>>>>>>
>>>>>>>>>>> That sounds pretty close to what I had implemented in v1. Back then
>>>>>>>>>>> the only comment was to do it using this method from Scott.
>>>>>>>>
>>>>>>>> I had some comments on the actual v1 implementation as well. :-)
>>>>>>>>
>>>>>>>>>>> So we have the choice between having code inside the guest that
>>>>>>>>>>> spins (maybe even checking only every x ms, by programming a timer)
>>>>>>>>>>> and trying to make an event out of the memory write. V1 was
>>>>>>>>>>> the former, v2 (this one) is the latter. This version performs a
>>>>>>>>>>> lot better and is easier to understand.
>>>>>>>>>>
>>>>>>>>>> The abstraction layers should not be broken lightly, I suppose some
>>>>>>>>>> performance or laziness^Wlocal optimization reasons were behind
>>>>>>>>>> vmport
>>>>>>>>>> design too. The ideal way to solve this could be to detect a spinning
>>>>>>>>>> CPU and optimize that for all architectures; that could be tricky
>>>>>>>>>> though (if a CPU remains in the same TB for extended periods, inspect
>>>>>>>>>> the TB: if it performs a loop with a single load instruction, replace
>>>>>>>>>> the load by a special wait operation for any memory stores to that
>>>>>>>>>> page).
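[A purely hypothetical sketch of the bookkeeping such detection would need; none of these names exist in QEMU, and TLB and barrier issues are ignored:]

    #include <stdbool.h>
    #include <stdint.h>

    /* Candidate spin loop: a vCPU keeps re-entering one translation
     * block that consists of "load, compare, branch back". */
    typedef struct SpinCandidate {
        uint64_t tb_pc;      /* block the vCPU keeps re-entering */
        uint64_t load_addr;  /* the single guest address it polls */
        unsigned hits;       /* consecutive re-entries observed */
    } SpinCandidate;

    /* Imagined hook, called when a single-load TB branches back to
     * its own start. True = park the vCPU until a store touches the
     * page containing load_addr. */
    static bool maybe_park_vcpu(SpinCandidate *c, uint64_t pc,
                                uint64_t load_addr)
    {
        if (c->tb_pc != pc || c->load_addr != load_addr) {
            c->tb_pc = pc;            /* new candidate, restart count */
            c->load_addr = load_addr;
            c->hits = 0;
            return false;
        }
        return ++c->hits > 1000;      /* threshold is arbitrary */
    }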
>>>>>>>>
>>>>>>>> How's that going to work with KVM?
>>>>>>>>
>>>>>>>>> In fact, the whole way we load kernels today is pretty much
>>>>>>>>> wrong. We should rather do it similarly to OpenBIOS, where firmware
>>>>>>>>> always loads and then pulls the kernel from QEMU using a PV
>>>>>>>>> interface. At that point, we would have to implement such an
>>>>>>>>> optimization as you suggest. Or implement a hypercall :).
>>>>>>>>
>>>>>>>> I think the current approach is more usable for most purposes. If you
>>>>>>>> start U-Boot instead of a kernel, how do you pass information on from the
>>>>>>>> user (kernel, rfs, etc)? Require the user to create flash images[1]?
>>>>>>>
>>>>>>> No, for example OpenBIOS gets the kernel command line from fw_cfg
>>>>>>> device.
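[For context, the guest side of fw_cfg is only a few lines: write a selector, then stream data bytes. A sketch using the documented selector values; the two accessors are left abstract because the control/data registers are per-platform (I/O ports 0x510/0x511 on x86, MMIO on sparc and others):]

    #include <stdint.h>

    #define FW_CFG_CMDLINE_SIZE 0x14   /* documented fw_cfg selectors */
    #define FW_CFG_CMDLINE_DATA 0x15

    /* Platform-specific accessors; assumptions for this sketch. */
    extern void fw_cfg_select(uint16_t key);
    extern uint8_t fw_cfg_read8(void);

    /* Pull the -append command line out of QEMU, byte by byte. */
    static uint32_t fw_cfg_read_cmdline(char *buf, uint32_t bufsize)
    {
        uint32_t size = 0, i;

        fw_cfg_select(FW_CFG_CMDLINE_SIZE);
        for (i = 0; i < 4; i++) {       /* 32-bit little-endian size */
            size |= (uint32_t)fw_cfg_read8() << (i * 8);
        }
        if (size >= bufsize) {
            size = bufsize - 1;         /* truncate to fit */
        }
        fw_cfg_select(FW_CFG_CMDLINE_DATA);
        for (i = 0; i < size; i++) {
            buf[i] = (char)fw_cfg_read8();
        }
        buf[size] = '\0';
        return size;
    }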
>>>>>>>
>>>>>>>> Maybe that's a useful mode of operation in some cases, but I don't
>>>>>>>> think
>>>>>>>> we should be slavishly bound to it. Think of the current approach as
>>>>>>>> something between whole-system and userspace emulation.
>>>>>>>
>>>>>>> This is similar to ARM, M68k and Xtensa semi-hosting mode, but at a
>>>>>>> level below the kernel. Perhaps this mode should be enabled with
>>>>>>> -semihosting flag or a new flag. Then the bare metal version could be
>>>>>>> run without the flag.
>>>>>>
>>>>>> and then we'd have 2 implementations for running in system emulation
>>>>>> mode and need to maintain both. I don't think that scales very well.
>>>>>
>>>>> No, but such hacks are not common.
>>>>>
>>>>>>>
>>>>>>>> Where does the device tree come from? How do you tell the guest
>>>>>>>> what devices it has, especially in virtualization scenarios with
>>>>>>>> non-PCI
>>>>>>>> passthrough devices, or custom qdev instantiations?
>>>>>>>>
>>>>>>>>> But at least we'd always be running the same guest software stack.
>>>>>>>>
>>>>>>>> No we wouldn't. Any U-Boot that runs under QEMU would have to be
>>>>>>>> heavily modified, unless we want to implement a ton of random device
>>>>>>>> emulation, at least one extra memory translation layer (LAWs, localbus
>>>>>>>> windows, CCSRBAR, and such), hacks to allow locked cache lines to
>>>>>>>> operate despite a lack of backing store, etc.
>>>>>>>
>>>>>>> I'd say HW emulation business as usual. Now with the new memory API,
>>>>>>> it should be possible to emulate the caches with line locking and TLBs
>>>>>>> etc., this was not previously possible. IIRC implementing locked cache
>>>>>>> lines would allow x86 to boot unmodified coreboot.
>>>>>>
>>>>>> So how would you emulate cache lines with line locking on KVM?
>>>>>
>>>>> The cache would be an MMIO device which registers to handle all memory
>>>>> space. Configuring the cache controller changes how the device
>>>>> operates. Put this device between CPU and memory and other devices.
>>>>> Performance would probably be horrible, so CPU should disable the
>>>>> device automatically after some time.
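[The rough shape such a device would take with the memory API, for what it's worth. Names follow QEMU's current API; the actual tag and line-lock modelling, and the pass-through to RAM, are elided:]

    #include "exec/memory.h"    /* QEMU-internal header */

    typedef struct CacheState {
        MemoryRegion mr;
        /* tags, locked-line state, pointer to the real RAM... */
    } CacheState;

    static uint64_t cache_read(void *opaque, hwaddr addr, unsigned size)
    {
        /* consult tags and locked lines, else forward to RAM */
        return 0;
    }

    static void cache_write(void *opaque, hwaddr addr, uint64_t val,
                            unsigned size)
    {
        /* allocate into a locked line if so configured, else forward */
    }

    static const MemoryRegionOps cache_ops = {
        .read = cache_read,
        .write = cache_write,
        .endianness = DEVICE_BIG_ENDIAN,
    };

    /* Interpose on the whole address space; everything behind the
     * cache (RAM, devices) would live in a second address space that
     * the handlers forward to. */
    static void cache_init(CacheState *s, Object *owner, uint64_t size)
    {
        memory_region_init_io(&s->mr, owner, &cache_ops, s,
                              "cpu-cache", size);
    }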
>>>>
>>>> So how would you execute code on this region then? :)
>>>
>>> Easy, fix QEMU to allow executing from MMIO. (Yeah, I forgot about that).
>>
>> It's not quite as easy to fix KVM to do the same though, unfortunately. We'd
>> have to either implement a full instruction emulator in the kernel (x86
>> style) or transfer all state from KVM into QEMU to execute it there (hell
>> breaks loose). Both alternatives are not exactly appealing.
>>
>>>
>>>>>
>>>>>> However, we already have a number of hacks in SeaBIOS to run in QEMU, so
>>>>>> I don't see an issue in adding a few here and there in u-boot. The
>>>>>> memory pressure is a real issue though. I'm not sure how we'd manage
>>>>>> that one. Maybe we could try and reuse the host u-boot binary? heh
>>>>>
>>>>> I don't think SeaBIOS breaks layering except for fw_cfg.
>>>>
>>>> I'm not saying we're breaking layering there. I'm saying that changing
>>>> u-boot is not so bad, since it's the same as we do with SeaBIOS. It was an
>>>> argument in favor of your position.
>>>
>>> Never mind then ;-)
>>>
>>>>> For extremely
>>>>> memory-limited situations, perhaps QEMU (or Native KVM Tool for a lean
>>>>> and mean version) could be run without glibc, inside the kernel, or even
>>>>> interfacing directly with the hypervisor. I'd also continue making it
>>>>> possible to disable building unused devices and features.
>>>>
>>>> I'm pretty sure you're not the only one with that goal ;).
>>>
>>> Great, let's do it.
>>
>> VGA comes first :)
>
> This patch fixes the easy parts; ISA devices remain, since they are not
> qdevified. But didn't someone already send patches to do that?
> <vga-optional.patch>
Heh - I was thinking about the Mac VGA breakage :). Still looking at it. Your
patch did look correct, but something seems to go wrong with vram mapping.
Maybe.
Alex