|
From: | Alexander Graf |
Subject: | Re: [Qemu-ppc] [PATCH 8/9] spapr: Implement processor compatibility in ibm, client-architecture-support |
Date: | Mon, 19 May 2014 11:01:09 +0200 |
User-agent: | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 |
On 17.05.14 03:45, Alexey Kardashevskiy wrote:
On 05/17/2014 06:46 AM, Alexander Graf wrote:On 16.05.14 17:57, Alexey Kardashevskiy wrote:On 05/17/2014 12:16 AM, Alexander Graf wrote:On 15.05.14 13:28, Alexey Kardashevskiy wrote:Modern Linux kernels support last POWERPC CPUs so when a kernel boots, in most cases it can find a matching cpu_spec in the kernel's cpu_specs list. However if the kernel is quite old, it may be missing a definition of the actual CPU. To provide an ability for old kernels to work on modern hardware, a Processor Compatibility Mode has been introduced by the PowerISA specification. From the hardware prospective, it is supported by the Processor Compatibility Register (PCR) which is defined in PowerISA. The register enables one of the compatibility modes (2.05/2.06/2.07). Since PCR is a hypervisor privileged register and cannot be accessed from the guest, the mode selection is done via ibm,client-architecture-support (CAS) RTAS call using which the guest specifies what "raw" and "architected" CPU versions it supports. QEMU works out the best match, changes a "cpu-version" property of every CPU and notifies the guest about the change by setting these properties in the buffer passed as a response on a custom H_CAS hypercall. Signed-off-by: Alexey Kardashevskiy <address@hidden> --- hw/ppc/spapr.c | 40 +++++++++++++++++++++++++ hw/ppc/spapr_hcall.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++ trace-events | 5 ++++ 3 files changed, 128 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index a2c9106..a0882a1 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -35,6 +35,7 @@ #include "kvm_ppc.h" #include "mmu-hash64.h" #include "cpu-models.h" +#include "qom/cpu.h" #include "hw/boards.h" #include "hw/ppc/ppc.h" @@ -592,11 +593,50 @@ int spapr_h_cas_compose_response(target_ulong addr, target_ulong size) { void *fdt; sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 }; + CPUState *cs; + int smt = kvmppc_smt_threads(); size -= sizeof(hdr); fdt = g_malloc0(size); _FDT((fdt_create(fdt, size))); + _FDT((fdt_begin_node(fdt, "cpus"))); + + CPU_FOREACH(cs) { + PowerPCCPU *cpu = POWERPC_CPU(cs); + DeviceClass *dc = DEVICE_GET_CLASS(cpu); + int smpt = spapr_get_compat_smp_threads(cpu); + int i, index = ppc_get_vcpu_dt_id(cpu); + uint32_t servers_prop[smpt]; + uint32_t gservers_prop[smpt * 2]; + char tmp[32]; + + if ((index % smt) != 0) { + continue; + } + + snprintf(tmp, 32, "address@hidden", dc->fw_name, index); + trace_spapr_cas_add_subnode(tmp); + + _FDT((fdt_begin_node(fdt, tmp))); + _FDT((fdt_property_cell(fdt, "cpu-version", cpu->cpu_version))); + + /* Build interrupt servers and gservers properties */ + for (i = 0; i < smpt; i++) { + servers_prop[i] = cpu_to_be32(index + i); + /* Hack, direct the group queues back to cpu 0 */ + gservers_prop[i*2] = cpu_to_be32(index + i); + gservers_prop[i*2 + 1] = 0; + } + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s", + servers_prop, sizeof(servers_prop)))); + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s", + gservers_prop, sizeof(gservers_prop)))); + + _FDT((fdt_end_node(fdt))); + } + + _FDT((fdt_end_node(fdt)));Why is this so much code? Can we only replace full nodes?It is a diff tree, in only has properties to change.So why do we change so much then? :)3 (three) properties are "much" now? :)Then please extract the original CPU node creation into a function and just call it from the original generation and from here. If we can also replace single properties I'd say we only need to override cpu-version.Mmm. The user could run qemu with threads=8 on POWER8 with SLES11 which must not see 8 threads but it can update itself and reboot with the kernel which is aware of POWER8. You are saying we do not want to support that?First off, I think that practically speaking people will want to use 2-thread granularity on POWER8 because that gives them the best load exploitation on the system.Practically speaking, people will want to use kernels which can work on POWER8 in raw mode, no? Here we are trying to fix older guests.So if we just disable CPU nodes that would not be visible usually (there was a disabled flag in cpu nodes, right?) for threads that are incompatible with a new CAS mode, we can then remove the disabled flag when the guest accepts more threads again. That means we could expose 8 threads (POWER8), guest only supports 2 threads (POWER6), so we waste 6 threads. But the remaining 2 will be fast ;). We basically degrade the system to SMT2 mode.We do not want to show more threads (i.e. bigger ibm,ppc-interrupt-server#s properties) if we do not support more threads as we do not really know what guest kernel (including AIX) or some ppc64_cpu/drmgr/whatever tool will do if they find more.What does pHyp do here?I wish it was easy to verify :) I fail to see what exactly your objections are all about, I definitely miss something big here. Is it a minor code duplication or some dangerous bug? If it is just a code duplication, I could do something about it.
My main concern is that this is a big chunk of magic code that swizzles around random device tree properties without an easy to follow explanation that would show people what exactly the code does.
I think you can overcome all of this by combining the function you introduce here with the one that we already have for cpu node generation. You would of course need to parameterize it.
While you're at the process of combining the functions, you will realize that some things are not exactly obvious to every reader. For example
+ if ((index % smt) != 0) { + continue; + }is based on the fact that only cores are listed in /cpus. But the code doesn't tell you, so if anyone who is not Alexey wants to debug this code and find out what's going wrong, he has to sit down for a few minutes are read up on that fact. It'd be a lot nicer to hint him towards the correct direction. This could either be a comment or reader friendly code such as
for_each_core(...) { } at which point the reader would immediately know what we're trying to do.The same goes for the interrupt servers and gservers properties. If we don't explain the format at least by variable / function names in the code, you either have to know the format off the top of your head or constantly have the sPAPR spec (which is not public) lying next to you. This increases the barrier to understanding the code.
For example, if you would just extract the interrupt server generation into a function and give it parameters such as "num_threads" and give every value a good name before you assign it to the property array, the code suddenly becomes self-documenting. If you then share that function with the normal cpu node generation, anyone who fixes a bug in it also only has to look at a single spot.
Alex
[Prev in Thread] | Current Thread | [Next in Thread] |