[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 4/5] target/ppc: Implement
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 4/5] target/ppc: Implement ISA V3.00 radix page fault handler |
Date: |
Thu, 30 Mar 2017 17:09:38 +1100 |
User-agent: |
Mutt/1.8.0 (2017-02-23) |
On Wed, Mar 29, 2017 at 04:43:48PM +1100, Suraj Jitindar Singh wrote:
> ISA V3.00 introduced a new radix mmu model. Implement the page fault
> handler for this so we can run a tcg guest in radix mode and perform
> address translation correctly.
>
> In real mode (mmu turned off) addresses are masked to remove the top
> 4 bits and then are subject to partition scoped translation, since we only
> support pseries at this stage it is only necessary to perform the masking
> and then we're done.
>
> In virtual mode (mmu turned on) address translation if performed as
> follows:
>
> 1. Use the quadrant to determine the fully qualified address.
>
> The fully qualified address is defined as the combination of the effective
> address, the effective logical partition id (LPID) and the effective
> process id (PID). Based on the quadrant (EA63:62) we set the pid and lpid
> like so:
>
> quadrant 0: lpid = LPIDR, pid = PIDR
> quadrant 1: HV only (not allowed in pseries)
> quadrant 2: HV only (not allowed in pseries)
> quadrant 3: lpid = LPIDR, pid = 0
>
> If we can't get the fully qualified address we raise a segment interrupt.
>
> 2. Find the guest radix tree
>
> We ask the virtual hypervisor for the partition table which was registered
> with H_REGISTER_PROC_TBL which points us to the process table in guest
> memory. We then index this table by pid to get the process table entry
> which points us to the appropriate radix tree to translate the address.
>
> If the process table isn't big enough to contain an entry for the current
> pid then we raise a storage interrupt.
>
> 3. Walk the radix tree
>
> Next we walk the radix tree where each level is a table of page directory
> entries indexed by some number of bits from the effective address, where
> the number of bits is determined by the table size. We continue to walk
> the tree (while entries are valid and the table is of minimum size) until
> we reach a table of page table entries, indicated by having the leaf bit
> set. The appropriate pte is then checked for sufficient access permissions,
> the reference and change bits are updated and the real address is
> calculated from the real page number bits of the pte and the low bits of
> the effective address.
>
> If we can't find an entry or can't access the entry bacause of permissions
> then we raise a storage interrupt.
>
> Signed-off-by: Suraj Jitindar Singh <address@hidden>
Looks good overall, a handful of nits and clarifications below.
>
> ---
>
> V2 -> V3:
> - Convert the remaining 1/0s to true/false which I missed
> - Move DSISR bad radix config define from radix64.h to book3s-v3.h
> V1 -> V2:
> - Use TRUE/FALSE instead of 1/0
> - Set_IS/ISEG now void with return 1 after
> - Add assert(cpu->vhyp) since there's currently no PowerNV Radix Support
> ---
> target/ppc/Makefile.objs | 1 +
> target/ppc/mmu-book3s-v3.h | 24 +++++
> target/ppc/mmu-radix64.c | 263
> +++++++++++++++++++++++++++++++++++++++++++++
> target/ppc/mmu-radix64.h | 68 ++++++++++++
> 4 files changed, 356 insertions(+)
> create mode 100644 target/ppc/mmu-radix64.c
> create mode 100644 target/ppc/mmu-radix64.h
>
> diff --git a/target/ppc/Makefile.objs b/target/ppc/Makefile.objs
> index f963777..f92ba67 100644
> --- a/target/ppc/Makefile.objs
> +++ b/target/ppc/Makefile.objs
> @@ -4,6 +4,7 @@ obj-y += translate.o
> ifeq ($(CONFIG_SOFTMMU),y)
> obj-y += machine.o mmu_helper.o mmu-hash32.o monitor.o arch_dump.o
> obj-$(TARGET_PPC64) += mmu-hash64.o mmu-book3s-v3.o compat.o
> +obj-$(TARGET_PPC64) += mmu-radix64.o
> endif
> obj-$(CONFIG_KVM) += kvm.o
> obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
> diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
> index 636f6ab..6bfc0fa 100644
> --- a/target/ppc/mmu-book3s-v3.h
> +++ b/target/ppc/mmu-book3s-v3.h
> @@ -25,6 +25,30 @@
> /* Partition Table Entry Fields */
> #define PATBE1_GR 0x8000000000000000
>
> +/* Interrupt Fields */
> +
> +/* DSISR */
> +#define DSISR_NOPTE 0x40000000
> +/* Not permitted by access authority of encoded access authority */
> +#define DSISR_PROTFAULT 0x08000000
> +#define DSISR_ISSTORE 0x02000000
> +/* Not permitted by virtual page class key protection */
> +#define DSISR_AMR 0x00200000
> +/* Unsupported Radix Tree Configuration */
> +#define DSISR_R_BADCONFIG 0x00080000
> +
> +/* SRR1 */
> +#define SRR1_NOPTE DSISR_NOPTE
> +/* Not permitted due to no-execute or guard bit set */
> +#define SRR1_NOEXEC_GUARD 0x10000000
> +#define SRR1_PROTFAULT DSISR_PROTFAULT
> +#define SRR1_IAMR DSISR_AMR
Uh.. these partially duplicate deifnitions already in
target/ppc/cpu.h. I was suggestingmoving your new defines there,
rather than mmu-book3s-v3.h
> +/* Process Table Entry */
> +struct prtb_entry {
> + uint64_t prtbe0, prtbe1;
> +};
> +
> #ifdef TARGET_PPC64
>
> static inline bool ppc64_use_proc_tbl(PowerPCCPU *cpu)
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> new file mode 100644
> index 0000000..c407c5f
> --- /dev/null
> +++ b/target/ppc/mmu-radix64.c
> @@ -0,0 +1,263 @@
> +/*
> + * PowerPC Radix MMU mulation helpers for QEMU.
> + *
> + * Copyright (c) 2016 Suraj Jitindar Singh, IBM Corporation
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "exec/helper-proto.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/kvm.h"
> +#include "kvm_ppc.h"
> +#include "exec/log.h"
> +#include "mmu-radix64.h"
> +#include "mmu-book3s-v3.h"
> +
> +static bool ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr
> eaddr,
> + uint64_t *lpid, uint64_t
> *pid)
> +{
> + if (msr_hv) { /* MSR[HV] -> Host */
> + /* TODO */
> + /* PowerNV ONLY */
> + error_report("RADIX PowerNV Support Unimplemented");
> + exit(1);
Just assert(!msr_hv) plus comment would be fine here. If msr_hv ends
up set at the moment, it means something has gone badly wrong
elsewhere in the qemu code.
> + } else { /* !MSR[HV] -> Guest */
> + switch (eaddr & R_EADDR_QUADRANT) {
> + case R_EADDR_QUADRANT0: /* Guest application */
> + *lpid = env->spr[SPR_LPIDR];
> + *pid = env->spr[SPR_BOOKS_PID];
> + break;
> + case R_EADDR_QUADRANT1: /* Illegal */
> + case R_EADDR_QUADRANT2:
> + return false;
> + case R_EADDR_QUADRANT3: /* Guest OS */
> + *lpid = env->spr[SPR_LPIDR];
> + *pid = 0; /* pid set to 0 -> addresses guest operating system */
> + break;
> + }
> + }
> + return true;
> +}
> +
> +static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr)
> +{
> + CPUState *cs = CPU(cpu);
> + CPUPPCState *env = &cpu->env;
> +
> + if (rwx == 2) { /* Instruction Segment Interrupt */
> + cs->exception_index = POWERPC_EXCP_ISEG;
> + } else { /* Data Segment Interrupt */
> + cs->exception_index = POWERPC_EXCP_DSEG;
> + env->spr[SPR_DAR] = eaddr;
> + }
> + env->error_code = 0;
> +}
> +
> +static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> + uint32_t cause)
> +{
> + CPUState *cs = CPU(cpu);
> + CPUPPCState *env = &cpu->env;
> +
> + if (rwx == 2) { /* Instruction Storage Interrupt */
> + cs->exception_index = POWERPC_EXCP_ISI;
> + env->error_code = cause;
> + } else { /* Data Storage Interrupt */
> + cs->exception_index = POWERPC_EXCP_DSI;
> + if (rwx == 1) { /* Write -> Store */
> + cause |= DSISR_ISSTORE;
> + }
> + env->spr[SPR_DSISR] = cause;
> + env->spr[SPR_DAR] = eaddr;
> + env->error_code = 0;
> + }
> +}
> +
> +
> +static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int rwx, uint64_t pte,
> + int *fault_cause, int *prot)
> +{
> + CPUPPCState *env = &cpu->env;
> + const int pte_att[] = { 0x02, 0x0E, 0x07, 0x06 };
> + const int need_prot[] = { PAGE_READ, PAGE_WRITE, PAGE_EXEC };
> + int att = pte_att[pte & R_PTE_ATT];
> +
> + /* Check Page Attributes (pte58:59) */
> + if ((att & R_PTE_ATT_G) && (rwx == 2)) { /* Guarded Storage */
> + *fault_cause |= SRR1_NOEXEC_GUARD;
This seems a bit roundabout. R_PTE_ATT_G is - confusingly - not a bit
actually within the R_PTE_ATT, but rather a bit in some decoded form.
It's not really clear what the decoding of R_PTE_ATT -> pte_att means.
Since this test amounts to just checking for a single value of the
R_PTE_ATT field, it's not clear to me why the decoding is useful.
> + return true;
> + }
> + /* I think we can ignore the other attributes?! */
> +
> + /* Determine permissions allowed by Encoded Access Authority */
> + if ((pte & R_PTE_EAA_PRIV) && msr_pr) { /* Insufficient Privilege */
> + *prot = 0;
> + } else if (msr_pr || (pte & R_PTE_EAA_PRIV)) {
> + *prot = ppc_radix64_get_prot_eaa(pte);
> + } else { /* !msr_pr && !(pte & R_PTE_EAA_PRIV) */
> + *prot = ppc_radix64_get_prot_eaa(pte);
> + *prot &= ppc_radix64_get_prot_amr(cpu); /* Least combined
> permissions */
So, if I'm reading this right, the AMR mask only applies to privileged
accesses to user PTEs. Is that right?
> + }
> +
> + /* Check if requested access type is allowed */
> + if (need_prot[rwx] & ~(*prot)) { /* Page Protected for that Access */
> + *fault_cause |= DSISR_PROTFAULT;
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte,
> + hwaddr pte_addr, int *prot)
> +{
> + CPUState *cs = CPU(cpu);
> + uint64_t npte;
> +
> + npte = pte | R_PTE_R; /* Always set reference bit */
> +
> + if (rwx == 1) { /* Store/Write */
> + npte |= R_PTE_C; /* Set change bit */
> + } else {
> + /*
> + * Treat the page as read-only for now, so that a later write
> + * will pass through this function again to set the C bit.
> + */
> + *prot &= ~PAGE_WRITE;
> + }
> +
> + if (pte ^ npte) { /* If pte has changed then write it back */
> + stq_phys(cs->as, pte_addr, npte);
> + }
> +}
> +
> +static bool ppc_radix64_walk_tree(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> + uint64_t base_addr, uint64_t nls,
> + hwaddr *raddr, int *psize, int
> *fault_cause,
> + int *prot)
> +{
> + CPUState *cs = CPU(cpu);
> + uint64_t index, pde;
> +
> + if (nls < 5) { /* Directory maps less than 2**5 entries */
> + *fault_cause |= DSISR_R_BADCONFIG;
> + return true;
> + }
> +
> + /* Read page <directory/table> entry from guest address space */
> + index = eaddr >> (*psize - nls); /* Shift */
> + index &= ((1UL << nls) - 1); /* Mask */
> + pde = ldq_phys(cs->as, base_addr + (index * sizeof(pde)));
> + if (!(pde & R_PTE_VALID)) { /* Invalid Entry */
> + *fault_cause |= DSISR_NOPTE;
> + return true;
> + }
> +
> + *psize -= nls;
> +
> + /* Check if Leaf Entry -> Page Table Entry -> Stop the Search */
> + if (pde & R_PTE_LEAF) {
> + uint64_t rpn = pde & R_PTE_RPN;
> + uint64_t mask = (1UL << *psize) - 1;
> +
> + if (ppc_radix64_check_prot(cpu, rwx, pde, fault_cause, prot)) {
> + return true; /* Protection Denied Access */
> + }
> +
> + /* Update Reference and Change Bits */
> + ppc_radix64_set_rc(cpu, rwx, pde, base_addr + (index * sizeof(pde)),
> + prot);
One nit to think about. At some point you'll want to implement the
get_phys_page_debug() hookso that qemu's gdbstub works. For the most
part you'll want to re-use this function for that - but the debug
version should not update the RC bits.
> + /* Or high bits of rpn and low bits to ea to form whole real addr */
> + *raddr = (rpn & ~mask) | (eaddr & mask);
> + return false;
> + } else { /* Next Level of Radix Tree */
> + return ppc_radix64_walk_tree(cpu, rwx, eaddr, pde & R_PDE_NLB,
> + pde & R_PDE_NLS, raddr, psize,
> + fault_cause, prot);
> + }
> +}
> +
> +int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> + int mmu_idx)
> +{
> + CPUState *cs = CPU(cpu);
> + CPUPPCState *env = &cpu->env;
> + PPCVirtualHypervisorClass *vhc =
> + PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> + hwaddr raddr;
> + uint64_t lpid = 0, pid = 0, offset, size, patbe, prtbe0;
> + int page_size, prot, fault_cause = 0;
> +
> + assert((rwx == 0) || (rwx == 1) || (rwx == 2));
> + assert(cpu->vhyp); /* For now there is no Radix PowerNV Support */
> +
> + /* Real Mode Access */
> + if (((rwx == 2) && (msr_ir == 0)) || ((rwx != 2) && (msr_dr == 0))) {
> + /* In real mode top 4 effective addr bits (mostly) ignored */
> + raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL;
> +
> + if (msr_hv) {
> + /* if EA0 == 0, raddr = eaddr4:63 | HRMOR */
> + if (!(eaddr >> 63)) {
> + raddr |= env->spr[SPR_HRMOR];
> + }
> + } else {
> + /* TODO */
> + /* If PowerNV then do partition scoped translation */
> + }
Again, you might as well just assert(!msr_hv) here.
This probably isn't actually where you want to do partition scoped
translation for powernv eventually anyway: you probably want to have
one function do the guest effective to real translation (handling both
guest IR/DR states), then regardless of guest MMU state do the guest
real to host real translation.
> +
> + tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
> + PAGE_READ | PAGE_WRITE | PAGE_EXEC, mmu_idx,
> + TARGET_PAGE_SIZE);
> + return 0;
> + }
> +
> + /* Virtual Mode Access - get the fully qualified address */
> + if (!ppc_radix64_get_fully_qualified_addr(env, eaddr, &lpid, &pid)) {
> + ppc_radix64_raise_segi(cpu, rwx, eaddr);
> + return 1;
> + }
> +
> + /* Get Process Table */
> + patbe = vhc->get_patbe(cpu->vhyp);
> +
> + /* Index Process Table by PID to Find Corresponding Process Table Entry
> */
> + offset = pid * sizeof(struct prtb_entry);
> + size = 1ULL << ((patbe & PATBE1_R_PRTS) + 12);
> + if (offset >= size) {
> + /* offset exceeds size of the process table */
> + ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE);
> + return 1;
> + }
> + prtbe0 = ldq_phys(cs->as, (patbe & PATBE1_R_PRTB) + offset);
> +
> + /* Walk Radix Tree from Process Table Entry to Convert EA to RA */
> + page_size = PRTBE_R_GET_RTS(prtbe0);
> + if (ppc_radix64_walk_tree(cpu, rwx, eaddr & R_EADDR_MASK,
> + prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
> + &raddr, &page_size, &fault_cause, &prot)) {
> + ppc_radix64_raise_si(cpu, rwx, eaddr, fault_cause);
> + return 1;
> + }
> +
> + tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
> + prot, mmu_idx, 1UL << page_size);
> + return 1;
> +}
> diff --git a/target/ppc/mmu-radix64.h b/target/ppc/mmu-radix64.h
> new file mode 100644
> index 0000000..028c915
> --- /dev/null
> +++ b/target/ppc/mmu-radix64.h
> @@ -0,0 +1,68 @@
> +#ifndef MMU_RADIX64_H
> +#define MMU_RADIX64_H
> +
> +#ifndef CONFIG_USER_ONLY
> +
> +/* Radix Quadrants */
> +#define R_EADDR_MASK 0x3FFFFFFFFFFFFFFF
> +#define R_EADDR_QUADRANT 0xC000000000000000
> +#define R_EADDR_QUADRANT0 0x0000000000000000
> +#define R_EADDR_QUADRANT1 0x4000000000000000
> +#define R_EADDR_QUADRANT2 0x8000000000000000
> +#define R_EADDR_QUADRANT3 0xC000000000000000
> +
> +/* Radix Partition Table Entry Fields */
> +#define PATBE1_R_PRTB 0x0FFFFFFFFFFFF000
> +#define PATBE1_R_PRTS 0x000000000000001F
> +
> +/* Radix Process Table Entry Fields */
> +#define PRTBE_R_GET_RTS(rts) (((rts >> 58) & 0x18) | ((rts >> 5) & 0x7))
> + 31
> +#define PRTBE_R_RPDB 0x0FFFFFFFFFFFFF00
> +#define PRTBE_R_RPDS 0x000000000000001F
> +
> +/* Radix Page Directory/Table Entry Fields */
> +#define R_PTE_VALID 0x8000000000000000
> +#define R_PTE_LEAF 0x4000000000000000
> +#define R_PTE_SW0 0x2000000000000000
> +#define R_PTE_RPN 0x01FFFFFFFFFFF000
> +#define R_PTE_SW1 0x0000000000000E00
> +#define R_GET_SW(sw) (((sw >> 58) & 0x8) | ((sw >> 9) & 0x7))
> +#define R_PTE_R 0x0000000000000100
> +#define R_PTE_C 0x0000000000000080
> +#define R_PTE_ATT 0x0000000000000030
> +#define R_PTE_ATT_G 0x1
> +#define R_PTE_EAA_PRIV 0x0000000000000008
> +#define R_PTE_EAA_R 0x0000000000000004
> +#define R_PTE_EAA_RW 0x0000000000000002
> +#define R_PTE_EAA_X 0x0000000000000001
> +#define R_PDE_NLB PRTBE_R_RPDB
> +#define R_PDE_NLS PRTBE_R_RPDS
> +
> +#ifdef TARGET_PPC64
> +
> +int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> + int mmu_idx);
> +
> +static inline int ppc_radix64_get_prot_eaa(uint64_t pte)
> +{
> + return (pte & R_PTE_EAA_R ? PAGE_READ : 0) |
> + (pte & R_PTE_EAA_RW ? PAGE_READ | PAGE_WRITE : 0) |
> + (pte & R_PTE_EAA_X ? PAGE_EXEC : 0);
> +}
> +
> +static inline int ppc_radix64_get_prot_amr(PowerPCCPU *cpu)
> +{
> + CPUPPCState *env = &cpu->env;
> + int amr = env->spr[SPR_AMR] >> 62; /* We only care about key0 AMR63:62 */
> + int iamr = env->spr[SPR_IAMR] >> 62; /* We only care about key0
> IAMR63:62 */
> +
> + return (amr & 0x2 ? 0 : PAGE_WRITE) | /* Access denied if bit is set */
> + (amr & 0x1 ? 0 : PAGE_READ) |
> + (iamr & 0x1 ? 0 : PAGE_EXEC);
> +}
> +
> +#endif /* TARGET_PPC64 */
> +
> +#endif /* CONFIG_USER_ONLY */
> +
> +#endif /* MMU_RADIX64_H */
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature
- [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 1/5] target/ppc: Add ibm, processor-radix-AP-encodings for TCG, Suraj Jitindar Singh, 2017/03/29
- [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 2/5] target/ppc: Flush TLB on write to PIDR, Suraj Jitindar Singh, 2017/03/29
- [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 3/5] target/ppc: Adapt tlbie[l] for ISAv3.00 Support, Suraj Jitindar Singh, 2017/03/29
- [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 5/5] target/ppc: Enable RADIX mmu mode for pseries TCG guest, Suraj Jitindar Singh, 2017/03/29
- [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 4/5] target/ppc: Implement ISA V3.00 radix page fault handler, Suraj Jitindar Singh, 2017/03/29
- Re: [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 4/5] target/ppc: Implement ISA V3.00 radix page fault handler,
David Gibson <=
- Re: [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 1/5] target/ppc: Add ibm, processor-radix-AP-encodings for TCG, David Gibson, 2017/03/30