qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 4/5] target/ppc: Implement


From: David Gibson
Subject: Re: [Qemu-ppc] [Qemu-ppc for-2.10] [PATCH V3 4/5] target/ppc: Implement ISA V3.00 radix page fault handler
Date: Thu, 30 Mar 2017 17:09:38 +1100
User-agent: Mutt/1.8.0 (2017-02-23)

On Wed, Mar 29, 2017 at 04:43:48PM +1100, Suraj Jitindar Singh wrote:
> ISA V3.00 introduced a new radix mmu model. Implement the page fault
> handler for this so we can run a tcg guest in radix mode and perform
> address translation correctly.
> 
> In real mode (mmu turned off) addresses are masked to remove the top
> 4 bits and then are subject to partition scoped translation, since we only
> support pseries at this stage it is only necessary to perform the masking
> and then we're done.
> 
> In virtual mode (mmu turned on) address translation if performed as
> follows:
> 
> 1. Use the quadrant to determine the fully qualified address.
> 
> The fully qualified address is defined as the combination of the effective
> address, the effective logical partition id (LPID) and the effective
> process id (PID). Based on the quadrant (EA63:62) we set the pid and lpid
> like so:
> 
> quadrant 0: lpid = LPIDR, pid = PIDR
> quadrant 1: HV only (not allowed in pseries)
> quadrant 2: HV only (not allowed in pseries)
> quadrant 3: lpid = LPIDR, pid = 0
> 
> If we can't get the fully qualified address we raise a segment interrupt.
> 
> 2. Find the guest radix tree
> 
> We ask the virtual hypervisor for the partition table which was registered
> with H_REGISTER_PROC_TBL which points us to the process table in guest
> memory. We then index this table by pid to get the process table entry
> which points us to the appropriate radix tree to translate the address.
> 
> If the process table isn't big enough to contain an entry for the current
> pid then we raise a storage interrupt.
> 
> 3. Walk the radix tree
> 
> Next we walk the radix tree where each level is a table of page directory
> entries indexed by some number of bits from the effective address, where
> the number of bits is determined by the table size. We continue to walk
> the tree (while entries are valid and the table is of minimum size) until
> we reach a table of page table entries, indicated by having the leaf bit
> set. The appropriate pte is then checked for sufficient access permissions,
> the reference and change bits are updated and the real address is
> calculated from the real page number bits of the pte and the low bits of
> the effective address.
> 
> If we can't find an entry or can't access the entry bacause of permissions
> then we raise a storage interrupt.
> 
> Signed-off-by: Suraj Jitindar Singh <address@hidden>

Looks good overall, a handful of nits and clarifications below.

> 
> ---
> 
> V2 -> V3:
>  - Convert the remaining 1/0s to true/false which I missed
>  - Move DSISR bad radix config define from radix64.h to book3s-v3.h
> V1 -> V2:
>  - Use TRUE/FALSE instead of 1/0
>  - Set_IS/ISEG now void with return 1 after
>  - Add assert(cpu->vhyp) since there's currently no PowerNV Radix Support
> ---
>  target/ppc/Makefile.objs   |   1 +
>  target/ppc/mmu-book3s-v3.h |  24 +++++
>  target/ppc/mmu-radix64.c   | 263 
> +++++++++++++++++++++++++++++++++++++++++++++
>  target/ppc/mmu-radix64.h   |  68 ++++++++++++
>  4 files changed, 356 insertions(+)
>  create mode 100644 target/ppc/mmu-radix64.c
>  create mode 100644 target/ppc/mmu-radix64.h
> 
> diff --git a/target/ppc/Makefile.objs b/target/ppc/Makefile.objs
> index f963777..f92ba67 100644
> --- a/target/ppc/Makefile.objs
> +++ b/target/ppc/Makefile.objs
> @@ -4,6 +4,7 @@ obj-y += translate.o
>  ifeq ($(CONFIG_SOFTMMU),y)
>  obj-y += machine.o mmu_helper.o mmu-hash32.o monitor.o arch_dump.o
>  obj-$(TARGET_PPC64) += mmu-hash64.o mmu-book3s-v3.o compat.o
> +obj-$(TARGET_PPC64) += mmu-radix64.o
>  endif
>  obj-$(CONFIG_KVM) += kvm.o
>  obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
> diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
> index 636f6ab..6bfc0fa 100644
> --- a/target/ppc/mmu-book3s-v3.h
> +++ b/target/ppc/mmu-book3s-v3.h
> @@ -25,6 +25,30 @@
>  /* Partition Table Entry Fields */
>  #define PATBE1_GR 0x8000000000000000
>  
> +/* Interrupt Fields */
> +
> +/* DSISR */
> +#define DSISR_NOPTE              0x40000000
> +/* Not permitted by access authority of encoded access authority */
> +#define DSISR_PROTFAULT          0x08000000
> +#define DSISR_ISSTORE            0x02000000
> +/* Not permitted by virtual page class key protection */
> +#define DSISR_AMR                0x00200000
> +/* Unsupported Radix Tree Configuration */
> +#define DSISR_R_BADCONFIG        0x00080000
> +
> +/* SRR1 */
> +#define SRR1_NOPTE               DSISR_NOPTE
> +/* Not permitted due to no-execute or guard bit set */
> +#define SRR1_NOEXEC_GUARD        0x10000000
> +#define SRR1_PROTFAULT           DSISR_PROTFAULT
> +#define SRR1_IAMR                DSISR_AMR

Uh.. these partially duplicate deifnitions already in
target/ppc/cpu.h.  I was suggestingmoving your new defines there,
rather than mmu-book3s-v3.h

> +/* Process Table Entry */
> +struct prtb_entry {
> +    uint64_t prtbe0, prtbe1;
> +};
> +
>  #ifdef TARGET_PPC64
>  
>  static inline bool ppc64_use_proc_tbl(PowerPCCPU *cpu)
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> new file mode 100644
> index 0000000..c407c5f
> --- /dev/null
> +++ b/target/ppc/mmu-radix64.c
> @@ -0,0 +1,263 @@
> +/*
> + *  PowerPC Radix MMU mulation helpers for QEMU.
> + *
> + *  Copyright (c) 2016 Suraj Jitindar Singh, IBM Corporation
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see 
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "exec/helper-proto.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/kvm.h"
> +#include "kvm_ppc.h"
> +#include "exec/log.h"
> +#include "mmu-radix64.h"
> +#include "mmu-book3s-v3.h"
> +
> +static bool ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr 
> eaddr,
> +                                                 uint64_t *lpid, uint64_t 
> *pid)
> +{
> +    if (msr_hv) { /* MSR[HV] -> Host */
> +        /* TODO */
> +        /* PowerNV ONLY */
> +        error_report("RADIX PowerNV Support Unimplemented");
> +        exit(1);

Just assert(!msr_hv) plus comment would be fine here.  If msr_hv ends
up set at the moment, it means something has gone badly wrong
elsewhere in the qemu code.

> +    } else { /* !MSR[HV] -> Guest */
> +        switch (eaddr & R_EADDR_QUADRANT) {
> +        case R_EADDR_QUADRANT0: /* Guest application */
> +            *lpid = env->spr[SPR_LPIDR];
> +            *pid = env->spr[SPR_BOOKS_PID];
> +            break;
> +        case R_EADDR_QUADRANT1: /* Illegal */
> +        case R_EADDR_QUADRANT2:
> +            return false;
> +        case R_EADDR_QUADRANT3: /* Guest OS */
> +            *lpid = env->spr[SPR_LPIDR];
> +            *pid = 0; /* pid set to 0 -> addresses guest operating system */
> +            break;
> +        }
> +    }
> +    return true;
> +}
> +
> +static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr)
> +{
> +    CPUState *cs = CPU(cpu);
> +    CPUPPCState *env = &cpu->env;
> +
> +    if (rwx == 2) { /* Instruction Segment Interrupt */
> +        cs->exception_index = POWERPC_EXCP_ISEG;
> +    } else { /* Data Segment Interrupt */
> +        cs->exception_index = POWERPC_EXCP_DSEG;
> +        env->spr[SPR_DAR] = eaddr;
> +    }
> +    env->error_code = 0;
> +}
> +
> +static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> +                                uint32_t cause)
> +{
> +    CPUState *cs = CPU(cpu);
> +    CPUPPCState *env = &cpu->env;
> +
> +    if (rwx == 2) { /* Instruction Storage Interrupt */
> +        cs->exception_index = POWERPC_EXCP_ISI;
> +        env->error_code = cause;
> +    } else { /* Data Storage Interrupt */
> +        cs->exception_index = POWERPC_EXCP_DSI;
> +        if (rwx == 1) { /* Write -> Store */
> +            cause |= DSISR_ISSTORE;
> +        }
> +        env->spr[SPR_DSISR] = cause;
> +        env->spr[SPR_DAR] = eaddr;
> +        env->error_code = 0;
> +    }
> +}
> +
> +
> +static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int rwx, uint64_t pte,
> +                                   int *fault_cause, int *prot)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    const int pte_att[] = { 0x02, 0x0E, 0x07, 0x06 };
> +    const int need_prot[] = { PAGE_READ, PAGE_WRITE, PAGE_EXEC };
> +    int att = pte_att[pte & R_PTE_ATT];
> +
> +    /* Check Page Attributes (pte58:59) */
> +    if ((att & R_PTE_ATT_G) && (rwx == 2)) { /* Guarded Storage */
> +        *fault_cause |= SRR1_NOEXEC_GUARD;

This seems a bit roundabout.  R_PTE_ATT_G is - confusingly - not a bit
actually within the R_PTE_ATT, but rather a bit in some decoded form.
It's not really clear what the decoding of R_PTE_ATT -> pte_att means.

Since this test amounts to just checking for a single value of the
R_PTE_ATT field, it's not clear to me why the decoding is useful.

> +        return true;
> +    }
> +    /* I think we can ignore the other attributes?! */
> +
> +    /* Determine permissions allowed by Encoded Access Authority */
> +    if ((pte & R_PTE_EAA_PRIV) && msr_pr) { /* Insufficient Privilege */
> +        *prot = 0;
> +    } else if (msr_pr || (pte & R_PTE_EAA_PRIV)) {
> +        *prot = ppc_radix64_get_prot_eaa(pte);
> +    } else { /* !msr_pr && !(pte & R_PTE_EAA_PRIV) */
> +        *prot = ppc_radix64_get_prot_eaa(pte);
> +        *prot &= ppc_radix64_get_prot_amr(cpu); /* Least combined 
> permissions */

So, if I'm reading this right, the AMR mask only applies to privileged
accesses to user PTEs.  Is that right?

> +    }
> +
> +    /* Check if requested access type is allowed */
> +    if (need_prot[rwx] & ~(*prot)) { /* Page Protected for that Access */
> +        *fault_cause |= DSISR_PROTFAULT;
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte,
> +                               hwaddr pte_addr, int *prot)
> +{
> +    CPUState *cs = CPU(cpu);
> +    uint64_t npte;
> +
> +    npte = pte | R_PTE_R; /* Always set reference bit */
> +
> +    if (rwx == 1) { /* Store/Write */
> +        npte |= R_PTE_C; /* Set change bit */
> +    } else {
> +        /*
> +         * Treat the page as read-only for now, so that a later write
> +         * will pass through this function again to set the C bit.
> +         */
> +        *prot &= ~PAGE_WRITE;
> +    }
> +
> +    if (pte ^ npte) { /* If pte has changed then write it back */
> +        stq_phys(cs->as, pte_addr, npte);
> +    }
> +}
> +
> +static bool ppc_radix64_walk_tree(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> +                                  uint64_t base_addr, uint64_t nls,
> +                                  hwaddr *raddr, int *psize, int 
> *fault_cause,
> +                                  int *prot)
> +{
> +    CPUState *cs = CPU(cpu);
> +    uint64_t index, pde;
> +
> +    if (nls < 5) { /* Directory maps less than 2**5 entries */
> +        *fault_cause |= DSISR_R_BADCONFIG;
> +        return true;
> +    }
> +
> +    /* Read page <directory/table> entry from guest address space */
> +    index = eaddr >> (*psize - nls); /* Shift */
> +    index &= ((1UL << nls) - 1); /* Mask */
> +    pde = ldq_phys(cs->as, base_addr + (index * sizeof(pde)));
> +    if (!(pde & R_PTE_VALID)) { /* Invalid Entry */
> +        *fault_cause |= DSISR_NOPTE;
> +        return true;
> +    }
> +
> +    *psize -= nls;
> +
> +    /* Check if Leaf Entry -> Page Table Entry -> Stop the Search */
> +    if (pde & R_PTE_LEAF) {
> +        uint64_t rpn = pde & R_PTE_RPN;
> +        uint64_t mask = (1UL << *psize) - 1;
> +
> +        if (ppc_radix64_check_prot(cpu, rwx, pde, fault_cause, prot)) {
> +            return true; /* Protection Denied Access */
> +        }
> +
> +        /* Update Reference and Change Bits */
> +        ppc_radix64_set_rc(cpu, rwx, pde, base_addr + (index * sizeof(pde)),
> +                           prot);

One nit to think about.  At some point you'll want to implement the
get_phys_page_debug() hookso that qemu's gdbstub works.  For the most
part you'll want to re-use this function for that - but the debug
version should not update the RC bits.

> +        /* Or high bits of rpn and low bits to ea to form whole real addr */
> +        *raddr = (rpn & ~mask) | (eaddr & mask);
> +        return false;
> +    } else { /* Next Level of Radix Tree */
> +        return ppc_radix64_walk_tree(cpu, rwx, eaddr, pde & R_PDE_NLB,
> +                                     pde & R_PDE_NLS, raddr, psize,
> +                                     fault_cause, prot);
> +    }
> +}
> +
> +int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> +                                 int mmu_idx)
> +{
> +    CPUState *cs = CPU(cpu);
> +    CPUPPCState *env = &cpu->env;
> +    PPCVirtualHypervisorClass *vhc =
> +        PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> +    hwaddr raddr;
> +    uint64_t lpid = 0, pid = 0, offset, size, patbe, prtbe0;
> +    int page_size, prot, fault_cause = 0;
> +
> +    assert((rwx == 0) || (rwx == 1) || (rwx == 2));
> +    assert(cpu->vhyp); /* For now there is no Radix PowerNV Support */
> +
> +    /* Real Mode Access */
> +    if (((rwx == 2) && (msr_ir == 0)) || ((rwx != 2) && (msr_dr == 0))) {
> +        /* In real mode top 4 effective addr bits (mostly) ignored */
> +        raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL;
> +
> +        if (msr_hv) {
> +            /* if EA0 == 0, raddr = eaddr4:63 | HRMOR */
> +            if (!(eaddr >> 63)) {
> +                raddr |= env->spr[SPR_HRMOR];
> +            }
> +        } else {
> +            /* TODO */
> +            /* If PowerNV then do partition scoped translation */
> +        }

Again, you might as well just assert(!msr_hv) here.

This probably isn't actually where you want to do partition scoped
translation for powernv eventually anyway: you probably want to have
one function do the guest effective to real translation (handling both
guest IR/DR states), then regardless of guest MMU state do the guest
real to host real translation.

> +
> +        tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
> +                     PAGE_READ | PAGE_WRITE | PAGE_EXEC, mmu_idx,
> +                     TARGET_PAGE_SIZE);
> +        return 0;
> +    }
> +
> +    /* Virtual Mode Access - get the fully qualified address */
> +    if (!ppc_radix64_get_fully_qualified_addr(env, eaddr, &lpid, &pid)) {
> +        ppc_radix64_raise_segi(cpu, rwx, eaddr);
> +        return 1;
> +    }
> +
> +    /* Get Process Table */
> +    patbe = vhc->get_patbe(cpu->vhyp);
> +
> +    /* Index Process Table by PID to Find Corresponding Process Table Entry 
> */
> +    offset = pid * sizeof(struct prtb_entry);
> +    size = 1ULL << ((patbe & PATBE1_R_PRTS) + 12);
> +    if (offset >= size) {
> +        /* offset exceeds size of the process table */
> +        ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE);
> +        return 1;
> +    }
> +    prtbe0 = ldq_phys(cs->as, (patbe & PATBE1_R_PRTB) + offset);
> +
> +    /* Walk Radix Tree from Process Table Entry to Convert EA to RA */
> +    page_size = PRTBE_R_GET_RTS(prtbe0);
> +    if (ppc_radix64_walk_tree(cpu, rwx, eaddr & R_EADDR_MASK,
> +                              prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
> +                              &raddr, &page_size, &fault_cause, &prot)) {
> +        ppc_radix64_raise_si(cpu, rwx, eaddr, fault_cause);
> +        return 1;
> +    }
> +
> +    tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
> +                 prot, mmu_idx, 1UL << page_size);
> +    return 1;
> +}
> diff --git a/target/ppc/mmu-radix64.h b/target/ppc/mmu-radix64.h
> new file mode 100644
> index 0000000..028c915
> --- /dev/null
> +++ b/target/ppc/mmu-radix64.h
> @@ -0,0 +1,68 @@
> +#ifndef MMU_RADIX64_H
> +#define MMU_RADIX64_H
> +
> +#ifndef CONFIG_USER_ONLY
> +
> +/* Radix Quadrants */
> +#define R_EADDR_MASK            0x3FFFFFFFFFFFFFFF
> +#define R_EADDR_QUADRANT        0xC000000000000000
> +#define R_EADDR_QUADRANT0       0x0000000000000000
> +#define R_EADDR_QUADRANT1       0x4000000000000000
> +#define R_EADDR_QUADRANT2       0x8000000000000000
> +#define R_EADDR_QUADRANT3       0xC000000000000000
> +
> +/* Radix Partition Table Entry Fields */
> +#define PATBE1_R_PRTB           0x0FFFFFFFFFFFF000
> +#define PATBE1_R_PRTS           0x000000000000001F
> +
> +/* Radix Process Table Entry Fields */
> +#define PRTBE_R_GET_RTS(rts)    (((rts >> 58) & 0x18) | ((rts >> 5) & 0x7)) 
> + 31
> +#define PRTBE_R_RPDB            0x0FFFFFFFFFFFFF00
> +#define PRTBE_R_RPDS            0x000000000000001F
> +
> +/* Radix Page Directory/Table Entry Fields */
> +#define R_PTE_VALID             0x8000000000000000
> +#define R_PTE_LEAF              0x4000000000000000
> +#define R_PTE_SW0               0x2000000000000000
> +#define R_PTE_RPN               0x01FFFFFFFFFFF000
> +#define R_PTE_SW1               0x0000000000000E00
> +#define R_GET_SW(sw)            (((sw >> 58) & 0x8) | ((sw >> 9) & 0x7))
> +#define R_PTE_R                 0x0000000000000100
> +#define R_PTE_C                 0x0000000000000080
> +#define R_PTE_ATT               0x0000000000000030
> +#define R_PTE_ATT_G             0x1
> +#define R_PTE_EAA_PRIV          0x0000000000000008
> +#define R_PTE_EAA_R             0x0000000000000004
> +#define R_PTE_EAA_RW            0x0000000000000002
> +#define R_PTE_EAA_X             0x0000000000000001
> +#define R_PDE_NLB               PRTBE_R_RPDB
> +#define R_PDE_NLS               PRTBE_R_RPDS
> +
> +#ifdef TARGET_PPC64
> +
> +int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> +                                 int mmu_idx);
> +
> +static inline int ppc_radix64_get_prot_eaa(uint64_t pte)
> +{
> +    return (pte & R_PTE_EAA_R ? PAGE_READ : 0) |
> +           (pte & R_PTE_EAA_RW ? PAGE_READ | PAGE_WRITE : 0) |
> +           (pte & R_PTE_EAA_X ? PAGE_EXEC : 0);
> +}
> +
> +static inline int ppc_radix64_get_prot_amr(PowerPCCPU *cpu)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    int amr = env->spr[SPR_AMR] >> 62; /* We only care about key0 AMR63:62 */
> +    int iamr = env->spr[SPR_IAMR] >> 62; /* We only care about key0 
> IAMR63:62 */
> +
> +    return (amr & 0x2 ? 0 : PAGE_WRITE) | /* Access denied if bit is set */
> +           (amr & 0x1 ? 0 : PAGE_READ) |
> +           (iamr & 0x1 ? 0 : PAGE_EXEC);
> +}
> +
> +#endif /* TARGET_PPC64 */
> +
> +#endif /* CONFIG_USER_ONLY */
> +
> +#endif /* MMU_RADIX64_H */

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]