Re: [Qemu-ppc] [PATCH V2 09/10] target/ppc: Implement ISA V3.00 radix pa


From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH V2 09/10] target/ppc: Implement ISA V3.00 radix page fault handler
Date: Wed, 29 Mar 2017 14:44:26 +1100
User-agent: Mutt/1.8.0 (2017-02-23)

On Wed, Mar 29, 2017 at 02:03:23PM +1100, Suraj Jitindar Singh wrote:
> On Fri, 2017-03-03 at 15:55 +1100, David Gibson wrote:
> > On Wed, Mar 01, 2017 at 06:13:00PM +1100, Suraj Jitindar Singh wrote:
> > > 
> > > > ISA V3.00 introduced a new radix mmu model. Implement the page fault
> > > > handler for this so we can run a tcg guest in radix mode and perform
> > > > address translation correctly.
> > > 
> > > > In real mode (mmu turned off) addresses are masked to remove the top
> > > > 4 bits and are then subject to partition scoped translation. Since we
> > > > only support pseries at this stage, it is only necessary to perform
> > > > the masking and then we're done.
> > > 
> > > > In virtual mode (mmu turned on) address translation is performed as
> > > > follows:
> > > 
> > > 1. Use the quadrant to determine the fully qualified address.
> > > 
> > > > The fully qualified address is defined as the combination of the
> > > > effective address, the effective logical partition id (LPID) and the
> > > > effective process id (PID). Based on the quadrant (EA63:62) we set the
> > > > pid and lpid like so:
> > > 
> > > quadrant 0: lpid = LPIDR, pid = PIDR
> > > quadrant 1: HV only (not allowed in pseries)
> > > quadrant 2: HV only (not allowed in pseries)
> > > quadrant 3: lpid = LPIDR, pid = 0
> > > 
> > > > If we can't get the fully qualified address we raise a segment
> > > > interrupt.
> > > 
> > > 2. Find the guest radix tree
> > > 
> > > > We ask the virtual hypervisor for the partition table which was
> > > > registered with H_REGISTER_PROC_TBL which points us to the process
> > > > table in guest memory. We then index this table by pid to get the
> > > > process table entry which points us to the appropriate radix tree to
> > > > translate the address.
> > > 
> > > > If the process table isn't big enough to contain an entry for the
> > > > current pid then we raise a storage interrupt.
> > > 
> > > 3. Walk the radix tree
> > > 
> > > > Next we walk the radix tree where each level is a table of page
> > > > directory entries indexed by some number of bits from the effective
> > > > address, where the number of bits is determined by the table size. We
> > > > continue to walk the tree (while entries are valid and the table is of
> > > > minimum size) until we reach a table of page table entries, indicated
> > > > by having the leaf bit set. The appropriate pte is then checked for
> > > > sufficient access permissions, the reference and change bits are
> > > > updated and the real address is calculated from the real page number
> > > > bits of the pte and the low bits of the effective address.
> > > 
> > > > If we can't find an entry or can't access the entry because of
> > > > permissions then we raise a storage interrupt.
> > > 
> > > Signed-off-by: Suraj Jitindar Singh <address@hidden>
> > > 
> > > ---
> > > 
> > > V1 -> V2:
> > >  - Use TRUE/FALSE instead of 1/0
> > >  - Set_IS/ISEG now void with return 1 after
> > > >  - Add assert(cpu->vhyp) since there's currently no PowerNV Radix Support
> > > ---
> > >  target/ppc/mmu-book3s-v3.h |   5 +
> > > >  target/ppc/mmu-radix64.c   | 238 +++++++++++++++++++++++++++++++++++++++++++++
> > >  target/ppc/mmu-radix64.h   |  59 +++++++++++
> > >  3 files changed, 302 insertions(+)
> > > 
> > > > diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
> > > index a9d74cc..d6e7926 100644
> > > --- a/target/ppc/mmu-book3s-v3.h
> > > +++ b/target/ppc/mmu-book3s-v3.h
> > > @@ -42,6 +42,11 @@
> > >  #define SRR1_PROTFAULT           DSISR_PROTFAULT
> > >  #define SRR1_IAMR                DSISR_AMR
> > >  
> > > +/* Process Table Entry */
> > > +struct prtb_entry {
> > > +    uint64_t prtbe0, prtbe1;
> > > +};
> > > +
> > >  #ifdef TARGET_PPC64
> > >  
> > >  static inline bool ppc64_use_proc_tbl(PowerPCCPU *cpu)
> > > diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> > > index 92025cf..d235213 100644
> > > --- a/target/ppc/mmu-radix64.c
> > > +++ b/target/ppc/mmu-radix64.c
> > > @@ -49,3 +49,241 @@ struct ppc_radix_page_info
> > > *ppc_radix64_get_page_info(void)
> > >  {
> > >      return kvm_enabled() ? kvm_get_radix_page_info() :
> > > &radix_page_info;
> > >  }
> > > +
> > > +static bool ppc_radix64_get_fully_qualified_addr(CPUPPCState *env,
> > > vaddr eaddr,
> > > +                                                 uint64_t *lpid,
> > > uint64_t *pid)
> > > +{
> > > +    if (msr_hv) { /* MSR[HV] -> Host */
> > > +        /* TODO */
> > > +        /* PowerNV ONLY */
> > > +        error_report("RADIX PowerNV Support Unimplemented");
> > > +        exit(1);
> > > +    } else { /* !MSR[HV] -> Guest */
> > > +        switch (eaddr & R_EADDR_QUADRANT) {
> > > +        case R_EADDR_QUADRANT0: /* Guest application */
> > > +            *lpid = env->spr[SPR_LPIDR];
> > > +            *pid = env->spr[SPR_BOOKS_PID];
> > > +            break;
> > > +        case R_EADDR_QUADRANT1: /* Illegal */
> > > +        case R_EADDR_QUADRANT2:
> > > +            return false;
> > > +        case R_EADDR_QUADRANT3: /* Guest OS */
> > > +            *lpid = env->spr[SPR_LPIDR];
> > > > +            *pid = 0; /* pid set to 0 -> addresses guest operating system */
> > > +            break;
> > > +        }
> > > +    }
> > > +    return true;
> > > +}
> > > +
> > > > +static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr)
> > > +{
> > > +    CPUState *cs = CPU(cpu);
> > > +    CPUPPCState *env = &cpu->env;
> > > +
> > > +    if (rwx == 2) { /* Instruction Segment Interrupt */
> > > +        cs->exception_index = POWERPC_EXCP_ISEG;
> > > +    } else { /* Data Segment Interrupt */
> > > +        cs->exception_index = POWERPC_EXCP_DSEG;
> > > +        env->spr[SPR_DAR] = eaddr;
> > > +    }
> > > +    env->error_code = 0;
> > > +}
> > > +
> > > > +static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> > > > +                                uint32_t cause)
> > > +{
> > > +    CPUState *cs = CPU(cpu);
> > > +    CPUPPCState *env = &cpu->env;
> > > +
> > > +    if (rwx == 2) { /* Instruction Storage Interrupt */
> > > +        cs->exception_index = POWERPC_EXCP_ISI;
> > > +        env->error_code = cause;
> > > +    } else { /* Data Storage Interrupt */
> > > +        cs->exception_index = POWERPC_EXCP_DSI;
> > > +        if (rwx == 1) { /* Write -> Store */
> > > +            cause |= DSISR_ISSTORE;
> > > +        }
> > > +        env->spr[SPR_DSISR] = cause;
> > > +        env->spr[SPR_DAR] = eaddr;
> > > +        env->error_code = 0;
> > > +    }
> > > +}
> > > +
> > > +
> > > > +static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int rwx, uint64_t pte,
> > > > +                                   int *fault_cause, int *prot)
> > > +{
> > > +    CPUPPCState *env = &cpu->env;
> > > +    const int pte_att[] = { 0x02, 0x0E, 0x07, 0x06 };
> > > +    const int need_prot[] = { PAGE_READ, PAGE_WRITE, PAGE_EXEC };
> > > > +    int att = pte_att[(pte & R_PTE_ATT) >> 4];
> > > +
> > > +    /* Check Page Attributes (pte58:59) */
> > > +    if ((att & R_PTE_ATT_G) && (rwx == 2)) { /* Guarded Storage */
> > > +        *fault_cause |= SRR1_NOEXEC_GUARD;
> > > +        return 1;
> > true/false instead of 0/1 here as well.
> 
> Ok
> 
> > 
> > > 
> > > +    }
> > > +    /* I think we can ignore the other attributes?! */
> > > +
> > > > +    /* Determine permissions allowed by Encoded Access Authority */
> > > > +    if ((pte & R_PTE_EAA_PRIV) && msr_pr) { /* Insufficient Privilege */
> > > > +        *prot = 0;
> > > > +    } else if (msr_pr || (pte & R_PTE_EAA_PRIV)) {
> > > > +        *prot = ppc_radix64_get_prot_eaa(pte);
> > > > +    } else { /* !msr_pr && !(pte & R_PTE_EAA_PRIV) */
> > > > +        *prot = ppc_radix64_get_prot_eaa(pte);
> > > > +        *prot &= ppc_radix64_get_prot_amr(cpu); /* Least combined permissions */
> > > > +    }
> > > > +
> > > > +    /* Check if requested access type is allowed */
> > > > +    if (need_prot[rwx] & ~(*prot)) { /* Page Protected for that Access */
> > > +        *fault_cause |= DSISR_PROTFAULT;
> > > +        return 1;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > > +static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte,
> > > > +                               hwaddr pte_addr, int *prot)
> > > +{
> > > +    CPUState *cs = CPU(cpu);
> > > +    uint64_t npte;
> > > +
> > > +    npte = pte | R_PTE_R; /* Always set reference bit */
> > > +
> > > +    if (rwx == 1) { /* Store/Write */
> > > +        npte |= R_PTE_C; /* Set change bit */
> > > +    } else {
> > > +        /*
> > > > +         * Treat the page as read-only for now, so that a later write
> > > > +         * will pass through this function again to set the C bit.
> > > +         */
> > > +        *prot &= ~PAGE_WRITE;
> > > +    }
> > > +
> > > +    if (pte ^ npte) { /* If pte has changed then write it back */
> > > +        stq_phys(cs->as, pte_addr, npte);
> > > +    }
> > > +}
> > > +
> > > > +static bool ppc_radix64_walk_tree(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> > > > +                                  uint64_t base_addr, uint64_t nls,
> > > > +                                  hwaddr *raddr, int *psize, int *fault_cause,
> > > > +                                  int *prot)
> > > +{
> > > +    CPUState *cs = CPU(cpu);
> > > +    uint64_t index, pde;
> > > +
> > > +    if (nls < 5) { /* Directory maps less than 2**5 entries */
> > > +        *fault_cause |= DSISR_R_BADCONFIG;
> > > +        return 1;
> > .. and here.
> > 
> 
> Yep
> 
> > > 
> > > +    }
> > > +
> > > > +    /* Read page <directory/table> entry from guest address space */
> > > +    index = eaddr >> (*psize - nls); /* Shift */
> > > +    index &= ((1UL << nls) - 1); /* Mask */
> > > +    pde = ldq_phys(cs->as, base_addr + (index * sizeof(pde)));
> > > +    if (!(pde & R_PTE_VALID)) { /* Invalid Entry */
> > > +        *fault_cause |= DSISR_NOPTE;
> > > +        return 1;
> > > +    }
> > > +
> > > +    *psize -= nls;
> > > +
> > > > +    /* Check if Leaf Entry -> Page Table Entry -> Stop the Search */
> > > > +    if (pde & R_PTE_LEAF) {
> > > > +        uint64_t rpn = pde & R_PTE_RPN;
> > > > +        uint64_t mask = (1UL << *psize) - 1;
> > > > +
> > > > +        if (ppc_radix64_check_prot(cpu, rwx, pde, fault_cause, prot)) {
> > > > +            return 1; /* Protection Denied Access */
> > > > +        }
> > > > +
> > > > +        /* Update Reference and Change Bits */
> > > > +        ppc_radix64_set_rc(cpu, rwx, pde, base_addr + (index * sizeof(pde)),
> > > > +                           prot);
> > > > +
> > > > +        /* Or high bits of rpn and low bits to ea to form whole real addr */
> > > > +        *raddr = (rpn & ~mask) | (eaddr & mask);
> > > > +        return 0;
> > > > +    } else { /* Next Level of Radix Tree */
> > > > +        return ppc_radix64_walk_tree(cpu, rwx, eaddr, pde & R_PDE_NLB,
> > > > +                                     pde & R_PDE_NLS, raddr, psize,
> > > > +                                     fault_cause, prot);
> > > +    }
> > > +}
> > > +
> > > > +int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> > > > +                                 int mmu_idx)
> > > +{
> > > +    CPUState *cs = CPU(cpu);
> > > +    CPUPPCState *env = &cpu->env;
> > > +    PPCVirtualHypervisorClass *vhc =
> > > +        PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> > > +    hwaddr raddr;
> > > +    uint64_t lpid = 0, pid = 0, offset, size, patbe, prtbe0;
> > > +    int page_size, prot, fault_cause = 0;
> > > +
> > > +    assert((rwx == 0) || (rwx == 1) || (rwx == 2));
> > > > +    assert(cpu->vhyp); /* For now there is no Radix PowerNV Support */
> > > > +
> > > > +    /* Real Mode Access */
> > > > +    if (((rwx == 2) && (msr_ir == 0)) || ((rwx != 2) && (msr_dr == 0))) {
> > > > +        /* In real mode top 4 effective addr bits (mostly) ignored */
> > > +        raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL;
> > > +
> > > +        if (msr_hv) {
> > > +            /* if EA0 == 0, raddr = eaddr4:63 | HRMOR */
> > > +            if (!(eaddr >> 63)) {
> > > +                raddr |= env->spr[SPR_HRMOR];
> > > +            }
> > > +        } else {
> > > +            /* TODO */
> > > +            /* If PowerNV then do partition scoped translation */
> > > +        }
> > > +
> > > > +        tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
> > > > +                     PAGE_READ | PAGE_WRITE | PAGE_EXEC, mmu_idx,
> > > > +                     TARGET_PAGE_SIZE);
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* Virtual Mode Access - get the fully qualified address */
> > > > +    if (!ppc_radix64_get_fully_qualified_addr(env, eaddr, &lpid, &pid)) {
> > > +        ppc_radix64_raise_segi(cpu, rwx, eaddr);
> > > +        return 1;
> > > +    }
> > > +
> > > +    /* Get Process Table */
> > > +    patbe = vhc->get_patbe(cpu->vhyp);
> > > +    /*
> > > > +     * Must be a non-zero process table base, otherwise mmu turned on without
> > > > +     * previous H_REGISTER_PROC_TBL call, which isn't allowed
> > > +     */
> > > +    assert(patbe & ~PATBE1_GR);
> > I don't think this can be an assert though.  I think this is guest
> > triggerable if it sets MSR_DR without calling H_REGISTER_PROC_TBL, in
> > which case you'll have to raise some sort of fault.
> 
> We can't actually trigger this assert since we use the PATBE1_GR bit to
> determine the fault handler that we call. I'll remove it.

Oh.. good point.  No, leaving it in makes sense, I just missed that it
controlled the branch to this function.

> 
> > 
> > > 
> > > +
> > > > +    /* Index Process Table by PID to Find Corresponding Process Table Entry */
> > > +    offset = pid * sizeof(struct prtb_entry);
> > > > +    size = 1UL << ((patbe & PATBE1_R_PRTS) + 12);
> > Better use 1ULL or on a 32-bit host this could overflow.
> > 
> > > 
> > > +    if (offset >= size) {
> > > +        /* offset exceeds size of the process table */
> > > +        ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE);
> > > +        return 1;
> > > +    }
> > > +    prtbe0 = ldq_phys(cs->as, (patbe & PATBE1_R_PRTB) + offset);
> > > +
> > > > +    /* Walk Radix Tree from Process Table Entry to Convert EA to RA */
> > > > +    page_size = PRTBE_R_GET_RTS(prtbe0);
> > > > +    if (ppc_radix64_walk_tree(cpu, rwx, eaddr & R_EADDR_MASK,
> > > > +                              prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
> > > > +                              &raddr, &page_size, &fault_cause, &prot)) {
> > > +        ppc_radix64_raise_si(cpu, rwx, eaddr, fault_cause);
> > > +        return 1;
> > > +    }
> > > +
> > > > +    tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
> > > > +                 prot, mmu_idx, 1UL << page_size);
> > > > +    return 0;
> > > +}
> > > diff --git a/target/ppc/mmu-radix64.h b/target/ppc/mmu-radix64.h
> > > index 9bd283c..4026cd9 100644
> > > --- a/target/ppc/mmu-radix64.h
> > > +++ b/target/ppc/mmu-radix64.h
> > > @@ -3,9 +3,68 @@
> > >  
> > >  #ifndef CONFIG_USER_ONLY
> > >  
> > > +/* Radix Quadrants */
> > > +#define R_EADDR_MASK            0x3FFFFFFFFFFFFFFF
> > > +#define R_EADDR_QUADRANT        0xC000000000000000
> > > +#define R_EADDR_QUADRANT0       0x0000000000000000
> > > +#define R_EADDR_QUADRANT1       0x4000000000000000
> > > +#define R_EADDR_QUADRANT2       0x8000000000000000
> > > +#define R_EADDR_QUADRANT3       0xC000000000000000
> > > +
> > > +/* Radix Partition Table Entry Fields */
> > > +#define PATBE1_R_PRTB           0x0FFFFFFFFFFFF000
> > > +#define PATBE1_R_PRTS           0x000000000000001F
> > > +
> > > +/* Radix Process Table Entry Fields */
> > > > +#define PRTBE_R_GET_RTS(rts)    ((((rts >> 58) & 0x18) | ((rts >> 5) & 0x7)) \
> > > > +                                + 31)
> > > +#define PRTBE_R_RPDB            0x0FFFFFFFFFFFFF00
> > > +#define PRTBE_R_RPDS            0x000000000000001F
> > > +
> > > +/* Radix Page Directory/Table Entry Fields */
> > > +#define R_PTE_VALID             0x8000000000000000
> > > +#define R_PTE_LEAF              0x4000000000000000
> > > +#define R_PTE_SW0               0x2000000000000000
> > > +#define R_PTE_RPN               0x01FFFFFFFFFFF000
> > > +#define R_PTE_SW1               0x0000000000000E00
> > > > +#define R_GET_SW(sw)            (((sw >> 58) & 0x8) | ((sw >> 9) & 0x7))
> > > +#define R_PTE_R                 0x0000000000000100
> > > +#define R_PTE_C                 0x0000000000000080
> > > +#define R_PTE_ATT               0x0000000000000030
> > > +#define R_PTE_ATT_G             0x1
> > > +#define R_PTE_EAA_PRIV          0x0000000000000008
> > > +#define R_PTE_EAA_R             0x0000000000000004
> > > +#define R_PTE_EAA_RW            0x0000000000000002
> > > +#define R_PTE_EAA_X             0x0000000000000001
> > > +#define R_PDE_NLB               PRTBE_R_RPDB
> > > +#define R_PDE_NLS               PRTBE_R_RPDS
> > > +
> > > +/* DSISR/SRR1 Fields */
> > > +#define DSISR_R_BADCONFIG       (1 << (63 - 44))
> > So this bit _is_ radix specific, but I think it should go into cpu.h
> > with the other DSISR/SRR1 fields (I moved them there from the v3
> > specific file Sam originally had them in).
> 
> Will do
> 
> > 
> > > 
> > >  #ifdef TARGET_PPC64
> > >  
> > >  struct ppc_radix_page_info *ppc_radix64_get_page_info(void);
> > > > +int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> > > > +                                 int mmu_idx);
> > > +
> > > +static inline int ppc_radix64_get_prot_eaa(uint64_t pte)
> > > +{
> > > +    return (pte & R_PTE_EAA_R ? PAGE_READ : 0) |
> > > +           (pte & R_PTE_EAA_RW ? PAGE_READ | PAGE_WRITE : 0) |
> > > +           (pte & R_PTE_EAA_X ? PAGE_EXEC : 0);
> > > +}
> > > +
> > > +static inline int ppc_radix64_get_prot_amr(PowerPCCPU *cpu)
> > > +{
> > > +    CPUPPCState *env = &cpu->env;
> > > > +    int amr = env->spr[SPR_AMR] >> 62; /* We only care about key0 AMR63:62 */
> > > > +    int iamr = env->spr[SPR_IAMR] >> 62; /* We only care about key0 IAMR63:62 */
> > > Do we only ever care about key0 with radix? Or can other keys come
> > > into play with powernv, or extensions that aren't implemented yet?
> 
> We only ever use key0 for radix.
> 
> > 
> > > 
> > > > +    return (amr & 0x2 ? 0 : PAGE_WRITE) | /* Access denied if bit is set */
> > > +           (amr & 0x1 ? 0 : PAGE_READ) |
> > > +           (iamr & 0x1 ? 0 : PAGE_EXEC);
> > > +}
> > >  
> > >  #endif /* TARGET_PPC64 */
> > >  
> 
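For anyone wanting to poke at the permission logic, here is a small
standalone sketch of the two helpers from mmu-radix64.h and how their
results combine. The names prot_from_eaa and prot_from_amr and the P_*
values are hypothetical stand-ins (for the ppc_radix64_get_prot_* helpers
and QEMU's PAGE_* flags); only the key0 bits are considered, as in the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_READ / PAGE_WRITE / PAGE_EXEC */
enum { P_READ = 1, P_WRITE = 2, P_EXEC = 4 };

/* Encoded Access Authority bits, as defined in the patch */
#define EAA_R  0x4ULL
#define EAA_RW 0x2ULL
#define EAA_X  0x1ULL

/* Permissions granted by the PTE itself */
static int prot_from_eaa(uint64_t pte)
{
    return ((pte & EAA_R) ? P_READ : 0) |
           ((pte & EAA_RW) ? (P_READ | P_WRITE) : 0) |
           ((pte & EAA_X) ? P_EXEC : 0);
}

/* Permissions left after the key0 AMR/IAMR bits; a set bit denies access */
static int prot_from_amr(uint64_t amr, uint64_t iamr)
{
    unsigned a = (unsigned)(amr >> 62);
    unsigned i = (unsigned)(iamr >> 62);

    return ((a & 0x2) ? 0 : P_WRITE) |
           ((a & 0x1) ? 0 : P_READ) |
           ((i & 0x1) ? 0 : P_EXEC);
}
```

In the case where both sources apply, the effective permissions are the
bitwise AND of the two results, which is what the "least combined
permissions" comment in check_prot refers to.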

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
