qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] ppc/pnv: Fix NMI system reset SRR1 value


From: Cédric Le Goater
Subject: Re: [PATCH] ppc/pnv: Fix NMI system reset SRR1 value
Date: Fri, 8 May 2020 16:05:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

>> of a pending interrupt. It occurs on a SMP PowerNV machine when it is 
>> stressed with IO, such as scp of a big file. 
>>
>> I am suspecting more and more an issue with an interrupt being handled 
>> when the CPU is coming out of idle. I haven't seen anything wrong in
> 
> So you can't hit it when booting Linux with powersave=off?

no. I uploaded 32GB steadily at 3.0MB/s on a smp 2 machine. 

When powersave is on, a P8 or P9 machine will miss an interrupt quite 
quickly. This assert can catch a symptom of the failure  :

    @@ -75,6 +83,9 @@ void ppc_set_irq(PowerPCCPU *cpu, int n_
         if (level) {
             env->pending_interrupts |= 1 << n_IRQ;
             cpu_interrupt(cs, CPU_INTERRUPT_HARD);
    +        if (!(env->pending_interrupts & (1 << n_IRQ))) {
    +            g_assert_not_reached();
    +        }
         } else {
             env->pending_interrupts &= ~(1 << n_IRQ);
             if (env->pending_interrupts == 0) {

env->pending_interrupts is reseted in ppc_set_irq() setting it. I think 
it is the CPU handling the external IO interrupt which is kicked to wake 
up in cpu_interrupt(). The IRQ level goes out of sync with what the device 
expects and things go bad very quickly after. 

But this is post mortem. I need to find the right spot where to put an 
assert() to analyze. But, adding too much traces closes the window ...

> Do we model stop with EC=0 properly? Looks like helper_pminsn seems to
> be doing the right thing there.

Yes. It seems so. The CPUs enter nap and come out with PACA_IRQ_EE set.

>> the models. Unless this maybe :
>>
>>     /* Pretend to be returning from doze always as we don't lose state */
>>     *msr |= (0x1ull << (63 - 47));
>>
>> I am not sure how in sync it is with PSSCR.
> 
> That should be okay, the hardware can always enter a shallower state 
> than was asked for. Linux will handle it. For testing purpose, we could
> model deeper states by scribbling on registers and indicating state loss.
> 
> Aide from SRR1 sleep state value, Linux uses the SRR1 wake reason value 
> to run the interrupt handler, but even if we got SRR1 wrong, Linux 
> eventually enables MSR[EE] so the interrupt should get replayed then 
> (this is what Linux used to do until we added the wake-reason processing 
> for improved performance).
> 
> But we do appear to get those right in powerpc_reset_wakeup().

yes. Still digging.

Thanks,

C.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]