Re: [qemu-s390x] [RFC v3 0/56] per-CPU locks
From: Emilio G. Cota
Subject: Re: [qemu-s390x] [RFC v3 0/56] per-CPU locks
Date: Fri, 19 Oct 2018 15:29:32 -0400
User-agent: Mutt/1.9.4 (2018-02-28)

On Fri, Oct 19, 2018 at 18:01:18 +0200, Paolo Bonzini wrote:
> On 19/10/2018 16:50, Emilio G. Cota wrote:
> > On Fri, Oct 19, 2018 at 08:59:24 +0200, Paolo Bonzini wrote:
> >> On 19/10/2018 03:05, Emilio G. Cota wrote:
> >>> I'm calling this series a v3 because it supersedes the two series
> >>> I previously sent about using atomics for interrupt_request:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02013.html
> >>> The approach in that series cannot work reliably; using (locked) atomics
> >>> to set interrupt_request but not using (locked) atomics to read it
> >>> can lead to missed updates.
> >>
> >> The idea here was that changes to protected fields are all followed by
> >> kick. That may not have been the case, granted, but I wonder if the
> >> plan is unworkable.
> >
> > I suspect that the cpu->interrupt_request+kick mechanism is not the issue,
> > otherwise master should not work--we do atomic_read(cpu->interrupt_request)
> > and only if that read != 0 we take the BQL.
> >
> > My guess is that the problem is with other reads of cpu->interrupt_request,
> > e.g. those in cpu_has_work. Currently those reads happen with the
> > BQL held, and updates to cpu->interrupt_request take the BQL. If we drop
> > the BQL from the setters to instead use locked atomics (like in the
> > aforementioned series), those BQL-protected readers might miss updates.
>
> cpu_has_work is only needed to handle the processor's halted state (or
> is it?). If it is, OR+kick should work.
>
> > Given that we need a per-CPU lock anyway to remove the BQL from the
> > CPU loop, extending this lock to protect cpu->interrupt_request is
> > a simple solution that keeps the current logic and allows for
> > greater scalability.
>
> Sure, I was just curious what the problem was. KVM uses OR+kick with no
> problems.

I never found exactly where things break. The hangs happen
pretty early when booting a large (-smp > 16) x86_64 Ubuntu guest.
Booting never completes (ssh stays unresponsive) unless I enable the
console output (I suspect the console output slows things down
enough to hide some races). I only see a few threads busy:
a couple of vCPU threads, and the I/O thread.

I didn't have time to debug any further, so I moved on
to an alternative approach.

So it is possible that it was my implementation, and not the approach,
that was at fault :-)

Thanks,
E.
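
The two schemes compared in this thread (OR+kick with atomics versus a
per-CPU lock guarding interrupt_request) can be sketched roughly as
follows. This is a minimal illustration in C11 with pthreads, not QEMU
code: DemoCPU and every demo_* function are hypothetical names, and the
"kick" is modelled here as a condition-variable broadcast plus a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU; this is NOT QEMU's CPUState. */
typedef struct DemoCPU {
    pthread_mutex_t lock;                 /* per-CPU lock */
    pthread_cond_t halt_cond;             /* woken by a "kick" */
    unsigned kicks;                       /* guarded by lock */
    uint32_t interrupt_request;           /* scheme B: guarded by lock */
    _Atomic uint32_t interrupt_request_a; /* scheme A: lock-free */
} DemoCPU;

static void demo_cpu_init(DemoCPU *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->halt_cond, NULL);
    cpu->kicks = 0;
    cpu->interrupt_request = 0;
    atomic_init(&cpu->interrupt_request_a, 0);
}

/* Wake a possibly halted vCPU thread. Caller must hold cpu->lock. */
static void demo_cpu_kick_locked(DemoCPU *cpu)
{
    cpu->kicks++;
    pthread_cond_broadcast(&cpu->halt_cond);
}

/* Scheme A: atomic OR followed by a kick. Every reader must also use
 * an atomic load; a reader relying on some other lock may miss the
 * update, which is the concern raised in the thread. */
static void demo_atomic_interrupt(DemoCPU *cpu, uint32_t mask)
{
    assert(mask != 0);
    atomic_fetch_or(&cpu->interrupt_request_a, mask);
    pthread_mutex_lock(&cpu->lock);
    demo_cpu_kick_locked(cpu);
    pthread_mutex_unlock(&cpu->lock);
}

static bool demo_atomic_has_work(DemoCPU *cpu)
{
    return atomic_load(&cpu->interrupt_request_a) != 0;
}

/* Scheme B: setters and readers both take the per-CPU lock, so a
 * reader can never observe a stale interrupt_request. */
static void demo_locked_interrupt(DemoCPU *cpu, uint32_t mask)
{
    assert(mask != 0);
    pthread_mutex_lock(&cpu->lock);
    cpu->interrupt_request |= mask;
    demo_cpu_kick_locked(cpu);
    pthread_mutex_unlock(&cpu->lock);
}

static bool demo_locked_has_work(DemoCPU *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    bool has_work = cpu->interrupt_request != 0;
    pthread_mutex_unlock(&cpu->lock);
    return has_work;
}
```

In scheme B the lock is per CPU, so setters targeting different vCPUs do
not contend with each other the way they would on a single global lock.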