[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH RFC] vfio-ccw: forward halt/clear errors
From: |
Cornelia Huck |
Subject: |
Re: [PATCH RFC] vfio-ccw: forward halt/clear errors |
Date: |
Mon, 17 May 2021 19:31:45 +0200 |
On Wed, 12 May 2021 16:19:15 -0400
Eric Farman <farman@linux.ibm.com> wrote:
> On Tue, 2021-05-11 at 17:11 +0200, Cornelia Huck wrote:
> > hsch and csch basically have two parts: execute the command,
> > and perform the halt/clear function. For fully emulated
> > subchannels, it is pretty clear how it will work: check the
> > subchannel state, and actually 'perform the halt/clear function'
> > and set cc 0 if everything looks good.
> >
> > For passthrough subchannels, some of the checking is done
> > within QEMU, but some has to be done within the kernel. QEMU's
> > subchannel state may be such that we can perform the async
> > function, but the kernel may still get a cc != 0 when it is
> > actually executing the instruction. In that case, we need to
> > set the condition actually encountered by the kernel; if we
> > set cc 0 on error, we would actually need to inject an interrupt
> > as well.
> >
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---
> >
> > Stumbled over this during the vfio-ccw kernel locking discussions.
> >
> > This is probably a corner case, and I'm not sure how I can actually
> > get this path excercised, but it passes my smoke tests.
>
> I'll see if I can hammer my way into some of this.
Thanks, that would be cool.
>
> >
> > Not sure whether this is the way to go.
>
> I think it seems reasonable.
>
> > The unit exceptions in the
> > halt/clear error paths also seem slightly fishy.
>
> It is peculiar. Looking at the old POPS references, the unit exception
> states that the --device-- detected something unusual, not really the
> subchannel (which is how vfio-ccw is behaving). But, providing some
> indication that something went seriously wrong is good. Which I guess
> was the point of the UE code, even though it's obviously set up to be
> generated after a failure on the START.
Yes, I had copied it over when I wired up halt/clear.
>
> I guess at the least, we need to clean up the FCTL based on the
> function that failed, rather than only cleaning up the START function.
Makes sense.
> The UE itself may just be an extra "hey this is busted" indicator.
It still feels a bit odd, but I'm not sure that there's a better
alternative.
>
> >
> > ---
> > hw/s390x/css.c | 34 ++++++++++++++++++++++++++++++----
> > hw/vfio/ccw.c | 4 ++--
> > 2 files changed, 32 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> > index bed46f5ec3a2..ce2e903ca25a 100644
> > --- a/hw/s390x/css.c
> > +++ b/hw/s390x/css.c
> > @@ -1206,23 +1206,49 @@ static void
> > sch_handle_start_func_virtual(SubchDev *sch)
> >
> > }
> >
> > -static void sch_handle_halt_func_passthrough(SubchDev *sch)
> > +static IOInstEnding sch_handle_halt_func_passthrough(SubchDev *sch)
> > {
> > int ret;
> >
> > ret = s390_ccw_halt(sch);
> > if (ret == -ENOSYS) {
> > sch_handle_halt_func(sch);
> > + return IOINST_CC_EXPECTED;
>
> This is the fallback, so makes sense. You could fold it into the switch
> table, but since that's for "stuff from the kernel" versus -ENOSYS says
> "there's no way to call the kernel" I guess this is fine too.
Yes, that was the reason I kept that separate. (I don't expect it to be
a heavily exercised path nowadays anyway.)
>
> > + }
> > + /*
> > + * Some conditions may have been detected prior to starting the
> > halt
> > + * function; map them to the correct cc.
> > + */
> > + switch (ret) {
> > + case -EBUSY:
> > + return IOINST_CC_BUSY;
> > + case -ENODEV:
> > + case -EACCES:
> > + return IOINST_CC_NOT_OPERATIONAL;
> > + default:
> > + return IOINST_CC_EXPECTED;
> > }
> > }
(...)
> > diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> > index e752c845e9e4..39275a917bd2 100644
> > --- a/hw/vfio/ccw.c
> > +++ b/hw/vfio/ccw.c
> > @@ -199,7 +199,7 @@ again:
>
> // This is for CLEAR
>
> > case 0:
> > case -ENODEV:
> > case -EACCES:
> > - return 0;
> > + return ret;
> > case -EFAULT:
> > default:
> > sch_gen_unit_exception(sch);
> > @@ -240,7 +240,7 @@ again:
>
> // This is for HALT
>
> > case -EBUSY:
> > case -ENODEV:
> > case -EACCES:
> > - return 0;
> > + return ret;
>
> Aside: How could we get EACCES for either HALT or CLEAR? I only see
> that set in the normal request path, if we got a CC3 on the SSCH.
We should only get -ENODEV, which basically indicates cc3 on the kernel
side. For start, the kernel distinguishes between "you didn't specify a
path in the orb that's actually available" and "device not
operational", but we map everything to cc 3 in QEMU (and I don't think
there's anything else we could do with that information anyway.)
>
> Can we scrub them, or do we need to update kernel
> Documentation/s390/vfio-ccw.rst ? :)
We could remove them, or log something if they come up unexpectedly;
but their presence does not really hurt, either.