|
From: | Eric Farman |
Subject: | Re: [qemu-s390x] [PATCH v4 2/6] vfio-ccw: rework ssch state handling |
Date: | Fri, 8 Mar 2019 17:18:22 -0500 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 |
On 03/01/2019 04:38 AM, Cornelia Huck wrote:
The flow for processing ssch requests can be improved by splitting the BUSY state: - CP_PROCESSING: We reject any user space requests while we are in the process of translating a channel program and submitting it to the hardware. Use -EAGAIN to signal user space that it should retry the request. - CP_PENDING: We have successfully submitted a request with ssch and are now expecting an interrupt. As we can't handle more than one channel program being processed, reject any further requests with -EBUSY. A final interrupt will move us out of this state; this also fixes a latent bug where a non-final interrupt might have freed up a channel program that still was in progress. By making this a separate state, we make it possible to issue a halt or a clear while we're still waiting for the final interrupt for the ssch (in a follow-on patch). It also makes a lot of sense not to preemptively filter out writes to the io_region if we're in an incorrect state: the state machine will handle this correctly. Reviewed-by: Eric Farman <address@hidden> Signed-off-by: Cornelia Huck <address@hidden> --- drivers/s390/cio/vfio_ccw_drv.c | 8 ++++++-- drivers/s390/cio/vfio_ccw_fsm.c | 19 ++++++++++++++----- drivers/s390/cio/vfio_ccw_ops.c | 2 -- drivers/s390/cio/vfio_ccw_private.h | 3 ++- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c index a10cec0e86eb..0b3b9de45c60 100644 --- a/drivers/s390/cio/vfio_ccw_drv.c +++ b/drivers/s390/cio/vfio_ccw_drv.c @@ -72,20 +72,24 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work) { struct vfio_ccw_private *private; struct irb *irb; + bool is_final;private = container_of(work, struct vfio_ccw_private, io_work);irb = &private->irb;+ is_final = !(scsw_actl(&irb->scsw) &+ (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT)); if (scsw_is_solicited(&irb->scsw)) { cp_update_scsw(&private->cp, &irb->scsw); - cp_free(&private->cp); + if (is_final) + cp_free(&private->cp); } memcpy(private->io_region->irb_area, irb, sizeof(*irb));if (private->io_trigger)eventfd_signal(private->io_trigger, 1);- if (private->mdev)+ if (private->mdev && is_final) private->state = VFIO_CCW_STATE_IDLE; }
Coincidentally, I did something AWESOME last night that the chunks listed above actually fix. I have a large channel program, and when it runs my host crashes which isn't nice. First, the callback:
[ 547.821235] Call Trace: [ 547.821236] ([<0000000000000000>] (null)) [ 547.821244] [<000003ff808d8b4a>] cp_prefetch+0x422/0x750 [vfio_ccw] [ 547.821247] [<000003ff808d9a90>] fsm_io_request+0x1a0/0x2f0 [vfio_ccw][ 547.821250] [<000003ff808d90c4>] vfio_ccw_mdev_write+0xc4/0x1d8 [vfio_ccw]
[ 547.821255] [<0000000000358d8c>] __vfs_write+0x34/0x1a8 [ 547.821256] [<00000000003590d0>] vfs_write+0xa0/0x1d8 [ 547.821259] [<0000000000359572>] ksys_pwrite64+0x8a/0xa8 [ 547.821264] [<0000000000866cf0>] system_call+0x270/0x290 [ 547.821264] Last Breaking-Event-Address: [ 547.821267] [<00000000003325b2>] __kmalloc+0x1c2/0x288 The channel program in question looks like this: x01 cmd=0b flags=44 count=0006 x02 cmd=02 flags=64 count=07bf x03 cmd=47 flags=44 count=0010 x04 cmd=49 flags=64 count=049b x05 cmd=08 flags=00 count=0000 TIC to x04 x06 cmd=0b flags=64 count=0007 x07 cmd=23 flags=44 count=0001 x08 cmd=e4 flags=44 count=0018 x09 cmd=07 flags=44 count=0006 x0a cmd=e4 flags=44 count=0018 x0b cmd=47 flags=64 count=001b x0c cmd=8e flags=64 count=013a x0d cmd=9a flags=64 count=0009 x0e cmd=31 flags=4c count=0005 x0f cmd=08 flags=00 count=0000 TIC to x0e x10 cmd=0d flags=64 count=061b x11 cmd=07 flags=64 count=000b x12 cmd=96 flags=64 count=0144 x13 cmd=a9 flags=64 count=0025 x14 cmd=08 flags=00 count=0000 TIC to x13 x15 cmd=05 flags=64 count=0387 x16 cmd=a4 flags=64 count=003e x17 cmd=e4 flags=44 count=0018 x18 cmd=0b flags=64 count=000a x19 cmd=96 flags=64 count=0497 x1a cmd=8e flags=64 count=02c3 x1b cmd=29 flags=64 count=01bf x1c cmd=08 flags=00 count=0000 TIC to x1b x1d cmd=1b flags=24 count=000aDebugging it today, I found that we get an intermediate interrupt on CCW 0x0e, and a final interrupt (well, unit check) on CCW 0x11. But because of the intermediate interrupt, rewinding in cp_prefetch() at label out_err fails and we crash. Whoops!
Recalling the above changes, I applied JUST the above pieces (not the remainder of this patch), and the above channel program works fine. Now to figure out why I get a unit check. :)
- Eric
[Prev in Thread] | Current Thread | [Next in Thread] |