Re: [qemu-s390x] [PATCH v5 15/15] s390-bios: Support booting from real d
From: Thomas Huth
Subject: Re: [qemu-s390x] [PATCH v5 15/15] s390-bios: Support booting from real dasd device
Date: Fri, 29 Mar 2019 09:33:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0
On 13/03/2019 17.31, Jason J. Herne wrote:
> Allows guest to boot from a vfio configured real dasd device.
>
> Signed-off-by: Jason J. Herne <address@hidden>
> Reviewed-by: Cornelia Huck <address@hidden>
> ---
[...]
> diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
> new file mode 100644
> index 0000000..236428a
> --- /dev/null
> +++ b/docs/devel/s390-dasd-ipl.txt
> @@ -0,0 +1,133 @@
> +*****************************
> +***** s390 hardware IPL *****
> +*****************************
> +
> +The s390 hardware IPL process consists of the following steps.
> +
> +1. A READ IPL ccw is constructed in memory location 0x0.
> + This ccw, by definition, reads the IPL1 record which is located on the disk
> + at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
> + so when it is complete another ccw will be fetched and executed from memory
> + location 0x08.
> +
> +2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
> + IPL1 data is 24 bytes in length and consists of the following pieces of
> + information: [psw][read ccw][tic ccw]. When the machine executes the Read
> + IPL ccw it read the 24-bytes of IPL1 to be read into memory starting at
> + location 0x0. Then the ccw program at 0x08 which consists of a read
> + ccw and a tic ccw is automatically executed because of the chain flag from
> + the original READ IPL ccw. The read ccw will read the IPL2 data into memory
> + and the TIC (Tranfer In Channel) will transfer control to the channel
s/Tranfer/Transfer/ ?
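
As a side note on the 24-byte IPL1 layout described above ([psw][read ccw][tic ccw]), here is a minimal hosted sketch of how the two format-0 CCWs could be decoded from a raw IPL1 buffer. All names in it (ccw0_fields, ccw0_decode, ipl1_ccw, CCW0_FLAG_CC) are made up for illustration and are not from the patch; the sketch assumes the usual format-0 layout of command code in byte 0, 24-bit data address in bytes 1-3, flags in byte 4 and count in bytes 6-7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; none of them come from the patch. */
#define IPL1_SIZE    24   /* [psw][read ccw][tic ccw], 8 bytes each */
#define CCW0_FLAG_CC 0x40 /* command-chaining flag of a format-0 CCW */

struct ccw0_fields {
    uint8_t  cmd_code;  /* byte 0 */
    uint32_t cda;       /* bytes 1-3: 24-bit data address */
    uint8_t  flags;     /* byte 4 */
    uint16_t count;     /* bytes 6-7 */
};

/* Decode one big-endian format-0 CCW from 8 raw bytes. */
static struct ccw0_fields ccw0_decode(const uint8_t *raw)
{
    struct ccw0_fields f;

    f.cmd_code = raw[0];
    f.cda = ((uint32_t)raw[1] << 16) | ((uint32_t)raw[2] << 8) | raw[3];
    f.flags = raw[4];
    f.count = (uint16_t)((raw[6] << 8) | raw[7]);
    return f;
}

/* Return CCW number 0 (read) or 1 (tic) from a 24-byte IPL1 record. */
static struct ccw0_fields ipl1_ccw(const uint8_t ipl1[IPL1_SIZE], int index)
{
    assert(index == 0 || index == 1);
    return ccw0_decode(ipl1 + 8 + 8 * index); /* skip the 8-byte psw */
}

/* True if the CCW chains to the next one. */
static bool ccw0_is_chained(struct ccw0_fields f)
{
    return (f.flags & CCW0_FLAG_CC) != 0;
}
```

Decoding from bytes rather than through a bitfield struct keeps the sketch independent of host endianness; the bios itself runs big-endian and can use Ccw0 directly.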
[...]
> +**********************************************************
> +***** How this all pertains to QEMU (and the kernel) *****
> +**********************************************************
> +
> +In theory we should merely have to do the following to IPL/boot a guest
> +operating system from a DASD device:
> +
> +1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
> +2. Execute channel program at 0x0.
> +3. LPSW 0x0.
> +
> +However, our emulation of the machine's channel program logic within the kernel
> +is missing one key feature that is required for this process to work:
> +non-prefetch of ccw data.
> +
> +When we start a channel program we pass the channel subsystem parameters via an
> +ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
> +bit is on then the vfio-ccw kernel driver is allowed to read the entire channel
> +program from guest memory before it starts executing it. This means that any
> +channel commands that read additional channel commands will not work as expected
> +because the newly read commands will only exist in guest memory and NOT within
> +the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
> +requires this bit to be on for all channel programs. This is a problem because
> +the IPL process consists of transferring control from the "Read IPL" ccw
> +immediately to the IPL1 channel program that was read by "Read IPL".
> +
> +Not being able to turn off prefetch will also prevent the TIC at the end of the
> +IPL1 channel program from transferring control to the IPL2 channel program.
> +
> +Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
> +tansfers control to another channel program segment immediately after reading it
s/tansfers/transfers/
> +from the disk. So we need to be able to handle this case.
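
For readers who want to see the prefetch problem in isolation, here is a toy model of it. It is only a sketch: the opcodes, struct toy_ccw, toy_run() and toy_demo() are invented for illustration and have nothing to do with the real vfio-ccw code or real CCW command codes. The "read" command writes a new command into guest memory; an engine that snapshotted the program up front never sees it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy opcodes, NOT real CCW command codes. */
enum { OP_END = 0, OP_READ = 1, OP_MARK = 2 };

struct toy_ccw {
    uint8_t op;
    uint8_t arg; /* for OP_READ: slot that the "disk data" lands in */
};

#define SLOTS 4

/* Run the toy program and return how many OP_MARK commands executed. */
static int toy_run(struct toy_ccw *guest, int prefetch)
{
    struct toy_ccw snapshot[SLOTS];
    const struct toy_ccw *prog = guest;
    int marks = 0;

    if (prefetch) {
        /* Whole program is copied before execution starts. */
        memcpy(snapshot, guest, sizeof(snapshot));
        prog = snapshot;
    }

    for (int i = 0; i < SLOTS; i++) {
        struct toy_ccw cur = prog[i];

        if (cur.op == OP_END) {
            break;
        }
        if (cur.op == OP_READ) {
            assert(cur.arg < SLOTS);
            /* The data that was read lands in guest memory only. */
            guest[cur.arg].op = OP_MARK;
        } else if (cur.op == OP_MARK) {
            marks++;
        }
    }
    return marks;
}

/* Program: slot 0 reads a new command into slot 1, which starts as END. */
static int toy_demo(int prefetch)
{
    struct toy_ccw guest[SLOTS] = {
        { OP_READ, 1 }, { OP_END, 0 }, { OP_END, 0 }, { OP_END, 0 },
    };

    return toy_run(guest, prefetch);
}
```

Without prefetch the freshly read command in slot 1 is fetched and executed; with prefetch the engine stops at the stale END in its snapshot, which corresponds to the situation the text above describes for "Read IPL" followed by the IPL1 program.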
> +
> +**************************
> +***** What QEMU does *****
> +**************************
> +
> +Since we are forced to live with prefetch we cannot use the very simple IPL
> +procedure we defined in the preceding section. So we compensate by doing the
> +following.
> +
> +1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
> +2. Execute "Read IPL" at 0x0.
> +
> + So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
> +
> +4. Write a custom channel program that will seek to the IPL2 record and then
> + execute the READ and TIC ccws from IPL1. Normamly the seek is not required
s/Normamly/Normally/
[...]
> diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
> new file mode 100644
> index 0000000..1a44469
> --- /dev/null
> +++ b/pc-bios/s390-ccw/dasd-ipl.c
> @@ -0,0 +1,249 @@
> +/*
> + * S390 IPL (boot) from a real DASD device via vfio framework.
> + *
> + * Copyright (c) 2019 Jason J. Herne <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include "libc.h"
> +#include "s390-ccw.h"
> +#include "s390-arch.h"
> +#include "dasd-ipl.h"
> +#include "helper.h"
> +
> +static char prefix_page[PAGE_SIZE * 2]
> + __attribute__((__aligned__(PAGE_SIZE * 2)));
> +
> +static void enable_prefixing(void)
> +{
> + memcpy(&prefix_page, (void *)0, 4096);
You could use the "lowcore" variable from s390-arch.h here instead of
"(void *)0", I guess.
> + set_prefix(ptr2u32(&prefix_page));
> +}
> +
> +static void disable_prefixing(void)
> +{
> + set_prefix(0);
> + /* Copy io interrupt info back to low core */
> + memcpy((void *)0xB8, prefix_page + 0xB8, 12);
Maybe use &lowcore->subchannel_id instead of 0xB8? I'm not sure whether
that's nicer here, though.
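
To illustrate what I mean, here is a small hosted sketch. struct lowcore_sketch is a heavily trimmed, hypothetical stand-in for the real LowCore struct in s390-arch.h (the field names are my assumption of what is defined there, so please double-check), and copy_io_int_info() is a made-up helper. The point is only that the offset 0xB8 and the length 12 can be derived from field names instead of being magic numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed stand-in for the bios LowCore layout. */
struct lowcore_sketch {
    uint8_t  ipl_psw[8];        /* 0x00 */
    uint32_t ccw1[2];           /* 0x08 */
    uint32_t ccw2[2];           /* 0x10 */
    uint8_t  pad[0xB8 - 0x18];  /* fields omitted in this sketch */
    uint16_t subchannel_id;     /* 0xB8 */
    uint16_t subchannel_nr;     /* 0xBA */
    uint32_t io_int_parm;       /* 0xBC */
    uint32_t io_int_word;       /* 0xC0 */
};

/* Start and length of the io interrupt info, derived from the fields. */
#define IO_INT_INFO_OFF offsetof(struct lowcore_sketch, subchannel_id)
#define IO_INT_INFO_LEN \
    (offsetof(struct lowcore_sketch, io_int_word) + sizeof(uint32_t) \
     - IO_INT_INFO_OFF)

_Static_assert(IO_INT_INFO_OFF == 0xB8, "io interrupt info starts at 0xB8");
_Static_assert(IO_INT_INFO_LEN == 12, "io interrupt info is 12 bytes");

/* Copy only the io interrupt info from one low core image to another. */
static void copy_io_int_info(struct lowcore_sketch *dst,
                             const struct lowcore_sketch *src)
{
    memcpy((uint8_t *)dst + IO_INT_INFO_OFF,
           (const uint8_t *)src + IO_INT_INFO_OFF,
           IO_INT_INFO_LEN);
}
```

In the bios, dst would be the real low core and src the prefix page; the sketch uses two ordinary structs so that it can be compiled and tried on any host.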
> +}
> +
> +static bool is_read_tic_ccw_chain(Ccw0 *ccw)
> +{
> + Ccw0 *next_ccw = ccw + 1;
> +
> + return ((ccw->cmd_code == CCW_CMD_DASD_READ ||
> + ccw->cmd_code == CCW_CMD_DASD_READ_MT) &&
> + ccw->chain && next_ccw->cmd_code == CCW_CMD_TIC);
> +}
> +
> +static bool dynamic_cp_fixup(uint32_t ccw_addr, uint32_t *next_cpa)
> +{
> + Ccw0 *cur_ccw = (Ccw0 *)(uint64_t)ccw_addr;
> + Ccw0 *tic_ccw;
> +
> + while (true) {
> + /* Skip over inline TIC (it might not have the chain bit on) */
> + if (cur_ccw->cmd_code == CCW_CMD_TIC &&
> + cur_ccw->cda == ptr2u32(cur_ccw) - 8) {
> + cur_ccw += 1;
> + continue;
> + }
> +
> + if (!cur_ccw->chain) {
> + break;
> + }
> + if (is_read_tic_ccw_chain(cur_ccw)) {
> + /*
> + * Breaking a chain of CCWs may alter the semantics or even the
> + * validity of a channel program. The heuristic implemented below
> + * seems to work well in practice for the channel programs
> + * generated by zipl.
> + */
> + tic_ccw = cur_ccw + 1;
> + *next_cpa = tic_ccw->cda;
> + cur_ccw->chain = 0;
> + return true;
> + }
> + cur_ccw += 1;
> + }
> + return false;
> +}
> +
> +static int run_dynamic_ccw_program(SubChannelId schid, uint16_t cutype,
> + uint32_t cpa)
> +{
> + bool has_next;
> + uint32_t next_cpa = 0;
> + int rc;
> +
> + do {
> + has_next = dynamic_cp_fixup(cpa, &next_cpa);
> +
> + print_int("executing ccw chain at ", cpa);
> + enable_prefixing();
> + rc = do_cio(schid, cutype, cpa, CCW_FMT0);
> + disable_prefixing();
> +
> + if (rc) {
> + break;
> + }
> + cpa = next_cpa;
> + } while (has_next);
> +
> + return rc;
> +}
> +
> +static void make_readipl(void)
> +{
> + Ccw0 *ccwIplRead = (Ccw0 *)0x00;
> +
> + /* Create Read IPL ccw at address 0 */
> + ccwIplRead->cmd_code = CCW_CMD_READ_IPL;
> + ccwIplRead->cda = 0x00; /* Read into address 0x00 in main memory */
> + ccwIplRead->chain = 0; /* Chain flag */
> + ccwIplRead->count = 0x18; /* Read 0x18 bytes of data */
> +}
> +
> +static void run_readipl(SubChannelId schid, uint16_t cutype)
> +{
> + if (do_cio(schid, cutype, 0x00, CCW_FMT0)) {
> + panic("dasd-ipl: Failed to run Read IPL channel program\n");
> + }
> +}
> +
> +/*
> + * The architecture states that IPL1 data should consist of a psw followed by
> + * format-0 READ and TIC CCWs. Let's sanity check.
> + */
> +static void check_ipl1(void)
> +{
> + Ccw0 *ccwread = (Ccw0 *)0x08;
> + Ccw0 *ccwtic = (Ccw0 *)0x10;
> +
> + if (ccwread->cmd_code != CCW_CMD_DASD_READ ||
> + ccwtic->cmd_code != CCW_CMD_TIC) {
> + panic("dasd-ipl: IPL1 data invalid. Is this disk really bootable?\n");
> + }
> +}
> +
> +static void check_ipl2(uint32_t ipl2_addr)
> +{
> + Ccw0 *ccw = u32toptr(ipl2_addr);
> +
> + if (ipl2_addr == 0x00) {
> + panic("IPL2 address invalid. Is this disk really bootable?\n");
> + }
> + if (ccw->cmd_code == 0x00) {
> + panic("IPL2 ccw data invalid. Is this disk really bootable?\n");
> + }
> +}
> +
> +static uint32_t read_ipl2_addr(void)
> +{
> + Ccw0 *ccwtic = (Ccw0 *)0x10;
> +
> + return ccwtic->cda;
> +}
> +
> +static void ipl1_fixup(void)
> +{
> + Ccw0 *ccwSeek = (Ccw0 *) 0x08;
> + Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
> + Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
> + Ccw0 *ccwRead = (Ccw0 *) 0x20;
> + CcwSeekData *seekData = (CcwSeekData *) 0x30;
> + CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
> +
> + /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
> + memcpy(ccwRead, (void *)0x08, 16);
Maybe use lowcore->ccw1 instead of (void *)0x08 here?
> + /* Disable chaining so we don't TIC to IPL2 channel program */
> + ccwRead->chain = 0x00;
> +
> + ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
> + ccwSeek->cda = ptr2u32(seekData);
> + ccwSeek->chain = 1;
> + ccwSeek->count = sizeof(*seekData);
> + seekData->reserved = 0x00;
> + seekData->cyl = 0x00;
> + seekData->head = 0x00;
> +
> + ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
> + ccwSearchID->cda = ptr2u32(searchData);
> + ccwSearchID->chain = 1;
> + ccwSearchID->count = sizeof(*searchData);
> + searchData->cyl = 0;
> + searchData->head = 0;
> + searchData->record = 2;
> +
> + /* Go back to Search CCW if correct record not yet found */
> + ccwSearchTic->cmd_code = CCW_CMD_TIC;
> + ccwSearchTic->cda = ptr2u32(ccwSearchID);
> +}
> +
> +static void run_ipl1(SubChannelId schid, uint16_t cutype)
> + {
> + uint32_t startAddr = 0x08;
> +
> + if (do_cio(schid, cutype, startAddr, CCW_FMT0)) {
> + panic("dasd-ipl: Failed to run IPL1 channel program\n");
> + }
> +}
> +
> +static void run_ipl2(SubChannelId schid, uint16_t cutype, uint32_t addr)
> +{
> + if (run_dynamic_ccw_program(schid, cutype, addr)) {
> + panic("dasd-ipl: Failed to run IPL2 channel program\n");
> + }
> +}
> +
> +static void lpsw(void *psw_addr)
> +{
> + PSWLegacy *pswl = (PSWLegacy *) psw_addr;
> +
> + pswl->mask |= PSW_MASK_EAMODE; /* Force z-mode */
> + pswl->addr |= PSW_MASK_BAMODE;
> + asm volatile(" llgtr 0,0\n llgtr 1,1\n" /* Some OS's expect to be */
> + " llgtr 2,2\n llgtr 3,3\n" /* in 32-bit mode. Clear */
> + " llgtr 4,4\n llgtr 5,5\n" /* high part of regs to */
> + " llgtr 6,6\n llgtr 7,7\n" /* avoid messing up */
> + " llgtr 8,8\n llgtr 9,9\n" /* instructions that work */
> + " llgtr 10,10\n llgtr 11,11\n" /* in both addressing */
> + " llgtr 12,12\n llgtr 13,13\n" /* modes, like servc. */
> + " llgtr 14,14\n llgtr 15,15\n"
> + " lpsw %0\n"
> + : : "Q" (*pswl) : "cc");
> +}
Have you already tried using jump_to_low_kernel() here? It might be
cleaner to do the diag 0x308 reset here, too, so that no part of the
machine is left in an unexpected state.
Thomas