[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC] ieee1275: Avoiding many unnecessary open/close
From: |
Eric Snowberg |
Subject: |
Re: [RFC] ieee1275: Avoiding many unnecessary open/close |
Date: |
Fri, 20 Nov 2020 15:52:47 -0700 |
> On Nov 16, 2020, at 10:29 AM, diegodo <diegodo@linux.vnet.ibm.com> wrote:
>
> On 2020-09-24 20:58, Eric Snowberg wrote:
>>> On Sep 24, 2020, at 11:11 AM, diegodo <diegodo@linux.vnet.ibm.com> wrote:
>>> Hi,
>>> we are facing some performance issues with ieee1275 platform (ppc64le
>>> architecture to be more specific) and I would like to suggest a couple of
>>> changes to try to avoid them.
>>> Sometimes the system can take more than 10 minutes to boot (in fact, it can
>>> takes more than 30 and 40 minutes in some cases). From my investigations, I
>>> noticied the following points:
>>> 1. Some types of storage devices (NVMe, Fibre Channel, ...) can have a long
>>> time for shutdown notification. Our investigation here showed us this
>>> shutdown increase with the size of the storage. For example, 6.4TB NVMe can
>>> takes something like 7 seconds just to issue a shutdown notification.
>>> 2. The grub calls a lot for OPEN/READ/CLOSE for each device. I ran some
>>> tests with just one disk (NVMe 6.4TB) and found that, even with one disk,
>>> grub call the OPEN/CLOSE functions more than 60 times. Considering the
>>> opening time of the device something like 1.5 seconds, we have almost 9
>>> minutes just opening and closing the disk.
>>> I did some changes in the code and the time during the boot dropped
>>> drastically - from 492s to only 15s.
>>> The idea is to handle the _actual close_ while _opening_ the disk. So,
>>> during the _grub_ofdisk_open_ function, we check if the requested disk is
>>> the same as the already opened and close the previous one if we are dealing
>>> with a different disk. Yet, I removed the OPEN/CLOSE calls inside the
>>> _grub_ofdisk_get_block_size_ function and let it just checking for the
>>> block size.
>>> What are your suggestions? Is there a better way to handle this? I really
>>> appreciate your help here.
>> I ran into the same issue with the ieee1275 platform on SPARC when using
>> ofdisk. It would take 20+ minutes to get to the GRUB menu. I found the same
>> problem, open and close can take a long time and upper level grub code would
>> issue hundreds of them. I ended up writing obdisk.c. Within it, each disk is
>> only opened once. I tried to make it as generic as possible. You might
>> consider
>> moving from ofdisk to obdisk. If you did, ppc64le device specific code would
>> need to be included in iterate_devtree.
>> It also includes new features not found in ofdisk, such as hot-plug support
>> and
>> the ability to find disks that don’t have a device alias.
>>> Thanks
>>> ---
>>> grub-core/disk/ieee1275/ofdisk.c | 63 +++++++++++++++++---------------
>>> 1 file changed, 34 insertions(+), 29 deletions(-)
>>> diff --git a/grub-core/disk/ieee1275/ofdisk.c
>>> b/grub-core/disk/ieee1275/ofdisk.c
>>> index d887d4b..d0ee4ae 100644
>>> --- a/grub-core/disk/ieee1275/ofdisk.c
>>> +++ b/grub-core/disk/ieee1275/ofdisk.c
>>> @@ -44,7 +44,7 @@ struct ofdisk_hash_ent
>>> };
>>> static grub_err_t
>>> -grub_ofdisk_get_block_size (const char *device, grub_uint32_t *block_size,
>>> +grub_ofdisk_get_block_size (grub_uint32_t *block_size,
>>> struct ofdisk_hash_ent *op);
>>> #define OFDISK_HASH_SZ 8
>>> @@ -461,6 +461,7 @@ grub_ofdisk_open (const char *name, grub_disk_t disk)
>>> grub_ssize_t actual;
>>> grub_uint32_t block_size = 0;
>>> grub_err_t err;
>>> + struct ofdisk_hash_ent *op;
>>> if (grub_strncmp (name, "ieee1275/", sizeof ("ieee1275/") - 1) != 0)
>>> return grub_error (GRUB_ERR_UNKNOWN_DEVICE,
>>> @@ -471,6 +472,34 @@ grub_ofdisk_open (const char *name, grub_disk_t disk)
>>> grub_dprintf ("disk", "Opening `%s'.\n", devpath);
>>> + op = ofdisk_hash_find (devpath);
>>> + if (!op)
>>> + op = ofdisk_hash_add (devpath, NULL);
>>> + if (!op)
>>> + {
>>> + grub_free (devpath);
>>> + return grub_errno;
>>> + }
>>> +
>>> + /* Check if the call to open is the same to the last disk already opened
>>> */
>>> + if (last_devpath && !grub_strcmp(op->open_path,last_devpath))
>>> + {
>>> + goto finish;
>>> + }
>>> +
>>> + /* If not, we need to close the previous disk and open the new one */
>>> + else {
>>> + if (last_ihandle){
>>> + grub_ieee1275_close (last_ihandle);
>>> + }
>>> + last_ihandle = 0;
>>> + last_devpath = NULL;
>>> +
>>> + grub_ieee1275_open (op->open_path, &last_ihandle);
>>> + if (! last_ihandle)
>>> + return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "can't open device");
>>> + }
>>> +
>>> if (grub_ieee1275_finddevice (devpath, &dev))
>>> {
>>> grub_free (devpath);
>>> @@ -491,25 +520,18 @@ grub_ofdisk_open (const char *name, grub_disk_t disk)
>>> return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "not a block device");
>>> }
>>> +
>>> + finish:
>>> /* XXX: There is no property to read the number of blocks. There
>>> should be a property `#blocks', but it is not there. Perhaps it
>>> is possible to use seek for this. */
>>> disk->total_sectors = GRUB_DISK_SIZE_UNKNOWN;
>>> {
>>> - struct ofdisk_hash_ent *op;
>>> - op = ofdisk_hash_find (devpath);
>>> - if (!op)
>>> - op = ofdisk_hash_add (devpath, NULL);
>>> - if (!op)
>>> - {
>>> - grub_free (devpath);
>>> - return grub_errno;
>>> - }
>>> disk->id = (unsigned long) op;
>>> disk->data = op->open_path;
>>> - err = grub_ofdisk_get_block_size (devpath, &block_size, op);
>>> + err = grub_ofdisk_get_block_size (&block_size, op);
>>> if (err)
>>> {
>>> grub_free (devpath);
>>> @@ -532,13 +554,6 @@ grub_ofdisk_open (const char *name, grub_disk_t disk)
>>> static void
>>> grub_ofdisk_close (grub_disk_t disk)
>>> {
>>> - if (disk->data == last_devpath)
>>> - {
>>> - if (last_ihandle)
>>> - grub_ieee1275_close (last_ihandle);
>>> - last_ihandle = 0;
>>> - last_devpath = NULL;
>>> - }
>>> disk->data = 0;
>>> }
>>> @@ -685,7 +700,7 @@ grub_ofdisk_init (void)
>>> }
>>> static grub_err_t
>>> -grub_ofdisk_get_block_size (const char *device, grub_uint32_t *block_size,
>>> +grub_ofdisk_get_block_size (grub_uint32_t *block_size,
>>> struct ofdisk_hash_ent *op)
>>> {
>>> struct size_args_ieee1275
>>> @@ -698,16 +713,6 @@ grub_ofdisk_get_block_size (const char *device,
>>> grub_uint32_t *block_size,
>>> grub_ieee1275_cell_t size2;
>>> } args_ieee1275;
>>> - if (last_ihandle)
>>> - grub_ieee1275_close (last_ihandle);
>>> -
>>> - last_ihandle = 0;
>>> - last_devpath = NULL;
>>> -
>>> - grub_ieee1275_open (device, &last_ihandle);
>>> - if (! last_ihandle)
>>> - return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "can't open device");
>>> -
>>> *block_size = 0;
>>> if (op->block_size_fails >= 2)
>>> --
>>> 2.21.3
>>> _______________________________________________
>>> Grub-devel mailing list
>>> Grub-devel@gnu.org
>>> https://lists.gnu.org/mailman/listinfo/grub-devel
>
>
> Thanks for the help here Eric and good job on writing this new module.
>
> Well, I'm still considering this patch because:
>
> 1. We are facing some problems with powerpc and it seems the code of ofdisk.c
> is somewhat bogus. IMO would be nice to have this code at least working well
> on community. It would help us almost instantly with our problems regarding
> grub on powerpc.
> 2. Although we are considering/discussing to switch to obdisk module, this
> will need an extra effort and time on writing specific code, testing, etc.
>
> From our tests here with powerpc, this code is promising to solve many
> problems that we are currently facing.
>
> So, since we already have obdisk for SPARC, do you think it could be a
> problem to update this code as suggested in the patch?
>
Years ago I tried submitting a patch very similar to yours [1]. The
response I got back from the maintainers was there was concern it may
impact some of the other ieee1275 platforms. I didn’t have the resources
to test them all. Personally I don’t see how anyone can use ofdisk. It
contains a terrible dangling pointer bug that prevented me from adding any
new code [2].
1. https://lists.gnu.org/archive/html/grub-devel/2016-06/msg00038.html
2. https://lists.gnu.org/archive/html/grub-devel/2016-06/msg00035.html