qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/4] smbus: Fix spd_data_generate() error API violation


From: Philippe Mathieu-Daudé
Subject: Re: [PATCH v2 2/4] smbus: Fix spd_data_generate() error API violation
Date: Tue, 30 Jun 2020 08:09:10 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 6/29/20 11:31 PM, BALATON Zoltan wrote:
> On Mon, 29 Jun 2020, Philippe Mathieu-Daudé wrote:
>> On 6/27/20 9:17 AM, Markus Armbruster wrote:
>>> BALATON Zoltan <balaton@eik.bme.hu> writes:
>>>> On Wed, 22 Apr 2020, BALATON Zoltan wrote:
>>>>> On Wed, 22 Apr 2020, Philippe Mathieu-Daudé wrote:
>>>>>> On 4/22/20 4:27 PM, BALATON Zoltan wrote:
>>>>>>> On Wed, 22 Apr 2020, Markus Armbruster wrote:
>>>>>>>> The Error ** argument must be NULL, &error_abort, &error_fatal,
>>>>>>>> or a
>>>>>>>> pointer to a variable containing NULL.  Passing an
>>>>>>>> argument of the
>>>>>>>> latter kind twice without clearing it in between is wrong: if the
>>>>>>>> first call sets an error, it no longer points to NULL for the
>>>>>>>> second
>>>>>>>> call.
>>>>>>>>
>>>>>>>> spd_data_generate() can pass @errp to error_setg() more than
>>>>>>>> once when
>>>>>>>> it adjusts both memory size and type.  Harmless, because
>>>>>>>> no caller
>>>>>>>> passes anything that needs adjusting.  Until the previous
>>>>>>>> commit,
>>>>>>>> sam460ex passed types that needed adjusting, but not sizes.
>>>>>>>>
>>>>>>>> spd_data_generate()'s contract is rather awkward:
>>>>>>>>
>>>>>>>>    If everything's fine, return non-null and don't
>>>>>>>> set an error.
>>>>>>>>
>>>>>>>>    Else, if memory size or type need adjusting,
>>>>>>>> return non-null and
>>>>>>>>    set an error describing the adjustment.
>>>>>>>>
>>>>>>>>    Else, return null and set an error reporting
>>>>>>>> why no data can be
>>>>>>>>    generated.
>>>>>>>>
>>>>>>>> Its callers treat the error as a warning even when null is
>>>>>>>> returned.
>>>>>>>> They don't create the "smbus-eeprom" device then. 
>>>>>>>> Suspicious.
>>>>>>>>
>>>>>>>> Since the previous commit, only "everything's fine" can actually
>>>>>>>> happen.  Drop the unused code and simplify the
>>>>>>>> callers.  This gets rid
>>>>>>>> of the error API violation.
>>>>>>>
>>>>>>> This leaves board code no chance to recover from values given by
>>>>>>> user that won't fit without duplicating checks that this function
>>>>>>> does. Also this will abort without giving meaningful errors if an
>>>>>>> invalid value does get through and result in a crash which is not
>>>>>>> used friendly. So I don't like this but if others think this is
>>>>>>> acceptable maybe at least unit test should be adjusted to make
>>>>>>> sure aborts cannot be triggered by user for values that are not
>>>>>>> usually tested during development.
>>>>>>
>>>>>> Agreed. Do you have an example (or more) to better show Markus this
>>>>>> code use? So we can add tests.
>>>>>
>>>>> After Markus's patches probably nothing uses it any more but this
>>>>> comes with the result that previously giving some random value such
>>>>> as -m 100 did produce a working sam460ex machine after some warnings
>>>>> but now it just thows back some errors to the user which may or may
>>>>> not be helpful to them.
>>>>>
>>>>>> Personally I'd use a script to generate a dumb static array of all
>>>>>> possible sizes...
>>>>>
>>>>> Maybe testing with the biggest valid value such as -m 2048 (that's
>>>>> commonly used probably) and an invalid value such as -m 100 might be
>>>>> enough. Testing all possible values might take too long and would
>>>>> not test what happens with invalid values. Ideally those invalud
>>>>> values should also work like before a0258e4afa but should at least
>>>>> give a meaningful warning so the user can fix the command line
>>>>> without too much head scratching. Actually that commit was from Igor
>>>>> not from Marcus so sorry for attributing that to Marcus too, I
>>>>> remembered wrong.
>>>>>
>>>>> By the way you could argue that on real machine you cannot plug
>>>>> certain combinations of memory modules so it's enough to model that
>>>>> but I think QEMU does not have to be that strict and also support
>>>>> configs that cannot happen on real hadware but would work. This
>>>>> might be useful for example if you have some ammount of memory to
>>>>> set aside for a VM on a host but that's not a size that exists in
>>>>> memory modules on real hardware. This also works on pc machine in
>>>>> qemu-system-i386 for example: it accepts -m 100 and does its best to
>>>>> create a machine with such unrealistic size. The sam460ex did the
>>>>> same (within SoC's limits) and before a0258e4afa -m 100 was fixed up
>>>>> to 96 MB which is now not possible due to change in QEMU internal
>>>>> APIs. This probably isn't important enough to worth the extra effort
>>>>> to support but would have been nice to preserve.
>>>>
>>>> Besides the above here's another use case of the fix ups that I wanted
>>>> to keep:
>>>>
>>>> cover.1592315226.git.balaton@eik.bme.hu/b5f4598529a77f15f554c593e9be2d0ff9e5fab3.1592315226.git.balaton@eik.bme.hu/">https://patchew.org/QEMU/cover.1592315226.git.balaton@eik.bme.hu/b5f4598529a77f15f554c593e9be2d0ff9e5fab3.1592315226.git.balaton@eik.bme.hu/
>>>>
>>>>
>>>> This board normally uses OpenBIOS which gets RAM size from fw_cfg and
>>>> so works with whatever amount of RAM (also Linux booted with -kernel
>>>> probably does not care) so any -memory value is valid. However some
>>>> may want to also use original firmware ROM for compatibility which
>>>> detects RAM reading SPD eeproms (the i2c emulation needed for that is
>>>> not working yet but once that's fixed this will be the case). I want
>>>> to add smbus_eeproms for this but do not want to just abort for cases
>>>> where -memory given by user cannot be covered with SPD data. Instead a
>>>> warning and covering as much RAM as possible should be enough (the ROM
>>>> will detect less RAM than given with -m
>>>> but that's OK and better than just bailing out without a message
>>>> tripping an assert). But I don't want to replicate in board code the
>>>> calculation and checks the spd_data_generate() function does anyway
>>>> (that would just puzzle reviewers for every use of this functions).
>>>>
>>>> Previously this was possible with my original spd_data_generate()
>>>> implementation. What's your suggestion to bring that functionality
>>>> back without breaking Error API? Maybe adding new parameters to tell
>>>> the spd_data_generate() which fixups are allowed?
>>>
>>> Quick reply without having thought through the issues at all: I'm not
>>> opposed to you doing work to enable additional or even arbitrary memory
>>> sizes where these actually work.  I'm first and foremost opposed to me
>>> wasting time on "improving" code that is not used for anything.  That's
>>> why I dumbed down spd_data_generate().
>>
>> I'm starting to understand Zoltan point. What I'm seeing is Zoltan using
>> a hobbyist code, that just happens to work for hobbyists, but get in the
>> way of enterprise quality standards.
> 
> This is not necessarily a conflict between hobbyist vs enterprise but
> more like different view on what the qemu-system-* CLI should be. I
> think the CLI is the main human interface of QEMU as it does not really
> provide a GUI for configuring or running VMs (as for example VirtualBox
> does, QEMU only has minimal GUI to view and control running VMs) so
> users are forced to use either the command line or maybe an external
> management frontend, but for simple things (like hobbyist use) that's an
> overkill and also not a good match as those are designed for enterprise
> use. (Also these hobbyist are on Windows or macOS where these management
> apps are not available and getting a working QEMU binary is already a
> challenge.)
> 
> The problem is that these management frontends don't have a proper API
> to control QEMU but abuse the CLI and QEMU monitor for this which are
> supposed to be human interfaces at the first place but changing the
> commands for the needs of management apps result in arcane command
> lines. Note that humans and management apps likely have different
> requirements so if you mean hobbyist = human and enterprise = management
> frontend then that's about what my problem is. I think humans and
> management apps could coexist using the same interfaces if these cannot
> be cleanly separated (as that would need either changing management apps
> to use something else than the main human interface or providing proper
> GUI or CLI frontend for humans) but if they use the same CLI then
> allowing some convenience commands to make the life of humans easier
> should not be forbidden. Running a VM should be simple and not require
> typing multiple lines of options just to result in an error that
> something is not what QEMU thinks is acceptable even though it could
> work and could be fixed. That's really annoying for a human but may be
> desirable for a management app so it does not need to check it got what
> it think it specified.
> 
>> Zoltan doesn't have the skills/time/motivation to rework its working
>> code to meet the enterprise quality level. Enterprise developers tried
>> to understand twice (first Igor, then Markus) the hobbyist use to get
>> it done safer, so it can stay maintained.
> 
> Of course I don't have time or motivation to make it enterprise quality
> when I work unpayed on this in my free time and for fun. I already spend
> too much time with this so while I try to make it good enough to be
> included upstream the direction is clearly different than what
> enterprise users need. But that's OK as the machines I work with are not
> really used in an enterprise setting and mostly used by hobbyists, but
> if some of the components or machines could be useful to enterprise
> people I expect them to put in the effort to get them to enterprise level.
> 
> But this probably does not apply to the very problem discussed here.
> When I've added new machines (apart from sam460ex also pegasos2 which is
> not upstream yet and now hopefully Mac machines soon too) these needed
> SPD eeproms because their firmwares detected RAM based on it. There were
> some already existing boards which emulated SPD but these were ad-hoc
> implementations without any commonality. To avoid increasing the mess by
> adding a few more independent SPD emulations that would get out of sync
> I've spent some time to come up with a common function that could be
> used by all these boards and the new ones I wanted to add. The goal of
> this function was to put SPD emulation in a single place and make it
> easy for board code to use it without needing to duplicate code.
> 
> Also Marcus mentioned uniformity between machines: Most machines, like
> pc ones accept any memory size such as -m 100 even though on real
> hardware it's not possible but can work with the firmware in QEMU that
> usually take this info from FW_CFG or something else and not resticted
> by SPD data. I wanted to do the same in sam460ex and allow it to use any
> memory size exactly for uniformity besides used convenience, even though
> that machine has some constraints so it required to fix up RAM size to
> meet those constraints. So -m 100 would result in 96 MB of RAM that the
> SoC and firmware can handle and is closest to what the user intended.
> This worked well until Igor changed memory allocation to memdev (which I
> don't even know what it is: some enterprise stuff not really needed for
> hobbyists but maybe could be useful e.g. to save guest memory image so
> why not) but this required getting rid of fix ups of memory size in
> boards (sam460ex wasn't the only one) beacuse memdev could not support
> this for some reason and Igor did not want to add that (even though I've
> proposed some designs, you can look up in patch review). So this broke
> fix ups, then Marcus noticed that errors reporting via err object cannot
> be used for warnings as I've tried to use so to fix it he just removed
> all the reamaining traces of it thereby making it more difficult to add
> SPD eeproms to mac_oldworld without duplicating the removed checks in
> board code which I wanted to avoid because:
> 
> 1. This is knowledge about SPD eeproms that should be in that func
> 2. Would duplicate non-trivial code in boards that would puzzle
> reviewers and is error prone too.
> 
>> Zoltan, I guess I understood your use and have an idea to rework it in
>> a way that everybody is happy, but as Markus said, since the freeze is
>> next week, I won't have time to get it done in this short amount of
>> time.
> 
> It's not urgent but if we can agree on something that's acceptable for
> everyone I may be able to submit a patch but don't want to put in effort
> if it will be turned down anyway due to nothing else than the current
> solution being acceptable based on principles over convenience. Arguing
> with Markus about it before got me that impression so I'd rather ask
> before wasting time with it.
> 
>> From the PPC460EX-NUB800T-AMCC-datasheet-11553412.pdf datasheet I
>> understand the 460EX can support "Up to 8 GB in four external banks",
>> but the SAM 460ex board only wires a single bank (to the SODIMM
>> connector). You want to use a virtual board with up-to 4 banks in
>> use, right?
> 
> No, the firmware won't check additional banks because it only checks the
> one wired. So what we need is to put as much RAM as possible on that
> SODIMM (and we can use that SoC can handle both DDR and DDR2) but since
> it's already broken and limited to valid SODIMM sizes due to memdev not
> supporting memory size fix ups fixing this again is not high priority.
> 
> What I'd like is reverting f26740c61a57f and fix that some other way so
> I don't have to duplicate size check in board code as can be seen in the
> patchew link above but could just call spd_data_generate() to do its
> job. This was discussed at the time that patch was in review you can
> read it here:
> 
> http://patchwork.ozlabs.org/project/qemu-devel/patch/20200420132826.8879-3-armbru@redhat.com/
> 
> 
> My points were not really considered then, now that I have another use
> case maybe it could be revisited and fixed. What I want is to be able to
> call spd_data_generate() from board code with whatever sizé (the
> board does not need to know about SPD limits and so cannot pre-check the
> size) and the function should return the largest possible size SPD and
> some indication if the size was not used completely. If Error cannot be
> used for this, return the message or error some other way but let the
> board code decide if it wants to abort or it can use the smaller SPD. Do
> not assert in the helper function. Maybe the DIMM type fix up can be
> dropped and only keep the size fix up so then we don't need to use error
> twice, the board could call the function again if a different type is
> also acceptable, since only sam460ex would need this I can do that there
> for type fixup and call spd_data_generate() again with DDR2 if first one
> with DDR could not fit all ram. But at least the asserts should be
> dropped for this and the size check brought back. Then adding SPD to
> mac_oldworld could also be done by calling spd_data_generate() instead
> of duplicating the checks this function does anyway. This board has
> three slots so if user says -m 1400 it would call spd_data_generate()
> with 1400 first, get back 512 SPD that it adds to first slot then calls
> spd_data_generate() again with 888, gets 512 again that it adds to 2nd
> slot and calls spd_data_generate() for last slot with 376 which would
> give 256 and 120 remaining that it may warn the user about but still
> continue because the SPD data is only used by a ROM from real hardware
> (that may be used for compatibility with some software) but the default
> OpenBIOS disregards SPD data and would still use 1400 so it's not an
> error to abort on. Simply if using a firmare ROM then only 1280 MB of
> the 1400 will be available due to its limitations but that's not a
> reason to force users to change their command line. Printing a warning
> is enough to hint they may use different value but aborting without an
> error message on an assert which is the current situation is not really
> a user friendly way.

I just noticed we have a MachineClass::fixup_ram_size() handler. There
is only one implementation (on s390x) which does a bit the opposite:
If the user asks for -m 1400, it will pad to a valid physical size,
so in your example to 1472 MiB. Then the guest can use only 1400 if it
is happy with it. 72 MiB are wasted, but this is still better than the
576 MiB wasted if we were using 2 GiB instead ;)

See commit 5c30ef937f5:

static ram_addr_t s390_fixup_ram_size(ram_addr_t sz)
{
    /* same logic as in sclp.c */
    int increment_size = 20;
    ram_addr_t newsz;

    while ((sz >> increment_size) > MAX_STORAGE_INCREMENTS) {
        increment_size++;
    }
    newsz = sz >> increment_size << increment_size;

    if (sz != newsz) {
        qemu_printf("Ram size %" PRIu64 "MB was fixed up to %" PRIu64
                    "MB to match machine restrictions. "
                    "Consider updating "
                    "the guest definition.\n", (uint64_t) (sz / MiB),
                    (uint64_t) (newsz / MiB));
    }
    return newsz;
}

> 
> Hopefully at least somebody will read it up to this point, sorry for
> writing that much but hopefully this explains my point of view.

Someone did ;) I know it is not always easy, English is not the default
language for various developers.

> 
> Regards,
> BALATON Zoltan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]