qemu-arm

Re: [PATCH v5 04/11] hw/arm: Add NPCM730 and NPCM750 SoC models


From: Havard Skinnemoen
Subject: Re: [PATCH v5 04/11] hw/arm: Add NPCM730 and NPCM750 SoC models
Date: Tue, 14 Jul 2020 18:03:16 -0700

On Tue, Jul 14, 2020 at 10:11 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
> On 7/14/20 6:01 PM, Markus Armbruster wrote:
> > Philippe Mathieu-Daudé <f4bug@amsat.org> writes:
> >
> >> +Markus
> >>
> >> On 7/14/20 2:44 AM, Havard Skinnemoen wrote:
> >>> On Mon, Jul 13, 2020 at 8:02 AM Cédric Le Goater <clg@kaod.org> wrote:
> >>>>
> >>>> On 7/9/20 2:36 AM, Havard Skinnemoen wrote:
> >>>>> The Nuvoton NPCM7xx SoC family is used to implement Baseboard
> >>>>> Management Controllers in servers. While the family includes four SoCs,
> >>>>> this patch implements limited support for two of them: NPCM730 (targeted
> >>>>> for Data Center applications) and NPCM750 (targeted for Enterprise
> >>>>> applications).
> >>>>>
> >>>>> This patch includes little more than the bare minimum needed to boot a
> >>>>> Linux kernel built with NPCM7xx support in direct-kernel mode:
> >>>>>
> >>>>>   - Two Cortex-A9 CPU cores with built-in peripherals.
> >>>>>   - Global Configuration Registers.
> >>>>>   - Clock Management.
> >>>>>   - 3 Timer Modules with 5 timers each.
> >>>>>   - 4 serial ports.
> >>>>>
> >>>>> The chips themselves have a lot more features, some of which will be
> >>>>> added to the model at a later stage.
> >>>>>
> >>>>> Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
> >>>>> Reviewed-by: Joel Stanley <joel@jms.id.au>
> >>>>> Signed-off-by: Havard Skinnemoen <hskinnemoen@google.com>
> >>>>> ---
> >> ...
> >>
> >>>>> +static void npcm7xx_realize(DeviceState *dev, Error **errp)
> >>>>> +{
> >>>>> +    NPCM7xxState *s = NPCM7XX(dev);
> >>>>> +    NPCM7xxClass *nc = NPCM7XX_GET_CLASS(s);
> >>>>> +    int i;
> >>>>> +
> >>>>> +    /* CPUs */
> >>>>> +    for (i = 0; i < nc->num_cpus; i++) {
> >>>>> +        object_property_set_int(OBJECT(&s->cpu[i]),
> >>>>> +                                arm_cpu_mp_affinity(i, NPCM7XX_MAX_NUM_CPUS),
> >>>>> +                                "mp-affinity", &error_abort);
> >>>>> +        object_property_set_int(OBJECT(&s->cpu[i]), NPCM7XX_GIC_CPU_IF_ADDR,
> >>>>> +                                "reset-cbar", &error_abort);
> >>>>> +        object_property_set_bool(OBJECT(&s->cpu[i]), true,
> >>>>> +                                 "reset-hivecs", &error_abort);
> >>>>> +
> >>>>> +        /* Disable security extensions. */
> >>>>> +        object_property_set_bool(OBJECT(&s->cpu[i]), false, "has_el3",
> >>>>> +                                 &error_abort);
> >>>>> +
> >>>>> +        qdev_realize(DEVICE(&s->cpu[i]), NULL, &error_abort);
> >>>>
> >>>> I would check the error:
> >>>>
> >>>>         if (!qdev_realize(DEVICE(&s->cpu[i]), NULL, errp)) {
> >>>>             return;
> >>>>         }
> >>>>
> >>>> same for the sysbus_realize() below.
> >>>
> >>> Hmm, I used to propagate these errors until Philippe told me not to
> >>> (or at least that's how I understood it).
> >>
> >> That was before Markus's API simplifications were merged, when you
> >> had to propagate after each call. Since this is a non-hot-pluggable
> >> SoC, I suggested using &error_abort to simplify.
> >>
> >>> I'll be happy to do it
> >>> either way (and the new API makes it really easy to propagate errors),
> >>> but I worry that I don't fully understand when to propagate errors and
> >>> when not to.
> >>
> >> Markus explained it on the mailing list recently (as I found the doc
> >> not obvious). I can't find the thread. I suppose once the work resulting
> >> from the "Questionable aspects of QEMU Error's design" discussion is
> >> merged, the documentation will be clarified.
> >
> > The Error API evolved recently.  Please peruse the big comment in
> > include/qapi/error.h.  If still unsure, don't hesitate to ask here.
> >
> >> My rule of thumb so far is:
> >> - programming error (can't happen) -> &error_abort
> >
> > Correct.  Quote the big comment:
> >
> >  * Call a function aborting on errors:
> >  *     foo(arg, &error_abort);
> >  * This is more concise and fails more nicely than
> >  *     Error *err = NULL;
> >  *     foo(arg, &err);
> >  *     assert(!err); // don't do this
> >
> >> - everything triggerable by user or management layer (via QMP command)
> >>   -> &error_fatal, as we can't risk losing the user's data; we need to
> >>   shut down gracefully.
> >
> > Quote the big comment:
> >
> >  * Call a function treating errors as fatal:
> >  *     foo(arg, &error_fatal);
> >  * This is more concise than
> >  *     Error *err = NULL;
> >  *     foo(arg, &err);
> >  *     if (err) { // don't do this
> >  *         error_report_err(err);
> >  *         exit(1);
> >  *     }
> >
> > Terminating the process is generally fine during initial startup,
> > i.e. before the guest runs.
> >
> > It's generally not fine once the guest runs.  Errors need to be handled
> > more gracefully then.  A QMP command, for instance, should fail cleanly,
> > propagating the error to the monitor core, which then sends it to the
> > QMP client, and loops to process the next command.
> >
> >>> It makes sense to me to propagate errors from *_realize() and
> >>> error_abort on failure to set simple properties, but I'd like to know
> >>> if Philippe is on board with that.
> >
> > Realize methods must not use &error_fatal.  Instead, they should clean
> > up and fail.
> >
> > "Clean up" is the part we often neglect.  The big advantage of
> > &error_fatal is that you don't have to bother :)
> >
> > Questions?
>
> One on my side. So in this realize(), all &error_abort uses have
> to be replaced by local_err + propagate ...:
>
> static void npcm7xx_realize(DeviceState *dev, Error **errp)
> {
>     NPCM7xxState *s = NPCM7XX(dev);
>     NPCM7xxClass *nc = NPCM7XX_GET_CLASS(s);
>     int i;
>
>     /* CPUs */
>     for (i = 0; i < nc->num_cpus; i++) {
>         object_property_set_int(OBJECT(&s->cpu[i]),
>                                 arm_cpu_mp_affinity(i, NPCM7XX_MAX_NUM_CPUS),
>                                 "mp-affinity", &error_abort);
>         object_property_set_int(OBJECT(&s->cpu[i]), NPCM7XX_GIC_CPU_IF_ADDR,
>                                 "reset-cbar", &error_abort);
>         object_property_set_bool(OBJECT(&s->cpu[i]), true,
>                                  "reset-hivecs", &error_abort);
>
>         /* Disable security extensions. */
>         object_property_set_bool(OBJECT(&s->cpu[i]), false, "has_el3",
>                                  &error_abort);
>
>         qdev_realize(DEVICE(&s->cpu[i]), NULL, &error_abort);
>     }
>     [...]
>
> ... but the caller does:
>
> static void quanta_gsj_init(MachineState *machine)
> {
>     NPCM7xxState *soc;
>
>     soc = npcm7xx_create_soc(machine, QUANTA_GSJ_POWER_ON_STRAPS);
>     npcm7xx_connect_dram(soc, machine->ram);
>     qdev_realize(DEVICE(soc), NULL, &error_abort);
>                                     ^^^^^^^^^^^^
>     npcm7xx_load_kernel(machine, soc);
> }
>
> So we overload the code...
>
> My question: Can you confirm that propagating is worth it here?

Here's my understanding. Please let me know if it sounds right.

1. Internal code failing to set simple properties to predefined values
is a programming error, so error_abort is appropriate.
2. qdev_realize() may fail due to user input, so errors should be propagated.
3. machine init can't propagate errors any further, so all errors are
fatal. But if all realize() functions follow (1) and (2), only user
errors are propagated, so error_fatal should be used to produce a nice
error message rather than "Unexpected error, aborting!"

If any of this can ever be hot-plugged, then errors may propagate
somewhere other than the machine init code, so it becomes extra
important not to let bad user input crash the whole QEMU process. I
don't know if this is a concern when none of these devices can
currently be hot-plugged.

For example, if the user tries to create a machine with 64 MB RAM, the
gcr device will report an error because it can't represent less than
128 MB of memory. Currently, this is reported as

$ ./arm-softmmu/qemu-system-arm -machine npcm750-evb -nographic -m 64
Unexpected error in npcm7xx_gcr_realize() at /usr/local/google/home/hskinnemoen/qemu/for-upstream/hw/misc/npcm7xx_gcr.c:151:
qemu-system-arm: npcm7xx_gcr: DRAM size 67108864 is too small (128 MiB minimum)
Aborted

But if I change npcm7xx_realize() to propagate errors from
sysbus_realize(gcr), and change npcm750_evb_init() to use error_fatal
instead of error_abort, I get

$ ./arm-softmmu/qemu-system-arm -machine npcm750-evb -nographic -m 64
qemu-system-arm: npcm7xx_gcr: DRAM size 67108864 is too small (128 MiB minimum)

which seems less scary and more accurate.
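
To make the three rules above concrete, here is a minimal, self-contained
sketch in plain C. It is a toy model, not QEMU code: every toy_* name and
the ToySoc type are made up for illustration, and they only mimic the
roles that &error_abort, &error_fatal and errp propagation play in
include/qapi/error.h. The 128 MiB check stands in for the gcr example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for QEMU's Error object (hypothetical, heavily simplified). */
typedef struct Error {
    char msg[128];
} Error;

/* Sentinel destinations mimicking the roles of &error_abort / &error_fatal. */
static Error *toy_error_abort;
static Error *toy_error_fatal;

/* Record an error in *errp, or terminate if errp is one of the sentinels. */
static void toy_error_set(Error **errp, const char *fmt, ...)
{
    Error *err = calloc(1, sizeof(*err));
    va_list ap;

    assert(err);
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);

    if (errp == &toy_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", err->msg);
        abort();                    /* programming error: crash loudly */
    }
    if (errp == &toy_error_fatal) {
        fprintf(stderr, "%s\n", err->msg);
        exit(1);                    /* user error at startup: clean exit */
    }
    if (errp) {
        *errp = err;                /* hand the error to the caller */
    } else {
        free(err);                  /* caller chose to ignore errors */
    }
}

typedef struct ToySoc {
    int num_cpus;
} ToySoc;

/* A "simple property" setter: fails only if the programmer passes nonsense. */
static void toy_set_num_cpus(ToySoc *s, int value, Error **errp)
{
    if (value < 1 || value > 2) {
        toy_error_set(errp, "invalid CPU count %d", value);
        return;
    }
    s->num_cpus = value;
}

/* Rule 2: a realize step that depends on user input reports through errp. */
bool toy_gcr_realize(unsigned long dram_size, Error **errp)
{
    if (dram_size < 128UL * 1024 * 1024) {
        toy_error_set(errp, "DRAM size %lu is too small (128 MiB minimum)",
                      dram_size);
        return false;
    }
    return true;
}

/* Rules 1 and 2 combined: abort on property errors, propagate realize errors. */
bool toy_soc_realize(ToySoc *s, unsigned long dram_size, Error **errp)
{
    toy_set_num_cpus(s, 2, &toy_error_abort);   /* rule 1: can't happen */
    if (!toy_gcr_realize(dram_size, errp)) {    /* rule 2: propagate */
        return false;
    }
    return true;
}

/* Rule 3: machine init has nowhere to propagate to, so errors are fatal. */
void toy_machine_init(ToySoc *s, unsigned long dram_size)
{
    toy_soc_realize(s, dram_size, &toy_error_fatal);
}
```

With this split, calling toy_machine_init() with 64 MiB prints only the
"too small" message and exits with status 1, whereas passing
&toy_error_abort at the same spot would add the "Unexpected error"
prefix and dump core. A caller that can recover (the hot-plug case)
would pass its own Error ** to toy_soc_realize() instead.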

Havard


