qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 09/11] hw/ssi: NPCM7xx Flash Interface Unit device model


From: Havard Skinnemoen
Subject: Re: [PATCH v5 09/11] hw/ssi: NPCM7xx Flash Interface Unit device model
Date: Mon, 13 Jul 2020 19:39:16 -0700

On Mon, Jul 13, 2020 at 10:39 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
> On 7/12/20 7:42 AM, Havard Skinnemoen wrote:
> > On Thu, Jul 9, 2020 at 10:00 AM Philippe Mathieu-Daudé <f4bug@amsat.org> 
> > wrote:
> >> On 7/9/20 2:36 AM, Havard Skinnemoen wrote:
> >>> This implements a device model for the NPCM7xx SPI flash controller.
> >>>
> >>> Direct reads and writes, and user-mode transactions have been tested in
> >>> various modes. Protection features are not implemented yet.
> >>>
> >>> All the FIU instances are available in the SoC's address space,
> >>> regardless of whether or not they're connected to actual flash chips.
> >>>
> >>> Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
> >>> Reviewed-by: Cédric Le Goater <clg@kaod.org>
> >>> Signed-off-by: Havard Skinnemoen <hskinnemoen@google.com>
> >>> ---
> >>>  include/hw/arm/npcm7xx.h     |   2 +
> >>>  include/hw/ssi/npcm7xx_fiu.h | 100 +++++++
> >>>  hw/arm/npcm7xx.c             |  53 ++++
> >>>  hw/ssi/npcm7xx_fiu.c         | 510 +++++++++++++++++++++++++++++++++++
> >>>  hw/arm/Kconfig               |   1 +
> >>>  hw/ssi/Makefile.objs         |   1 +
> >>>  hw/ssi/trace-events          |   9 +
> >>>  7 files changed, 676 insertions(+)
> >>>  create mode 100644 include/hw/ssi/npcm7xx_fiu.h
> >>>  create mode 100644 hw/ssi/npcm7xx_fiu.c
> [...]
>
> >>> diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
> >>> index 4d227bb74b..c9ff3dab25 100644
> >>> --- a/hw/arm/npcm7xx.c
> >>> +++ b/hw/arm/npcm7xx.c
> >>> @@ -98,6 +98,37 @@ static const hwaddr npcm7xx_uart_addr[] = {
> >>>      0xf0004000,
> >>>  };
> >>>
> >>> +static const hwaddr npcm7xx_fiu0_flash_addr[] = {
> >>
> >> So per
> >> https://github.com/Nuvoton-Israel/bootblock/blob/master/SWC_HAL/Chips/npcm750/npcm750.h
> >> this is SPI0 on AHB18,
> >>
> >>> +    0x80000000,
> >>> +    0x88000000,
> >>
> >> CS0 & CS1,
> >>
> >> also listed:
> >>
> >> 0x90000000, // CS2
> >> 0x98000000, // CS3
> >
> > Confirmed with Nuvoton off-list that these do not exist. SPI0 only
> > supports two chip-selects for direct access.
>
> I suppose Novoton confirmed for the particular npcm750, but you aim
> to model the npcm7xx family. I doubt 2 similar IP blocks are that
> different ;) Anyway with a comment this is good.

OK. I'm mostly concerned about modeling the chips I know about for
now. If a chip with four AXI endpoints and four chipselects on SPI0
appears, it will be a fairly small change to move these arrays into
the SoC class struct, but I'd rather not do that without evidence that
such a chip exist.

The IP blocks may be identical for all I know, but they are not wired
up identically, and that's really what matters here in the SoC-level
model.

> >
> > I'll add comments.
> >
> >>> +};
> >>> +
> >>> +static const hwaddr npcm7xx_fiu3_flash_addr[] = {
> >>
> >> Ditto SPI3 on AHB3, and CS0 to CS3.
> >>
> >>> +    0xa0000000,
> >>> +    0xa8000000,
> >>> +    0xb0000000,
> >>> +    0xb8000000,
> >>> +};
> >>> +
> >>> +static const struct {
> >>> +    const char *name;
> >>> +    hwaddr regs_addr;
> >>> +    int cs_count;
> >>> +    const hwaddr *flash_addr;
> >>> +} npcm7xx_fiu[] = {
> >>> +    {
> >>> +        .name = "fiu0",
> >>> +        .regs_addr = 0xfb000000,
> >>> +        .cs_count = ARRAY_SIZE(npcm7xx_fiu0_flash_addr),
> >>
> >> Hmm without the datasheet, can't tell, but I'd expect 4 CS
> >> regardless.
> >
> > FIU0/SPI0 only has 2 chip selects.
> >
> >>> +        .flash_addr = npcm7xx_fiu0_flash_addr,
> >>> +    }, {
> >>> +        .name = "fiu3",
> >>> +        .regs_addr = 0xc0000000,
> >>> +        .cs_count = ARRAY_SIZE(npcm7xx_fiu3_flash_addr),
> >>> +        .flash_addr = npcm7xx_fiu3_flash_addr,
> >>> +    },
> >>> +};
> [...]
>
> >>> +
> >>> +    /* Flash chip model expects one transfer per dummy bit, not byte */
> >>> +    dummy_cycles =
> >>> +        (FIU_DRD_CFG_DBW(drd_cfg) * 8) >> FIU_DRD_CFG_ACCTYPE(drd_cfg);
> >>> +    for (i = 0; i < dummy_cycles; i++) {
> >>> +        ssi_transfer(fiu->spi, 0);
> >>
> >> Note to self, might worth a ssi_shift_dummy(count) generic method.
> >
> > I'm not a huge fan of this interface to be honest. It requires the
> > flash controller to have intimate knowledge of the flash chip, and if
> > it doesn't predict the dummy phase correctly, or the guest programs
> > the wrong number of dummy cycles, the end result is very different
> > from what you'll see on a real system. I'd love to see something like
> > a number-of-bits parameter to ssi_transfer instead.
>
> Do you mean like these?
>
> - ssi_transfer_bit(bool value);
> - ssi_shift_dummy_bits(size_t bits);
>
> Some interfaces allow bit shifting. SPI doesn't simply because
> nobody had the use :)

I mean I don't like how the semantics of ssi_transfer changes
implicitly in the middle of the transfer. One call shifts a whole
byte, the next 8 shift individual bits, then it's back to bytes again.
If a mistake is made, the flash controller might end up thinking it's
shifting 16 bits, but if the flash device only expects 8 dummy bits,
it will see 8 bits + 8 bytes for a total of 9 bytes of dummies. This
differs a lot from what would happen on a real device.

For example, I wrote up a test for the aspeed flash controller against
a Winbond flash chip in Dual I/O mode. The data ended up getting
shifted by 15 bytes because of, I think, bugs in the dummy logic on
both sides. I took a lot of head scratching to figure out what's going
on.

Here's where aspeed_smc figures it needs to send 2 dummy bytes (aka 16
dummy bits) for a DIO read:

https://elixir.bootlin.com/qemu/latest/source/hw/ssi/aspeed_smc.c#L803

And here's where the Winbond model decides to expect one dummy _bit_:

https://elixir.bootlin.com/qemu/latest/source/hw/block/m25p80.c#L862

The difference is 15 ssi_transfers, which is what I see in the test.

IMO they're both wrong -- they should both expect one dummy byte, or
four dummy cycles in x2 mode. But the real problem, also IMO, is that
they can't even agree on whether one ssi_transfer means one bit or one
byte.

While ssi_shift_dummy_bits is strictly nicer than open-coded loops of
ssi_transfer, it doesn't fix the underlying problem that the dummy
cycles need to be treated specially by both the flash controller model
and the device model, and if they're not 100% synchronized (which is
particularly interesting when you have flash chips with configurable
dummy cycles), the result is like nothing you'd ever see on a real
system.

> >
> >>> +    }
> >>> +
> >>> +    for (i = 0; i < size; i++) {
> >>> +        value |= ssi_transfer(fiu->spi, 0) << (8 * i);
> >>> +    }
> >>> +
> [...]
>
> >>> +static const MemoryRegionOps npcm7xx_fiu_flash_ops = {
> >>> +    .read = npcm7xx_fiu_flash_read,
> >>> +    .write = npcm7xx_fiu_flash_write,
> >>> +    .endianness = DEVICE_LITTLE_ENDIAN,
> >>> +    .valid = {
> >>> +        .min_access_size = 1,
> >>> +        .max_access_size = 8,
> >>
> >> Are you sure? Maybe, I can' tell.
> >
> > Real hardware supports 16 bytes, but there's no way to do more than 8
> > in emulation, I think?
>
> That would mean you can plug this device on a 128-bit wide bus,
> and you can transfer 128-bit in a single CPU operation.

Hmm, I see what you mean. It can do 16-byte bursts. I'm not sure if
that looks more like a single 16-byte access or four 4-byte accesses
on the SPI bus. I was assuming it's the former, but I could be wrong.

Havard



reply via email to

[Prev in Thread] Current Thread [Next in Thread]