Message-ID: <d755a917-7763-764e-7030-3afd5154053c@ti.com>
Date: Thu, 17 Mar 2022 22:19:25 +0530
From: Vignesh Raghavendra <vigneshr@...com>
To: David Laight <David.Laight@...LAB.COM>,
"'Michael Walle'" <michael@...le.cc>
CC: Tudor Ambarus <tudor.ambarus@...rochip.com>,
"p.yadav@...com" <p.yadav@...com>,
"broonie@...nel.org" <broonie@...nel.org>,
"miquel.raynal@...tlin.com" <miquel.raynal@...tlin.com>,
"richard@....at" <richard@....at>,
"linux-mtd@...ts.infradead.org" <linux-mtd@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>,
"nicolas.ferre@...rochip.com" <nicolas.ferre@...rochip.com>
Subject: Re: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
On 17/03/22 4:40 pm, David Laight wrote:
> From: Vignesh Raghavendra
>> Sent: 17 March 2022 10:24
> ...
>> Modern OSPI/QSPI flash controllers provide an MMIO interface to read from
>> flash, where DMA can pull data as though it were reading from on-chip RAM
>
> So the cpu does an MMIO read cycle to the controller which doesn't
> complete until (for the nibble-mode spi device I have):
> 1) Chipselect is asserted.
> 2) The 8-bit command has been clocked out.
> 3) The 32-bit address has been clocked out (8 clocks in nibbles).
> 4) A few (probably 4) extra delay clocks are added.
> 5) The data is read - 8 clocks for 32 bits in nibble mode.
> 6) Chipselect is removed.
>
> Now you can do long sequential reads without all the red tape.
> But a random read in nibble mode is about 30 clocks.
> 16 bit mode saves 6 clocks for the data and maybe 6 for the address?
>
> The controller could do 'clever stuff' for sequential reads.
> At a cost of slowing down random reads.
>
> So even at 400MHz it isn't that fast.
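As a rough tally of the clock counts quoted above, here is a minimal C
sketch. The struct and function names are made up for illustration, and
it assumes the 8-bit command goes out on a single line (8 clocks), which
is what brings the total close to the "about 30" figure; the thread does
not say how the command is clocked.

```c
#include <assert.h>

/* Hypothetical description of one flash read; names are illustrative only. */
struct spi_read_op {
	unsigned int cmd_clocks;   /* clocks spent shifting out the command */
	unsigned int addr_bits;    /* address width in bits */
	unsigned int dummy_clocks; /* extra delay clocks before the data */
	unsigned int data_bits;    /* payload size in bits */
	unsigned int lanes;        /* bits moved per clock for address and data */
};

/* Total clocks between chip select assert and deassert for one read. */
static unsigned int spi_read_clocks(const struct spi_read_op *op)
{
	unsigned int addr, data;

	assert(op->lanes > 0);

	/* Round up so a partial final transfer still costs one clock. */
	addr = (op->addr_bits + op->lanes - 1) / op->lanes;
	data = (op->data_bits + op->lanes - 1) / op->lanes;

	return op->cmd_clocks + addr + op->dummy_clocks + data;
}
```

With 8 command clocks, a 32-bit address, 4 delay clocks and 32 bits of
data on 4 lanes this gives 8 + 8 + 4 + 8 = 28 clocks for 4 bytes.
Reading 256 bytes in one transaction costs 532 clocks, so the fixed
20 clocks of overhead matter far less on long sequential reads.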
Random CPU reads are inherently slow; that's just how the HW is.
But there are cases, such as loading an image from flash or running a
filesystem on flash, that use DMA to maximize performance. Such cases
would be greatly affected if we did the byte swap in SW.
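To make the cost concrete, a SW byte swap would look roughly like the
sketch below. It assumes the swap in question exchanges the two bytes of
each 16-bit word, as the byte order topic of this series suggests;
swap_bytes16_inplace() is a hypothetical name used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Swap the two bytes of every 16-bit word in buf, in place.
 * Assumption: len is even, i.e. the buffer holds whole 16-bit words.
 */
static void swap_bytes16_inplace(uint8_t *buf, size_t len)
{
	size_t i;

	assert(len % 2 == 0);

	for (i = 0; i < len; i += 2) {
		uint8_t tmp = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = tmp;
	}
}
```

The point is that the CPU has to touch every byte again after the DMA
transfer completes, which can give back much of what DMA saved.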
>
> If the MMIO interface to the flash controller is PCIe you can
> add in a load of extra latency for the cpu read itself.
>
> While PCIe allows multiple read requests to be outstanding,
> the Intel cpus I've looked at serialise the reads from each
> cpu core (each cpu always uses the same TLP tag).
>
> Now longer read TLPs help a lot (IIRC max is 256 bytes).
> But the x86 cpu will only generate read TLPs for register reads.
> You need to use AVX512 registers (or cache line fetches) to
> get better throughput!
>
A direct CPU fetch from SPI would not be able to use the full
bandwidth of high-speed flashes, and it's not the only use case.
Regards
Vignesh