Date:   Thu, 17 Mar 2022 22:19:25 +0530
From:   Vignesh Raghavendra <vigneshr@...com>
To:     David Laight <David.Laight@...LAB.COM>,
        "'Michael Walle'" <michael@...le.cc>
CC:     Tudor Ambarus <tudor.ambarus@...rochip.com>,
        "p.yadav@...com" <p.yadav@...com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "miquel.raynal@...tlin.com" <miquel.raynal@...tlin.com>,
        "richard@....at" <richard@....at>,
        "linux-mtd@...ts.infradead.org" <linux-mtd@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>,
        "nicolas.ferre@...rochip.com" <nicolas.ferre@...rochip.com>
Subject: Re: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode



On 17/03/22 4:40 pm, David Laight wrote:
> From: Vignesh Raghavendra
>> Sent: 17 March 2022 10:24
> ...
>> Modern OSPI/QSPI flash controllers provide an MMIO interface to read from
>> flash, where DMA can pull data as though it were reading from on-chip RAM
> 
> So the cpu does an MMIO read cycle to the controller which doesn't
> complete until (for the nibble-mode spi device I have):
> 1) Chipselect is asserted.
> 2) The 8-bit command has been clocked out.
> 3) The 32-bit address has been clocked out (8 clocks in nibbles).
> 4) A few (probably 4) extra delay clocks are added.
> 5) The data is read - 8 clocks for 32 bits in nibble mode.
> 6) Chipselect is removed.
> 
> Now you can do long sequential reads without all the red tape.
> But a random read in nibble mode is about 30 clocks.
> 16-bit mode saves 6 clocks for the data and maybe 6 for the address?
> 
> The controller could do 'clever stuff' for sequential reads.
> At a cost of slowing down random reads.
> 
> So even at 400MHz it isn't that fast.
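
The clock counts quoted above can be checked with a small sketch. The
struct and function names are hypothetical, and the sketch assumes the
8-bit command goes out on a single data line (8 clocks), which the
quoted text does not state. Chipselect setup and hold time are not
counted.

```c
/* Hypothetical description of one SPI read transaction. */
struct spi_read_shape {
	unsigned int cmd_bits;     /* opcode width in bits */
	unsigned int cmd_lines;    /* data lines used for the opcode (assumed 1) */
	unsigned int addr_bits;    /* address width in bits */
	unsigned int addr_lines;   /* data lines used for the address */
	unsigned int dummy_clocks; /* extra delay clocks before data */
	unsigned int data_bits;    /* payload width in bits */
	unsigned int data_lines;   /* data lines used for the payload */
};

/* Clocks needed to shift 'bits' over 'lines' parallel lines, rounded up. */
static unsigned int clocks_for(unsigned int bits, unsigned int lines)
{
	return (bits + lines - 1) / lines;
}

/* Total clocks between chipselect assert and deassert for one read. */
static unsigned int spi_read_clocks(const struct spi_read_shape *s)
{
	return clocks_for(s->cmd_bits, s->cmd_lines) +
	       clocks_for(s->addr_bits, s->addr_lines) +
	       s->dummy_clocks +
	       clocks_for(s->data_bits, s->data_lines);
}
```

With an 8-bit command on one line, a 32-bit address and 32 bits of data
in nibbles, and 4 delay clocks, this gives 8 + 8 + 4 + 8 = 28 clocks,
in line with the "about 30" figure once chipselect overhead is added.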

Random CPU reads are inherently slow; that's just how the HW is.

But there are cases, like image load from flash and filesystems on
flash, which use DMA to maximize performance. Such cases would be
greatly affected if we do the byte swap in SW.
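
To illustrate the cost, here is a minimal sketch of a software byte
swap over a buffer that DMA has already filled. It assumes the swap in
question exchanges the two bytes of each 16-bit unit. The helper name
is hypothetical and this is not code from the patch series.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: swap the two bytes of every 16-bit unit in
 * place. A trailing odd byte, if any, is left untouched.
 */
static void swap16_buf(uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint8_t tmp = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = tmp;
	}
}
```

Every byte fetched from flash has to pass through the CPU once more
after the DMA transfer completes, which is the overhead being objected
to here.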

> 
> If the MMIO interface to the flash controller is PCIe you can
> add in a load of extra latency for the cpu read itself.
> 
> While PCIe allows multiple read requests to be outstanding,
> the Intel cpus I've looked at serialise the reads from each
> cpu core (each cpu always uses the same TLP tag).
> 
> Now longer read TLPs help a lot (IIRC max is 256 bytes).
> But the x86 cpu will only generate read TLPs for register reads.
> You need to use AVX512 registers (or cache line fetches) to
> get better throughput!
> 

Direct CPU fetches from SPI cannot make use of the full bandwidth of
high-speed flashes, and they are not the only use case.

Regards
Vignesh
