lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 17 Mar 2022 11:10:21 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Vignesh Raghavendra' <vigneshr@...com>,
        'Michael Walle' <michael@...le.cc>
CC:     Tudor Ambarus <tudor.ambarus@...rochip.com>,
        "p.yadav@...com" <p.yadav@...com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "miquel.raynal@...tlin.com" <miquel.raynal@...tlin.com>,
        "richard@....at" <richard@....at>,
        "linux-mtd@...ts.infradead.org" <linux-mtd@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>,
        "nicolas.ferre@...rochip.com" <nicolas.ferre@...rochip.com>
Subject: RE: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR
 mode

From: Vignesh Raghavendra
> Sent: 17 March 2022 10:24
...
> Modern OSPI/QSPI flash controllers provide MMIO interface to read from
> flash where DMA can pull data as if though you are reading from On chip RAM

So the cpu does an MMIO read cycle to the controller which doesn't
complete until (for the nibble-mode spi device I have):
1) Chipselect is asserted.
2) The 8-bit command has been clocked out.
3) The 32bit address have been clocked out (8 clocks in nibbles).
4) A few (probably 4) extra delay clocks are added.
5) The data is read - 8 clocks for 32bits in nibble mode.
6) Chipselect is removed.

Now you can do long sequential reads without all the red tape.
But a random read in nibble mode is about 30 clocks.
16 bit mode saves 6 clocks for the data and maybe 6 for the address?

The controller could do 'clever stuff' for sequential reads.
At a cost of slowing down random reads.

So even at 400MHz it isn't that fast.

If the MMIO interface to the flash controller is PCIe you can
add in a load of extra latency for the cpu read itself.

While PCIe allows multiple read requests to be outstanding,
the Intel cpu I've looked at serialise the reads from each
cpu core (each cpu always uses the same TLP tag).

Now longer read TLP help a lot (IIRC max is 256 bytes).
But the x86 cpu will only generate read TLP for register reads.
You need to use AVX512 registers (or cache line fetches) to
get better throughput!

The alternative is getting the flash controller to issue
the read/write TLP for memory transfers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ