linux-kernel - Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA


Message-Id: <e8752d90-5087-4b02-92bc-b3636b5e705d@app.fastmail.com>
Date: Thu, 12 Jun 2025 13:15:46 +0200
From: "Arnd Bergmann" <arnd@...db.de>
To: "James Clark" <james.clark@...aro.org>,
 "Vladimir Oltean" <vladimir.oltean@....com>, "Frank Li" <Frank.li@....com>
Cc: "Vladimir Oltean" <olteanv@...il.com>, "Mark Brown" <broonie@...nel.org>,
 linux-spi@...r.kernel.org, imx@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA

On Thu, Jun 12, 2025, at 13:05, James Clark wrote:
> On 11/06/2025 10:01 am, Vladimir Oltean wrote:
>> On Tue, Jun 10, 2025 at 11:56:34AM -0400, Frank Li wrote:
>>> Can you add performance benefit information for using non-coherent memory
>>> to the commit message, so reviewers can easily see your intention?
>> 
>> To expand on that, you can post the output of something like this
>> (before and after):
>> $ spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 10000000
>> where /dev/spidev1.0 is an unconnected chip select with a dummy entry in
>> the device tree.
>
> Coherent (before):
>
> rate: tx 385.8kbps, rx 385.8kbps
> rate: tx 1215.7kbps, rx 1215.7kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1844.0kbps, rx 1844.0kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1846.9kbps, rx 1846.9kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1843.2kbps, rx 1843.2kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Non-coherent (after):
>
> rate: tx 314.6kbps, rx 314.6kbps
> rate: tx 748.3kbps, rx 748.3kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1849.3kbps, rx 1849.3kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Ignoring anything less than 1800 as starting up, coherent has an average 
> of 1845.2kbps and non-coherent 1846.5kbps. Not sure if that's just noise 
> or an actual effect.
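
As a sanity check on the averages quoted above, here is a minimal sketch
(Python, not part of the original thread) that recomputes them from the
posted samples. The 1800 kbps cutoff is the one James describes; the
function and variable names are made up for illustration.

```python
# Recompute the steady-state averages from the spidev_test output quoted above.
from statistics import mean

# "rate: tx ..." values in kbps, copied from the two runs (tx == rx throughout).
coherent = [385.8, 1215.7, 1845.2, 1844.0, 1846.1, 1844.8, 1844.4,
            1846.9, 1846.5, 1843.2, 1844.8, 1845.2, 1846.5]
noncoherent = [314.6, 748.3, 1845.2, 1849.3, 1846.1, 1847.3, 1845.7,
               1846.5, 1844.4, 1847.3, 1847.3, 1845.7, 1846.5]


def steady_mean(samples, floor=1800.0):
    """Mean of the samples at or above `floor`, dropping start-up readings."""
    return mean([s for s in samples if s >= floor])


c = steady_mean(coherent)
n = steady_mean(noncoherent)
print(f"coherent     {c:.1f} kbps")
print(f"non-coherent {n:.1f} kbps")
print(f"difference   {100 * (n - c) / c:.2f} %")
```

On these samples the two means differ by well under 0.1%, which is less
than the sample-to-sample spread within either run (a few kbps), so the
numbers alone do not separate the two cases.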

The extra cache flushes do introduce some overhead as well, so I
would expect the noncoherent case to be slightly slower for
small transfers, but faster than the coherent case for large
transfers.

"--size 256" presumably means 256 bytes, i.e. four cachelines?
If it's easy to reproduce, can you check with smaller sizes
that still use the DMA codepath (e.g. 64 bytes) and much larger
transfers (e.g. 2048 bytes)?
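
A rough sketch (Python, not part of the original thread) of what that
sweep could look like. It assumes a 64-byte cache line, which is what
"256 bytes = four cachelines" implies, and it reuses the spidev_test
flags from Vladimir's command unchanged apart from --size. The helper
names are hypothetical, and the script only prints the commands rather
than running them.

```python
# Print one spidev_test invocation per transfer size, with its cache-line count.
CACHE_LINE = 64  # bytes; an assumption, consistent with 256 bytes = 4 lines


def cachelines(size, line=CACHE_LINE):
    """Cache lines spanned by `size` bytes, assuming a line-aligned buffer."""
    return -(-size // line)  # ceiling division


def spidev_cmd(size, device="/dev/spidev1.0"):
    """Argument list mirroring the command quoted earlier, with a new --size."""
    return ["spidev_test", "--device", device, "--bpw", "8",
            "--size", str(size), "--cpha", "--iter", "10000000",
            "--speed", "10000000"]


for size in (64, 256, 2048):
    print(f"{size:5d} bytes, {cachelines(size):2d} line(s): "
          f"{' '.join(spidev_cmd(size))}")
```

Whether 64 bytes still takes the DMA path depends on the driver, so
that would need checking before trusting the smallest data point.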

      Arnd