[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <e8752d90-5087-4b02-92bc-b3636b5e705d@app.fastmail.com>
Date: Thu, 12 Jun 2025 13:15:46 +0200
From: "Arnd Bergmann" <arnd@...db.de>
To: "James Clark" <james.clark@...aro.org>,
"Vladimir Oltean" <vladimir.oltean@....com>, "Frank Li" <Frank.li@....com>
Cc: "Vladimir Oltean" <olteanv@...il.com>, "Mark Brown" <broonie@...nel.org>,
linux-spi@...r.kernel.org, imx@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA
On Thu, Jun 12, 2025, at 13:05, James Clark wrote:
> On 11/06/2025 10:01 am, Vladimir Oltean wrote:
>> On Tue, Jun 10, 2025 at 11:56:34AM -0400, Frank Li wrote:
>>> Can you add performance beneafit information after use non-coherent memory
>>> in commit message to let reviewer easily know your intention.
>>
>> To expand on that, you can post the output of something like this
>> (before and after):
>> $ spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 10000000
>> where /dev/spidev1.0 is an unconnected chip select with a dummy entry in
>> the device tree.
>
> Coherent (before):
>
> rate: tx 385.8kbps, rx 385.8kbps
> rate: tx 1215.7kbps, rx 1215.7kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1844.0kbps, rx 1844.0kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1846.9kbps, rx 1846.9kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1843.2kbps, rx 1843.2kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Non-coherent (after):
>
> rate: tx 314.6kbps, rx 314.6kbps
> rate: tx 748.3kbps, rx 748.3kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1849.3kbps, rx 1849.3kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Ignoring anything less than 1800 as starting up, coherent has an average
> of 1845.2kbps and non-coherent 1846.5kbps. Not sure if that's just noise
> or an actual effect.
The extra cache flushes do introduce some overhead as well, so I
would expect the noncoherent case to be slightly slower for
small transfers, but the coherent case to be faster for large
transfers.
"--size 256" presumably means 256 bytes, i.e. four cachelines?
If it's easy to reproduce, can you check with smaller sizes
that still use the DMA codepath (e.g. 64 bytes) and much larger
transfers (e.g. 2048 bytes)?
Arnd
Powered by blists - more mailing lists