Message-ID: <c9bf945b-9fc6-4829-addf-2fb7a7d4eb36@linaro.org>
Date: Tue, 1 Jul 2025 13:42:46 +0100
From: James Clark <james.clark@...aro.org>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: Vladimir Oltean <olteanv@...il.com>, Mark Brown <broonie@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Larisa Grigore <larisa.grigore@....com>,
Frank Li <Frank.li@....com>, Christoph Hellwig <hch@....de>,
linux-spi@...r.kernel.org, imx@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/6] spi: spi-fsl-dspi: Target mode improvements
On 30/06/2025 4:26 pm, Vladimir Oltean wrote:
> On Fri, Jun 27, 2025 at 11:21:36AM +0100, James Clark wrote:
>> Improve usability of target mode by reporting FIFO errors and increasing
>> the buffer size when DMA is used. While we're touching DMA stuff also
>> switch to non-coherent memory, although this is unrelated to target
>> mode.
>>
>> The first commit is marked as a fix because it can fix intermittent
>> issues with existing transfers, rather than the later fixes which
>> improve larger than FIFO target mode transfers which would have never
>> worked.
>>
>> With the combination of the commit to increase the DMA buffer size and
>> the commit to use non-coherent memory, the host mode performance figures
>> are as follows on S32G3:
>>
>> # spidev_test --device /dev/spidev1.0 --bpw 8 --size <test_size> --cpha --iter 10000000 --speed 10000000
>>
>> Coherent (4096 byte transfers): 6534 kbps
>> Non-coherent: 7347 kbps
>>
>> Coherent (16 byte transfers): 447 kbps
>> Non-coherent: 448 kbps
>>
>> Just for comparison running the same test in XSPI mode:
>>
>> 4096 byte transfers: 2143 kbps
>> 16 byte transfers: 637 kbps
>>
>> These tests required hacking S32G3 to use DMA in host mode, although
>> the figures should be representative of target mode too where DMA is
>> used. And the other devices that use DMA in host mode should see similar
>> improvements.
>>
>> Signed-off-by: James Clark <james.clark@...aro.org>
>> ---
>
> My test numbers on LS1028A:
>
> Baseline XSPI (unmodified driver):
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 8 --cpha --iter 10000000 --speed 10000000
> rate: tx 2710.6kbps, rx 2710.6kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 16 --cpha --iter 10000000 --speed 10000000
> rate: tx 3217.5kbps, rx 3217.5kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 4096 --cpha --iter 10000000 --speed 10000000
> rate: tx 5118.4kbps, rx 5118.4kbps
>
> Baseline DMA (modified just DSPI_XSPI_MODE -> DSPI_DMA_MODE):
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 8 --cpha --iter 10000000 --speed 10000000
> rate: tx 1359.5kbps, rx 1359.5kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 16 --cpha --iter 10000000 --speed 10000000
> rate: tx 1461.1kbps, rx 1461.1kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 4096 --cpha --iter 10000000 --speed 10000000
> rate: tx 1664.6kbps, rx 1664.6kbps
>
> Intermediary LS1028A DMA mode (using non-coherent buffers but still
> small DMA buffers, i.e. holding just 1 FIFO size worth of data):
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 8 --cpha --iter 10000000 --speed 10000000
> rate: tx 1345.1kbps, rx 1345.1kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 16 --cpha --iter 10000000 --speed 10000000
> rate: tx 1522.5kbps, rx 1522.5kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 4096 --cpha --iter 10000000 --speed 10000000
> rate: tx 1690.8kbps, rx 1690.8kbps
>
> Final LS1028A DMA mode (with the patch to send large messages as a
> single DMA buffer applied):
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 8 --cpha --iter 10000000 --speed 10000000
> rate: tx 2247.0kbps, rx 2247.0kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 16 --cpha --iter 10000000 --speed 10000000
> rate: tx 3477.4kbps, rx 3477.4kbps
> $ ./spidev_test --device /dev/spidev2.1 --bpw 8 --size 4096 --cpha --iter 10000000 --speed 10000000
> rate: tx 8978.4kbps, rx 8978.4kbps
>
> So after your patch set, DMA mode on LS1028A becomes more performant and
> should replace XSPI. This is an outstanding result. That can be done as
> follow-up work.
I wonder whether latency could be higher despite the increased
throughput. It probably wouldn't be a big enough increase for anyone to
care, and based on the structure of the driver, if throughput is higher
then the latency might even be lower.
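
As a rough sanity check on the latency question, the quoted LS1028A
figures can be turned into ratios against baseline XSPI and an average
time per transfer. This is only a sketch: it assumes spidev_test's
"kbps" means 1000 bit/s and that transfers run back to back, so the
result is the average interval per transfer including userspace
overhead, not a measured latency. The function and table names are made
up for illustration.

```python
# Throughput figures (kbps) copied from the LS1028A results quoted above.
RATES_KBPS = {
    "baseline XSPI": {8: 2710.6, 16: 3217.5, 4096: 5118.4},
    "baseline DMA": {8: 1359.5, 16: 1461.1, 4096: 1664.6},
    "intermediary DMA": {8: 1345.1, 16: 1522.5, 4096: 1690.8},
    "final DMA": {8: 2247.0, 16: 3477.4, 4096: 8978.4},
}


def avg_transfer_time_us(size_bytes, rate_kbps):
    """Average time per transfer in microseconds.

    Assumption: 1 kbps == 1000 bit/s and transfers are issued back to
    back, so this is an average interval, not a true latency.
    """
    return size_bytes * 8 / (rate_kbps * 1000.0) * 1e6


def summarize(rates=RATES_KBPS, reference="baseline XSPI"):
    """Return (mode, size, rate, ratio_vs_reference, avg_time_us) rows."""
    rows = []
    for mode, by_size in rates.items():
        for size, rate in sorted(by_size.items()):
            ratio = rate / rates[reference][size]
            rows.append((mode, size, rate, ratio,
                         avg_transfer_time_us(size, rate)))
    return rows


if __name__ == "__main__":
    for mode, size, rate, ratio, t_us in summarize():
        print(f"{mode:18s} {size:5d} B  {rate:8.1f} kbps  "
              f"x{ratio:4.2f} vs XSPI  {t_us:9.1f} us/transfer")
```

Note that in the quoted numbers the final DMA mode is still below
baseline XSPI for 8 byte transfers (2247.0 vs 2710.6 kbps), so if
latency is going to be worse anywhere it would be for the smallest
transfers.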