Message-ID: <uw7plhwaat2mpwydjle57ppzubwgvhiq7bvtsort7fic5jpgls@ba4hze2gtfqp>
Date: Mon, 1 Dec 2025 13:58:02 +0900
From: Koichiro Den <den@...inux.co.jp>
To: dave.jiang@...el.com, ntb@...ts.linux.dev, linux-kernel@...r.kernel.org
Cc: jdmason@...zu.us, allenbh@...il.com
Subject: Re: [PATCH 0/4] NTB: ntb_transport: DMA fixes and scalability improvements
On Mon, Oct 27, 2025 at 09:43:27AM +0900, Koichiro Den wrote:
> This series contains two DMA-related fixes (Patch #1-2) and two scalability
> improvements (Patch #3-4) for ntb_transport. Behavior remains unchanged
> unless new module parameters are explicitly set.
>
> New module parameters
> =====================
>
> - use_tx_dma : Enable TX DMA independently (default: 0)
> - use_rx_dma : Enable RX DMA independently (default: 0)
> - num_tx_dma_chan : # of TX DMA channels per queue (default: 1)
> - num_rx_dma_chan : # of RX DMA channels per queue (default: 1)
>
> Note: legacy 'use_dma' switch is kept and prioritized higher.
> Enabling it always implies use_tx_dma=1 and use_rx_dma=1
> regardless of whether use_(tx|rx)_dma=0 is appended.
>
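The precedence rule in the note above can be sketched as follows. This is only a minimal illustration, assuming the three switches behave as booleans; `struct dma_dirs` and `resolve_dma_dirs()` are hypothetical names and are not taken from the patches.

```c
#include <stdbool.h>

/* Hypothetical container for the resolved settings; not a kernel structure. */
struct dma_dirs {
	bool tx;	/* DMA used on the TX path */
	bool rx;	/* DMA used on the RX path */
};

/*
 * Resolve the effective per-direction DMA enablement.
 * The legacy use_dma switch takes priority: when it is set, both
 * directions are enabled even if use_tx_dma=0 or use_rx_dma=0 is given.
 */
static struct dma_dirs resolve_dma_dirs(bool use_dma, bool use_tx_dma,
					bool use_rx_dma)
{
	struct dma_dirs d;

	d.tx = use_dma || use_tx_dma;
	d.rx = use_dma || use_rx_dma;
	return d;
}
```

With this reading, `use_dma=1 use_rx_dma=0` still enables RX DMA, while `use_rx_dma=1` alone enables only the RX direction.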
> Performance measurement
> =======================
>
> Tested on R-Car S4. With the following patchsets applied [1]:
>
> - [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW offsets
> (https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/)
> - [PATCH 0/2] Add 'tx_memcpy_offload' option to ntb_transport
> (https://lore.kernel.org/all/20251023072105.901707-1-den@valinux.co.jp/)
>
> throughput became bound by RX DMA service rate. Increasing the number of
> RX DMA channels (>1) improved throughput substantially:
>
> - use_rx_dma=1 num_rx_dma_chan=1
>                ^^^^^^^^^^^^^^^^^
> (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=1 use_intr=1)
>
> $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=1
> sockperf: == version #3.10-no.git ==
> [...]
> sockperf: Summary: Message Rate is 8636 [msg/sec], Packet Rate is about 388620 [pkt/sec] (45 ip frags / msg)
> sockperf: Summary: BandWidth is 538.630 MBps (4309.039 Mbps)
>                                               ^^^^^^^^^^^^^
>
> - use_rx_dma=1 num_rx_dma_chan=2
>                ^^^^^^^^^^^^^^^^^
> (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=2 use_intr=1)
>
> $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=2
> sockperf: == version #3.10-no.git ==
> [...]
> sockperf: Summary: Message Rate is 14283 [msg/sec], Packet Rate is about 642735 [pkt/sec] (45 ip frags / msg)
> sockperf: Summary: BandWidth is 890.835 MBps (7126.680 Mbps)
>                                               ^^^^^^^^^^^^^
>
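The two sockperf summaries above can be cross-checked with a little arithmetic. The sketch below assumes that the reported "MBps" figure is the message rate multiplied by the message size (`-m 65400`), expressed in units of 2^20 bytes; that interpretation is inferred from the numbers, and the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Payload throughput in units of 2^20 bytes per second, computed from a
 * message rate [msg/sec] and a message size [bytes].
 */
static double payload_mib_per_sec(uint64_t msgs_per_sec, uint64_t msg_bytes)
{
	return (double)(msgs_per_sec * msg_bytes) / (1024.0 * 1024.0);
}

/* Ratio between two throughput figures. */
static double speedup(double after, double before)
{
	return after / before;
}
```

Under that assumption, `payload_mib_per_sec(8636, 65400)` and `payload_mib_per_sec(14283, 65400)` land close to the reported 538.630 and 890.835 figures, and `speedup(890.835, 538.630)` comes out at roughly 1.65x for the move from one to two RX DMA channels.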
> [1] Additional changes are required to use DMA on R-Car S4. Those will be
> posted separately.
>
>
> Koichiro Den (4):
> NTB: ntb_transport: Handle remapped contiguous region in vmalloc space
> NTB: ntb_transport: Ack DMA memcpy descriptors to avoid wait-list
> growth
> NTB: ntb_transport: Add module parameters use_tx_dma/use_rx_dma
> NTB: ntb_transport: Support multi-channel DMA via module parameters
>
> drivers/ntb/ntb_transport.c | 386 +++++++++++++++++++++++++-----------
> 1 file changed, 270 insertions(+), 116 deletions(-)
>
> --
> 2.48.1
>
Hi Dave,

As a quick update, this series is likely to be superseded by separate work,
the "NTB transport backed by remote DW eDMA" series:
https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/

On R-Car S4, the remote eDMA-based approach clearly outperforms the
existing architecture, which relies on a DMA_MEMCPY engine.

Do you think it would be worth moving this older series forward?
(I'm not sure whether anyone else is interested in this series,
perhaps for platforms other than R-Car S4.)

Thank you in advance,
Koichiro