Message-ID: <39d470c3-9e13-4ae6-9111-74ad7f4ef67b@intel.com>
Date: Mon, 1 Dec 2025 12:04:18 -0700
From: Dave Jiang <dave.jiang@...el.com>
To: Koichiro Den <den@...inux.co.jp>, ntb@...ts.linux.dev,
linux-kernel@...r.kernel.org
Cc: jdmason@...zu.us, allenbh@...il.com
Subject: Re: [PATCH 0/4] NTB: ntb_transport: DMA fixes and scalability improvements

On 11/30/25 9:58 PM, Koichiro Den wrote:
> On Mon, Oct 27, 2025 at 09:43:27AM +0900, Koichiro Den wrote:
>> This series contains two DMA-related fixes (Patch #1-2) and two scalability
>> improvements (Patch #3-4) for ntb_transport. Behavior remains unchanged
>> unless new module parameters are explicitly set.
>>
>> New module parameters
>> =====================
>>
>> - use_tx_dma : Enable TX DMA independently (default: 0)
>> - use_rx_dma : Enable RX DMA independently (default: 0)
>> - num_tx_dma_chan : # of TX DMA channels per queue (default: 1)
>> - num_rx_dma_chan : # of RX DMA channels per queue (default: 1)
>>
>> Note: legacy 'use_dma' switch is kept and prioritized higher.
>> Enabling it always implies use_tx_dma=1 and use_rx_dma=1
>> regardless of whether use_(tx|rx)_dma=0 is appended.
>>
>> Performance measurement
>> =======================
>>
>> Tested on R-Car S4. With the following patchsets applied [1]:
>>
>> - [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW offsets
>> (https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/)
>> - [PATCH 0/2] Add 'tx_memcpy_offload' option to ntb_transport
>> (https://lore.kernel.org/all/20251023072105.901707-1-den@valinux.co.jp/)
>>
>> throughput became bound by RX DMA service rate. Increasing the number of
>> RX DMA channels (>1) improved throughput substantially:
>>
>> - use_rx_dma=1 num_rx_dma_chan=1
>>                ^^^^^^^^^^^^^^^^^
>> (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=1 use_intr=1)
>>
>> $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=1
>> sockperf: == version #3.10-no.git ==
>> [...]
>> sockperf: Summary: Message Rate is 8636 [msg/sec], Packet Rate is about 388620 [pkt/sec] (45 ip frags / msg)
>> sockperf: Summary: BandWidth is 538.630 MBps (4309.039 Mbps)
>>                                               ^^^^^^^^^^^^^
>>
>> - use_rx_dma=1 num_rx_dma_chan=2
>>                ^^^^^^^^^^^^^^^^^
>> (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=2 use_intr=1)
>>
>> $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=2
>> sockperf: == version #3.10-no.git ==
>> [...]
>> sockperf: Summary: Message Rate is 14283 [msg/sec], Packet Rate is about 642735 [pkt/sec] (45 ip frags / msg)
>> sockperf: Summary: BandWidth is 890.835 MBps (7126.680 Mbps)
>>                                               ^^^^^^^^^^^^^
>>
>> [1] Additional changes are required to use DMA on R-Car S4. Those will be
>> posted separately.
>>
>>
>> Koichiro Den (4):
>> NTB: ntb_transport: Handle remapped contiguous region in vmalloc space
>> NTB: ntb_transport: Ack DMA memcpy descriptors to avoid wait-list
>> growth
>> NTB: ntb_transport: Add module parameters use_tx_dma/use_rx_dma
>> NTB: ntb_transport: Support multi-channel DMA via module parameters
>>
>> drivers/ntb/ntb_transport.c | 386 +++++++++++++++++++++++++-----------
>> 1 file changed, 270 insertions(+), 116 deletions(-)
>>
>> --
>> 2.48.1
>>
>
> Hi Dave,
>
> As a quick update, this series is likely to be superseded by another work
> on the "NTB transport backed by remote DW eDMA" series:
> https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> On R-Car S4, the remote eDMA-based approach clearly outperforms the
> existing architecture that relied on DMA_MEMCPY engine.

Does it use a different transport?

>
> Do you think it would be worth moving this older series forward?
> (I'm not sure whether there is interest from others in this series,
> perhaps on platforms other than R-Car S4.)

I guess it doesn't hurt. Jon?

>
> Thank you in advance,
>
> Koichiro