Message-ID: <39d470c3-9e13-4ae6-9111-74ad7f4ef67b@intel.com>
Date: Mon, 1 Dec 2025 12:04:18 -0700
From: Dave Jiang <dave.jiang@...el.com>
To: Koichiro Den <den@...inux.co.jp>, ntb@...ts.linux.dev,
 linux-kernel@...r.kernel.org
Cc: jdmason@...zu.us, allenbh@...il.com
Subject: Re: [PATCH 0/4] NTB: ntb_transport: DMA fixes and scalability
 improvements



On 11/30/25 9:58 PM, Koichiro Den wrote:
> On Mon, Oct 27, 2025 at 09:43:27AM +0900, Koichiro Den wrote:
>> This series contains two DMA-related fixes (Patch #1-2) and two scalability
>> improvements (Patch #3-4) for ntb_transport. Behavior remains unchanged
>> unless new module parameters are explicitly set.
>>
>> New module parameters
>> =====================
>>
>>   - use_tx_dma : Enable TX DMA independently (default: 0)
>>   - use_rx_dma : Enable RX DMA independently (default: 0)
>>   - num_tx_dma_chan : # of TX DMA channels per queue (default: 1)
>>   - num_rx_dma_chan : # of RX DMA channels per queue (default: 1)
>>
>>   Note: the legacy 'use_dma' switch is kept and takes precedence.
>>         Enabling it always implies use_tx_dma=1 and use_rx_dma=1,
>>         even if use_(tx|rx)_dma=0 is also passed.
>>
>> Performance measurement
>> =======================
>>
>> Tested on R-Car S4. With the following patchsets applied [1]:
>>
>>   - [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW offsets
>>     (https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/)
>>   - [PATCH 0/2] Add 'tx_memcpy_offload' option to ntb_transport
>>     (https://lore.kernel.org/all/20251023072105.901707-1-den@valinux.co.jp/)
>>
>> throughput became bound by RX DMA service rate. Increasing the number of
>> RX DMA channels (>1) improved throughput substantially:
>>
>>   - use_rx_dma=1 num_rx_dma_chan=1
>>                  ^^^^^^^^^^^^^^^^^
>>     (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=1 use_intr=1)
>>
>>     $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=1
>>     sockperf: == version #3.10-no.git == 
>>     [...]
>>     sockperf: Summary: Message Rate is 8636 [msg/sec], Packet Rate is about 388620 [pkt/sec] (45 ip frags / msg)
>>     sockperf: Summary: BandWidth is 538.630 MBps (4309.039 Mbps)
>>                                                   ^^^^^^^^^^^^^
>>
>>   - use_rx_dma=1 num_rx_dma_chan=2
>>                  ^^^^^^^^^^^^^^^^^
>>     (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=2 use_intr=1)
>>
>>     $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=2
>>     sockperf: == version #3.10-no.git == 
>>     [...]
>>     sockperf: Summary: Message Rate is 14283 [msg/sec], Packet Rate is about 642735 [pkt/sec] (45 ip frags / msg)
>>     sockperf: Summary: BandWidth is 890.835 MBps (7126.680 Mbps)
>>                                                   ^^^^^^^^^^^^^
>>
>> [1] Additional changes are required to use DMA on R-Car S4. Those will be
>>     posted separately.
>>
>>
>> Koichiro Den (4):
>>   NTB: ntb_transport: Handle remapped contiguous region in vmalloc space
>>   NTB: ntb_transport: Ack DMA memcpy descriptors to avoid wait-list
>>     growth
>>   NTB: ntb_transport: Add module parameters use_tx_dma/use_rx_dma
>>   NTB: ntb_transport: Support multi-channel DMA via module parameters
>>
>>  drivers/ntb/ntb_transport.c | 386 +++++++++++++++++++++++++-----------
>>  1 file changed, 270 insertions(+), 116 deletions(-)
>>
>> -- 
>> 2.48.1
>>
> 
> Hi Dave,
> 
> As a quick update, this series is likely to be superseded by another work
> on the "NTB transport backed by remote DW eDMA" series:
> https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> On R-Car S4, the remote eDMA-based approach clearly outperforms the
> existing architecture that relied on DMA_MEMCPY engine.

Does it use a different transport?

> 
> Do you think it would be worth moving this older series forward?
> (I'm not sure whether others are interested in this series,
> perhaps on platforms other than R-Car S4.)

I guess it doesn't hurt. Jon?

> 
> Thank you in advance,
> 
> Koichiro
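
The sockperf summaries quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the series: it assumes that sockperf's "MBps" means MiB/s (2^20 bytes per second), which is consistent with the quoted figures, and it takes the message size (65400), fragment count (45) and message rates (8636, 14283) directly from the quoted output. The function and variable names are illustrative only.

```python
# Cross-check of the quoted sockperf summaries (assumption: "MBps" = MiB/s).
MSG_SIZE = 65400        # bytes per message, from "-m 65400"
FRAGS_PER_MSG = 45      # "45 ip frags / msg" in the quoted output
MIB = 1 << 20           # bytes per MiB


def summarize(msg_rate):
    """Derive packet rate and bandwidth from a message rate in msg/sec."""
    mib_per_sec = msg_rate * MSG_SIZE / MIB
    return {
        "pkt_per_sec": msg_rate * FRAGS_PER_MSG,
        "mib_per_sec": mib_per_sec,
        "mbit_per_sec": mib_per_sec * 8,
    }


one_chan = summarize(8636)    # num_rx_dma_chan=1
two_chan = summarize(14283)   # num_rx_dma_chan=2
speedup = two_chan["mib_per_sec"] / one_chan["mib_per_sec"]

for label, result in (("n_chan=1", one_chan), ("n_chan=2", two_chan)):
    print(f"{label}: {result['pkt_per_sec']} pkt/sec, "
          f"{result['mib_per_sec']:.3f} MBps, "
          f"{result['mbit_per_sec']:.3f} Mbps")
print(f"speedup with two RX DMA channels: {speedup:.2f}x")
```

Under that assumption, the derived packet rates and bandwidths agree with the quoted summaries to within rounding of the reported message rate, and the second RX DMA channel corresponds to roughly a 1.65x throughput gain.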

