Message-ID: <uw7plhwaat2mpwydjle57ppzubwgvhiq7bvtsort7fic5jpgls@ba4hze2gtfqp>
Date: Mon, 1 Dec 2025 13:58:02 +0900
From: Koichiro Den <den@...inux.co.jp>
To: dave.jiang@...el.com, ntb@...ts.linux.dev, 
	linux-kernel@...r.kernel.org
Cc: jdmason@...zu.us, allenbh@...il.com
Subject: Re: [PATCH 0/4] NTB: ntb_transport: DMA fixes and scalability
 improvements

On Mon, Oct 27, 2025 at 09:43:27AM +0900, Koichiro Den wrote:
> This series contains two DMA-related fixes (Patch #1-2) and two scalability
> improvements (Patch #3-4) for ntb_transport. Behavior remains unchanged
> unless new module parameters are explicitly set.
> 
> New module parameters
> =====================
> 
>   - use_tx_dma : Enable TX DMA independently (default: 0)
>   - use_rx_dma : Enable RX DMA independently (default: 0)
>   - num_tx_dma_chan : # of TX DMA channels per queue (default: 1)
>   - num_rx_dma_chan : # of RX DMA channels per queue (default: 1)
> 
>   Note: legacy 'use_dma' switch is kept and prioritized higher.
>         Enabling it always implies use_tx_dma=1 and use_rx_dma=1
> 	regardless of whether use_(tx|rx)_dma=0 is appended.
> 
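The precedence in the note above can be sketched as follows. This is only an illustration of the described behavior, not code from the patches: `struct dma_dirs` and `resolve_dma_dirs()` are hypothetical names, and the three booleans stand in for the `use_dma`, `use_tx_dma` and `use_rx_dma` module parameters.

```c
#include <stdbool.h>

/* Hypothetical result type: which directions end up using DMA. */
struct dma_dirs {
	bool tx;
	bool rx;
};

/*
 * Hypothetical helper (assumption, not from ntb_transport.c):
 * legacy use_dma wins over the per-direction switches, so
 * use_dma=1 enables both directions even if use_tx_dma=0 or
 * use_rx_dma=0 is also given.
 */
static struct dma_dirs resolve_dma_dirs(bool use_dma, bool use_tx_dma,
					bool use_rx_dma)
{
	struct dma_dirs d;

	d.tx = use_dma || use_tx_dma;
	d.rx = use_dma || use_rx_dma;
	return d;
}
```

With all three parameters left at their default of 0, neither direction uses DMA, which matches the statement that behavior is unchanged unless the new parameters are set.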
> Performance measurement
> =======================
> 
> Tested on R-Car S4. With the following patchsets applied [1]:
> 
>   - [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW offsets
>     (https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/)
>   - [PATCH 0/2] Add 'tx_memcpy_offload' option to ntb_transport
>     (https://lore.kernel.org/all/20251023072105.901707-1-den@valinux.co.jp/)
> 
> throughput became bound by RX DMA service rate. Increasing the number of
> RX DMA channels (>1) improved throughput substantially:
> 
>   - use_rx_dma=1 num_rx_dma_chan=1
>                  ^^^^^^^^^^^^^^^^^
>     (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=1 use_intr=1)
> 
>     $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=1
>     sockperf: == version #3.10-no.git == 
>     [...]
>     sockperf: Summary: Message Rate is 8636 [msg/sec], Packet Rate is about 388620 [pkt/sec] (45 ip frags / msg)
>     sockperf: Summary: BandWidth is 538.630 MBps (4309.039 Mbps)
>                                                   ^^^^^^^^^^^^^
> 
>   - use_rx_dma=1 num_rx_dma_chan=2
>                  ^^^^^^^^^^^^^^^^^
>     (full command: $ sudo modprobe ntb_transport tx_memcpy_offload=1 use_rx_dma=1 num_rx_dma_chan=2 use_intr=1)
> 
>     $ sudo sockperf tp -i $SERVER_IP -m 65400 -t 10 # RX DMA n_chan=2
>     sockperf: == version #3.10-no.git == 
>     [...]
>     sockperf: Summary: Message Rate is 14283 [msg/sec], Packet Rate is about 642735 [pkt/sec] (45 ip frags / msg)
>     sockperf: Summary: BandWidth is 890.835 MBps (7126.680 Mbps)
>                                                   ^^^^^^^^^^^^^
> 
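To put a number on the improvement, the ratio of the two sockperf bandwidth figures above can be computed directly. A minimal sketch, assuming only the two reported Mbps values; `speedup()` is a hypothetical helper name used for illustration.

```c
/*
 * Hypothetical helper: ratio of the new throughput to the baseline.
 * Inputs are the sockperf "BandWidth" values in Mbps quoted above.
 */
static double speedup(double base_mbps, double new_mbps)
{
	return new_mbps / base_mbps;
}

/* Reported figures: 1 RX DMA channel vs. 2 RX DMA channels. */
static const double rx_chan1_mbps = 4309.039;
static const double rx_chan2_mbps = 7126.680;
```

Calling `speedup(rx_chan1_mbps, rx_chan2_mbps)` gives roughly 1.65, i.e. about a 65% throughput gain from the second RX DMA channel in this particular test.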
> [1] Additional changes are required to use DMA on R-Car S4. Those will be
>     posted separately.
> 
> 
> Koichiro Den (4):
>   NTB: ntb_transport: Handle remapped contiguous region in vmalloc space
>   NTB: ntb_transport: Ack DMA memcpy descriptors to avoid wait-list
>     growth
>   NTB: ntb_transport: Add module parameters use_tx_dma/use_rx_dma
>   NTB: ntb_transport: Support multi-channel DMA via module parameters
> 
>  drivers/ntb/ntb_transport.c | 386 +++++++++++++++++++++++++-----------
>  1 file changed, 270 insertions(+), 116 deletions(-)
> 
> -- 
> 2.48.1
> 

Hi Dave,

As a quick update, this series is likely to be superseded by separate work,
the "NTB transport backed by remote DW eDMA" series:
https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
On R-Car S4, the remote eDMA-based approach clearly outperforms the
existing architecture, which relies on a DMA_MEMCPY engine.

Do you think it would be worth moving this older series forward?
(I'm not sure whether anyone else is interested in this series,
perhaps for use on platforms other than R-Car S4.)

Thank you in advance,

Koichiro
