Message-ID: <aS8Ou7YacTs2yLqk@lizhi-Precision-Tower-5810>
Date: Tue, 2 Dec 2025 11:07:23 -0500
From: Frank Li <Frank.li@....com>
To: Koichiro Den <den@...inux.co.jp>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
	dmaengine@...r.kernel.org, linux-kernel@...r.kernel.org,
	mani@...nel.org, kwilczynski@...nel.org, kishon@...nel.org,
	bhelgaas@...gle.com, corbet@....net, vkoul@...nel.org,
	jdmason@...zu.us, dave.jiang@...el.com, allenbh@...il.com,
	Basavaraj.Natikar@....com, Shyam-sundar.S-k@....com,
	kurt.schwemmer@...rosemi.com, logang@...tatee.com,
	jingoohan1@...il.com, lpieralisi@...nel.org, robh@...nel.org,
	jbrunet@...libre.com, fancer.lancer@...il.com, arnd@...db.de,
	pstanner@...hat.com, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v2 00/27] NTB transport backed by remote DW eDMA

On Tue, Dec 02, 2025 at 03:20:01PM +0900, Koichiro Den wrote:
> On Mon, Dec 01, 2025 at 05:02:57PM -0500, Frank Li wrote:
> > On Sun, Nov 30, 2025 at 01:03:38AM +0900, Koichiro Den wrote:
> > > Hi,
> > >
> > > This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
> > > goal is unchanged, i.e. to improve performance between RC and EP
> > > (with vNTB) over ntb_transport, but the approach has changed drastically.
> > > Based on the feedback from Frank Li in the v1 thread, in particular:
> > > https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
> > > this RFC v2 instead builds an NTB transport backed by remote eDMA
> > > architecture and reshapes the series around it. The RC->EP interruption
> > > is now achieved using a dedicated eDMA read channel, so the somewhat
> > > "hack"-ish approach in RFC v1 is no longer needed.
> > >
> > > Compared to RFC v1, this v2 series enables NTB transport backed by
> > > remote DW eDMA, so the current ntb_transport handling of Memory Window
> > > is no longer needed, and direct DMA transfers between EP and RC are
> > > used.
> > >
> > > I realize this is quite a large series. Sorry for the volume, but for
> > > the RFC stage I believe presenting the full picture in a single set
> > > helps with reviewing the overall architecture. Once the direction is
> > > agreed, I will respin it split by subsystem and topic.
> > >
> > >
> > ...
> > >
> > > - Before this change:
> > >
> > >   * ping
> > >     64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> > >
> > >   * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > >     [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> > >     [  5]   0.00-10.01  sec   344 MBytes   288 Mbits/sec  3.483 ms  51/5555 (0.92%)  receiver
> > >     [  6]   0.00-10.01  sec   342 MBytes   287 Mbits/sec  3.814 ms  38/5517 (0.69%)  receiver
> > >     [SUM]   0.00-10.01  sec   686 MBytes   575 Mbits/sec  3.648 ms  89/11072 (0.8%)  receiver
> > >
> > >   * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > >     [  5]   0.00-10.03  sec   334 MBytes   279 Mbits/sec  3.164 ms  390/5731 (6.8%)  receiver
> > >     [  6]   0.00-10.03  sec   334 MBytes   279 Mbits/sec  2.416 ms  396/5741 (6.9%)  receiver
> > >     [SUM]   0.00-10.03  sec   667 MBytes   558 Mbits/sec  2.790 ms  786/11472 (6.9%)  receiver
> > >
> > >     Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> > >
> > > - After this change (use_remote_edma=1) [1]:
> > >
> > >   * ping
> > >     64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
> > >     64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
> > >
> > >   * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > >     [  5]   0.00-10.01  sec  3.54 GBytes  3.04 Gbits/sec  0.030 ms  0/58007 (0%)  receiver
> > >     [  6]   0.00-10.01  sec  3.71 GBytes  3.19 Gbits/sec  0.453 ms  0/60909 (0%)  receiver
> > >     [  9]   0.00-10.01  sec  3.85 GBytes  3.30 Gbits/sec  0.027 ms  0/63072 (0%)  receiver
> > >     [ 11]   0.00-10.01  sec  3.26 GBytes  2.80 Gbits/sec  0.070 ms  1/53512 (0.0019%)  receiver
> > >     [SUM]   0.00-10.01  sec  14.4 GBytes  12.3 Gbits/sec  0.145 ms  1/235500 (0.00042%)  receiver
> > >
> > >   * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > >     [  5]   0.00-10.03  sec  3.40 GBytes  2.91 Gbits/sec  0.104 ms  15467/71208 (22%)  receiver
> > >     [  6]   0.00-10.03  sec  3.08 GBytes  2.64 Gbits/sec  0.176 ms  12097/62609 (19%)  receiver
> > >     [  9]   0.00-10.03  sec  3.38 GBytes  2.90 Gbits/sec  0.270 ms  17212/72710 (24%)  receiver
> > >     [ 11]   0.00-10.03  sec  2.56 GBytes  2.19 Gbits/sec  0.200 ms  11193/53090 (21%)  receiver
> >
> > Almost 10x faster, 2.9G vs 279M? Highlighting this will get more people
> > interested in this topic.
>
> Thank you for the review!
>
> OK, I'll highlight this in the next iteration.
> By the way, my impression is that we can achieve even higher throughput
> with this remote eDMA architecture.

eDMA saves one memory copy and allows a longer TLP data length. I tried
using the RDMA framework some years ago, but it was overly complex and I
stopped the work.
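
For reference, the ratios behind the "almost 10x" remark can be recomputed
from the receiver-side numbers quoted in this thread. This is only a rough
sketch: the dictionary names and the helper `speedup` are made up for
illustration, and the "before" runs used `-P 2` while the "after" runs used
`-P 4`, so the per-stream comparison is not strictly like-for-like.

```python
# Receiver-side bitrates quoted in this thread, in Mbits/sec.
# "stream 5" is the first per-stream EP->RC line in each iperf3 report.
before = {"RC->EP sum": 575, "EP->RC sum": 558, "EP->RC stream 5": 279}
after = {"RC->EP sum": 12300, "EP->RC sum": 10600, "EP->RC stream 5": 2910}


def speedup(before_mbps, after_mbps):
    """Return how many times faster the 'after' bitrate is."""
    return after_mbps / before_mbps


ratios = {name: speedup(before[name], after[name]) for name in before}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Note that the EP->RC loss rate is also higher after the change (22% vs 6.9%
in the quoted [SUM] lines), so the bitrate ratio alone does not tell the
whole story.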

>
> >
> > >     [SUM]   0.00-10.03  sec  12.4 GBytes  10.6 Gbits/sec  0.188 ms  55969/259617 (22%)  receiver
> > >
> > >   [1] configfs settings:
> > >       # modprobe pci_epf_vntb dyndbg=+pmf
> > >       # cd /sys/kernel/config/pci_ep/
> > >       # mkdir functions/pci_epf_vntb/func1
> > >       # echo 0x1912 >   functions/pci_epf_vntb/func1/vendorid
> > >       # echo 0x0030 >   functions/pci_epf_vntb/func1/deviceid
> > >       # echo 32 >       functions/pci_epf_vntb/func1/msi_interrupts
> > >       # echo 16 >       functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > >       # echo 128 >      functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > >       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > >       # echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > >       # echo 0x20000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > >       # echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> >
> > It looks like you are trying to create small sub-windows within a MW.
> >
> > Would this be cleaner?
> >
> > echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.0
> > echo 0x20000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.1
> >
> > so mw1.1 naturally continues from the previous one.
>
> Thanks for the suggestion.
>
> I was trying to keep the small sub-windows referred to in the same way
> as normal windows for simplicity and readability, but I agree your proposal
> looks intuitive from a user-experience point of view.
>
> My only concern is that e.g. {mw1.0, mw1.1, mw2.0} may translate internally
> into something like {mw1, mw2, mw3} effectively, and that numbering
> mismatch might become confusing when reading or debugging the code.
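
To make the numbering concern concrete, here is a small sketch of how dotted
sub-window names could map onto flat window indices and offsets. It assumes
sub-windows are listed in order and packed back to back within their parent.
The function name `flatten_mws`, the dotted naming, and the mw2.0 size are
hypothetical and not taken from the series.

```python
def flatten_mws(sub_windows):
    """Map ordered (name, size) pairs such as ("mw1.1", 0x20000) to flat
    window names and byte offsets within the parent window."""
    next_offset = {}  # parent name -> next free offset inside that parent
    flat = []
    for flat_index, (name, size) in enumerate(sub_windows, start=1):
        parent, _, _sub = name.partition(".")
        offset = next_offset.get(parent, 0)
        flat.append({
            "name": name,
            "flat": f"mw{flat_index}",
            "offset": offset,
            "size": size,
        })
        next_offset[parent] = offset + size
    return flat


# mw1.0 and mw1.1 use the sizes from the configfs example in this thread;
# the mw2.0 size is an arbitrary placeholder.
windows = flatten_mws([
    ("mw1.0", 0xE0000),
    ("mw1.1", 0x20000),
    ("mw2.0", 0x100000),
])

for w in windows:
    print(f'{w["name"]} -> {w["flat"]} offset={w["offset"]:#x} size={w["size"]:#x}')
```

With these inputs mw1.1 lands at offset 0xe0000, which matches the
mw2_offset value in the quoted configfs settings, while mw2.0 becomes the
third flat window. That is the kind of naming mismatch described above.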

If there are enough BARs, you could first try using one dedicated BAR for
the eDMA register space, with the LL space sharing BAR0 (the control BAR),
to reduce complexity and get better performance.

Frank

>
> -Koichiro
>
> >
> > Frank
> >
> > >       # echo 0x1912 >   functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> > >       # echo 0x0030 >   functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> > >       # echo 0x10 >     functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> > >       # echo 0 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> > >       # echo 4 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> > >       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> > >       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> > >       # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> > >       # echo 1 > controllers/e65d0000.pcie-ep/start
> > >
> > >
> > > Thanks for taking a look.
> > >
> > >
> > > Koichiro Den (27):
> > >   PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > >     access
> > >   PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> > >   NTB: epf: Handle mwN_offset for inbound MW regions
> > >   PCI: endpoint: Add inbound mapping ops to EPC core
> > >   PCI: dwc: ep: Implement EPC inbound mapping support
> > >   PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> > >   NTB: Add offset parameter to MW translation APIs
> > >   PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > >     present
> > >   NTB: ntb_transport: Support offsetted partial memory windows
> > >   NTB: core: Add .get_pci_epc() to ntb_dev_ops
> > >   NTB: epf: vntb: Implement .get_pci_epc() callback
> > >   damengine: dw-edma: Fix MSI data values for multi-vector IMWr
> > >     interrupts
> > >   NTB: ntb_transport: Use seq_file for QP stats debugfs
> > >   NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> > >   NTB: ntb_transport: Dynamically determine qp count
> > >   NTB: ntb_transport: Introduce get_dma_dev() helper
> > >   NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> > >   NTB: ntb_transport: Introduce ntb_transport_backend_ops
> > >   PCI: dwc: ep: Cache MSI outbound iATU mapping
> > >   NTB: ntb_transport: Introduce remote eDMA backed transport mode
> > >   NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> > >   ntb_netdev: Multi-queue support
> > >   NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> > >   iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> > >   iommu: ipmmu-vmsa: Add support for reserved regions
> > >   arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> > >     eDMA
> > >   NTB: epf: Add an additional memory window (MW2) barno mapping on
> > >     Renesas R-Car
> > >
> > >  arch/arm64/boot/dts/renesas/Makefile          |    2 +
> > >  .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   46 +
> > >  .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
> > >  drivers/dma/dw-edma/dw-edma-core.c            |   28 +-
> > >  drivers/iommu/ipmmu-vmsa.c                    |    7 +-
> > >  drivers/net/ntb_netdev.c                      |  341 ++-
> > >  drivers/ntb/Kconfig                           |   11 +
> > >  drivers/ntb/Makefile                          |    3 +
> > >  drivers/ntb/hw/amd/ntb_hw_amd.c               |    6 +-
> > >  drivers/ntb/hw/epf/ntb_hw_epf.c               |  177 +-
> > >  drivers/ntb/hw/idt/ntb_hw_idt.c               |    3 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen1.c            |    6 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen1.h            |    2 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen3.c            |    3 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen4.c            |    6 +-
> > >  drivers/ntb/hw/mscc/ntb_hw_switchtec.c        |    6 +-
> > >  drivers/ntb/msi.c                             |    6 +-
> > >  drivers/ntb/ntb_edma.c                        |  628 ++++++
> > >  drivers/ntb/ntb_edma.h                        |  128 ++
> > >  .../{ntb_transport.c => ntb_transport_core.c} | 1829 ++++++++++++++---
> > >  drivers/ntb/test/ntb_perf.c                   |    4 +-
> > >  drivers/ntb/test/ntb_tool.c                   |    6 +-
> > >  .../pci/controller/dwc/pcie-designware-ep.c   |  287 ++-
> > >  drivers/pci/controller/dwc/pcie-designware.h  |    7 +
> > >  drivers/pci/endpoint/functions/pci-epf-vntb.c |  229 ++-
> > >  drivers/pci/endpoint/pci-epc-core.c           |   44 +
> > >  include/linux/ntb.h                           |   39 +-
> > >  include/linux/ntb_transport.h                 |   21 +
> > >  include/linux/pci-epc.h                       |   11 +
> > >  29 files changed, 3415 insertions(+), 523 deletions(-)
> > >  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> > >  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> > >  create mode 100644 drivers/ntb/ntb_edma.c
> > >  create mode 100644 drivers/ntb/ntb_edma.h
> > >  rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (59%)
> > >
> > > --
> > > 2.48.1
> > >
