Message-ID: <lcd4tdtioldne3ixae5qdxqfj2tf47ox5m5423gl6nmhefpwyy@t5nkhl6n5pg4>
Date: Wed, 3 Dec 2025 17:43:24 +0900
From: Koichiro Den <den@...inux.co.jp>
To: Frank Li <Frank.li@....com>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org, 
	dmaengine@...r.kernel.org, linux-kernel@...r.kernel.org, mani@...nel.org, 
	kwilczynski@...nel.org, kishon@...nel.org, bhelgaas@...gle.com, corbet@....net, 
	vkoul@...nel.org, jdmason@...zu.us, dave.jiang@...el.com, allenbh@...il.com, 
	Basavaraj.Natikar@....com, Shyam-sundar.S-k@....com, kurt.schwemmer@...rosemi.com, 
	logang@...tatee.com, jingoohan1@...il.com, lpieralisi@...nel.org, robh@...nel.org, 
	jbrunet@...libre.com, fancer.lancer@...il.com, arnd@...db.de, pstanner@...hat.com, 
	elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v2 00/27] NTB transport backed by remote DW eDMA

On Tue, Dec 02, 2025 at 11:07:23AM -0500, Frank Li wrote:
> On Tue, Dec 02, 2025 at 03:20:01PM +0900, Koichiro Den wrote:
> > On Mon, Dec 01, 2025 at 05:02:57PM -0500, Frank Li wrote:
> > > On Sun, Nov 30, 2025 at 01:03:38AM +0900, Koichiro Den wrote:
> > > > Hi,
> > > >
> > > > This is RFC v2 of the NTB/PCI series for Renesas R-Car S4. The ultimate
> > > > goal is unchanged, i.e. to improve performance between RC and EP
> > > > (with vNTB) over ntb_transport, but the approach has changed drastically.
> > > > Based on the feedback from Frank Li in the v1 thread, in particular:
> > > > https://lore.kernel.org/all/aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810/
> > > > this RFC v2 instead builds an NTB transport backed by remote eDMA
> > > > architecture and reshapes the series around it. The RC->EP interruption
> > > > is now achieved using a dedicated eDMA read channel, so the somewhat
> > > > "hack"-ish approach in RFC v1 is no longer needed.
> > > >
> > > > Compared to RFC v1, this v2 series enables NTB transport backed by
> > > > remote DW eDMA, so the current ntb_transport handling of Memory Window
> > > > is no longer needed, and direct DMA transfers between EP and RC are
> > > > used.
> > > >
> > > > I realize this is quite a large series. Sorry for the volume, but for
> > > > the RFC stage I believe presenting the full picture in a single set
> > > > helps with reviewing the overall architecture. Once the direction is
> > > > agreed, I will respin it split by subsystem and topic.
> > > >
> > > >
> > > ...
> > > >
> > > > - Before this change:
> > > >
> > > >   * ping
> > > >     64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> > > >
> > > >   * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > > >     [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
> > > >     [  5]   0.00-10.01  sec   344 MBytes   288 Mbits/sec  3.483 ms  51/5555 (0.92%)  receiver
> > > >     [  6]   0.00-10.01  sec   342 MBytes   287 Mbits/sec  3.814 ms  38/5517 (0.69%)  receiver
> > > >     [SUM]   0.00-10.01  sec   686 MBytes   575 Mbits/sec  3.648 ms  89/11072 (0.8%)  receiver
> > > >
> > > >   * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > > >     [  5]   0.00-10.03  sec   334 MBytes   279 Mbits/sec  3.164 ms  390/5731 (6.8%)  receiver
> > > >     [  6]   0.00-10.03  sec   334 MBytes   279 Mbits/sec  2.416 ms  396/5741 (6.9%)  receiver
> > > >     [SUM]   0.00-10.03  sec   667 MBytes   558 Mbits/sec  2.790 ms  786/11472 (6.9%)  receiver
> > > >
> > > >     Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> > > >
> > > > - After this change (use_remote_edma=1) [1]:
> > > >
> > > >   * ping
> > > >     64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.48 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.03 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.931 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.910 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.07 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.986 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.910 ms
> > > >     64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.883 ms
> > > >
> > > >   * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > > >     [  5]   0.00-10.01  sec  3.54 GBytes  3.04 Gbits/sec  0.030 ms  0/58007 (0%)  receiver
> > > >     [  6]   0.00-10.01  sec  3.71 GBytes  3.19 Gbits/sec  0.453 ms  0/60909 (0%)  receiver
> > > >     [  9]   0.00-10.01  sec  3.85 GBytes  3.30 Gbits/sec  0.027 ms  0/63072 (0%)  receiver
> > > >     [ 11]   0.00-10.01  sec  3.26 GBytes  2.80 Gbits/sec  0.070 ms  1/53512 (0.0019%)  receiver
> > > >     [SUM]   0.00-10.01  sec  14.4 GBytes  12.3 Gbits/sec  0.145 ms  1/235500 (0.00042%)  receiver
> > > >
> > > >   * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > > >     [  5]   0.00-10.03  sec  3.40 GBytes  2.91 Gbits/sec  0.104 ms  15467/71208 (22%)  receiver
> > > >     [  6]   0.00-10.03  sec  3.08 GBytes  2.64 Gbits/sec  0.176 ms  12097/62609 (19%)  receiver
> > > >     [  9]   0.00-10.03  sec  3.38 GBytes  2.90 Gbits/sec  0.270 ms  17212/72710 (24%)  receiver
> > > >     [ 11]   0.00-10.03  sec  2.56 GBytes  2.19 Gbits/sec  0.200 ms  11193/53090 (21%)  receiver
> > >
> > > Almost 10x faster, 2.9G vs 279M? Highlighting this will get more people
> > > interested in this topic.
> >
> > Thank you for the review!
> >
> > OK, I'll highlight this in the next iteration.
> > By the way, my impression is that we can achieve even higher throughput
> > with this remote eDMA architecture.
> 
> eDMA can eliminate one memory copy and allow a longer TLP data length. I
> tried using the RDMA framework some years ago, but it was overly complex
> and I stopped that work.

That's interesting. Thank you for the info.
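
As a side note on the "2.9G vs 279M" comparison quoted above, here is a
minimal sketch of the arithmetic, using only the receiver-side bitrates from
the cover letter. The dictionary labels and the `speedup` helper are names
made up for this illustration. Note that the "before" runs used `-P 2` and
the "after" runs used `-P 4`, so the SUM rows are not a like-for-like
comparison.

```python
# Receiver-side bitrates in Mbit/s, copied from the iperf3 results in the
# cover letter: (before, after use_remote_edma=1).
# The labels and the helper name are illustrative only.
rates_mbps = {
    "RC->EP (SUM)": (575, 12300),
    "EP->RC (SUM)": (558, 10600),
    "EP->RC (stream 5)": (279, 2910),
}


def speedup(before_mbps, after_mbps):
    """Return how many times faster the 'after' bitrate is."""
    return after_mbps / before_mbps


for label, (before, after) in rates_mbps.items():
    print(f"{label}: {speedup(before, after):.1f}x")
```

The single-stream EP->RC case comes out at roughly 10x, which matches the
figure mentioned in the review; the SUM rows come out at roughly 19x to 21x.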

> 
> >
> > >
> > > >     [SUM]   0.00-10.03  sec  12.4 GBytes  10.6 Gbits/sec  0.188 ms  55969/259617 (22%)  receiver
> > > >
> > > >   [1] configfs settings:
> > > >       # modprobe pci_epf_vntb dyndbg=+pmf
> > > >       # cd /sys/kernel/config/pci_ep/
> > > >       # mkdir functions/pci_epf_vntb/func1
> > > >       # echo 0x1912 >   functions/pci_epf_vntb/func1/vendorid
> > > >       # echo 0x0030 >   functions/pci_epf_vntb/func1/deviceid
> > > >       # echo 32 >       functions/pci_epf_vntb/func1/msi_interrupts
> > > >       # echo 16 >       functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > > >       # echo 128 >      functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > > >       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > > >       # echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > > >       # echo 0x20000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > > >       # echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> > >
> > > It looks like you are trying to create smaller sub-windows within a MW.
> > >
> > > Would this be cleaner?
> > >
> > > echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.0
> > > echo 0x20000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1.1
> > >
> > > so mw1.1 naturally continues from the previous one.
> >
> > Thanks for the suggestion.
> >
> > I was trying to keep the sub-windows referred to in the same way as
> > normal windows for simplicity and readability, but I agree your proposal
> > looks intuitive from a user-experience point of view.
> >
> > My only concern is that e.g. {mw1.0, mw1.1, mw2.0} may translate internally
> > into something like {mw1, mw2, mw3} effectively, and that numbering
> > mismatch might become confusing when reading or debugging the code.
> 
> If there are enough BARs, you could first try using one dedicated BAR for
> the eDMA register space, with the LL space sharing BAR0 (the control BAR),
> to reduce complexity and get better performance.

Thank you for the suggestion. Once I have the critical pieces (which we are
discussing in several threads for this RFCv2 series) sorted out and start
preparing the next iteration, I'll revisit this.
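
To make the numbering concern quoted above more concrete, here is a rough
sketch of how dotted names such as mw1.0/mw1.1 might flatten into internal
indices. This is not code from the series: `flatten_windows`, the dictionary
keys, and the mw2.0 size are all hypothetical, and the dotted attribute names
are only the proposal under discussion. The mw1.0/mw1.1 sizes are taken from
the configfs example.

```python
# Hypothetical sketch: flatten dotted sub-window names into flat MW indices.
# Nothing here is an existing kernel or configfs interface.
def flatten_windows(subwindows):
    """Map ordered (name, size) pairs such as ("mw1.0", 0xe0000) to flat
    entries. Each sub-window starts where the previous sub-window of the
    same parent ended."""
    flat = []
    next_offset = {}  # running offset per parent window, e.g. "mw1"
    for name, size in subwindows:
        parent, _, _sub = name.partition(".")
        offset = next_offset.get(parent, 0)
        flat.append({
            "name": name,                   # user-visible dotted name
            "flat": f"mw{len(flat) + 1}",   # internal flat numbering
            "parent": parent,
            "offset": offset,
            "size": size,
        })
        next_offset[parent] = offset + size
    return flat


layout = flatten_windows([
    ("mw1.0", 0xe0000),   # size from the configfs example (mw1)
    ("mw1.1", 0x20000),   # size from the configfs example (mw2)
    ("mw2.0", 0x100000),  # made-up size, only to show the index shift
])
for w in layout:
    print(f'{w["name"]} -> {w["flat"]} offset={w["offset"]:#x} size={w["size"]:#x}')
```

With these inputs, mw1.1 lands at offset 0xe0000, the same value as
mw2_offset in the current configfs example, while mw2.0 becomes the third
flat window. That index shift is the mismatch I am worried about.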

Koichiro

> 
> Frank
> 
> >
> > -Koichiro
> >
> > >
> > > Frank
> > >
> > > >       # echo 0x1912 >   functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> > > >       # echo 0x0030 >   functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> > > >       # echo 0x10 >     functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> > > >       # echo 0 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> > > >       # echo 4 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> > > >       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> > > >       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> > > >       # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> > > >       # echo 1 > controllers/e65d0000.pcie-ep/start
> > > >
> > > >
> > > > Thanks for taking a look.
> > > >
> > > >
> > > > Koichiro Den (27):
> > > >   PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > > >     access
> > > >   PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> > > >   NTB: epf: Handle mwN_offset for inbound MW regions
> > > >   PCI: endpoint: Add inbound mapping ops to EPC core
> > > >   PCI: dwc: ep: Implement EPC inbound mapping support
> > > >   PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> > > >   NTB: Add offset parameter to MW translation APIs
> > > >   PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > > >     present
> > > >   NTB: ntb_transport: Support offsetted partial memory windows
> > > >   NTB: core: Add .get_pci_epc() to ntb_dev_ops
> > > >   NTB: epf: vntb: Implement .get_pci_epc() callback
> > > >   damengine: dw-edma: Fix MSI data values for multi-vector IMWr
> > > >     interrupts
> > > >   NTB: ntb_transport: Use seq_file for QP stats debugfs
> > > >   NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> > > >   NTB: ntb_transport: Dynamically determine qp count
> > > >   NTB: ntb_transport: Introduce get_dma_dev() helper
> > > >   NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> > > >   NTB: ntb_transport: Introduce ntb_transport_backend_ops
> > > >   PCI: dwc: ep: Cache MSI outbound iATU mapping
> > > >   NTB: ntb_transport: Introduce remote eDMA backed transport mode
> > > >   NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> > > >   ntb_netdev: Multi-queue support
> > > >   NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> > > >   iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> > > >   iommu: ipmmu-vmsa: Add support for reserved regions
> > > >   arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> > > >     eDMA
> > > >   NTB: epf: Add an additional memory window (MW2) barno mapping on
> > > >     Renesas R-Car
> > > >
> > > >  arch/arm64/boot/dts/renesas/Makefile          |    2 +
> > > >  .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   46 +
> > > >  .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
> > > >  drivers/dma/dw-edma/dw-edma-core.c            |   28 +-
> > > >  drivers/iommu/ipmmu-vmsa.c                    |    7 +-
> > > >  drivers/net/ntb_netdev.c                      |  341 ++-
> > > >  drivers/ntb/Kconfig                           |   11 +
> > > >  drivers/ntb/Makefile                          |    3 +
> > > >  drivers/ntb/hw/amd/ntb_hw_amd.c               |    6 +-
> > > >  drivers/ntb/hw/epf/ntb_hw_epf.c               |  177 +-
> > > >  drivers/ntb/hw/idt/ntb_hw_idt.c               |    3 +-
> > > >  drivers/ntb/hw/intel/ntb_hw_gen1.c            |    6 +-
> > > >  drivers/ntb/hw/intel/ntb_hw_gen1.h            |    2 +-
> > > >  drivers/ntb/hw/intel/ntb_hw_gen3.c            |    3 +-
> > > >  drivers/ntb/hw/intel/ntb_hw_gen4.c            |    6 +-
> > > >  drivers/ntb/hw/mscc/ntb_hw_switchtec.c        |    6 +-
> > > >  drivers/ntb/msi.c                             |    6 +-
> > > >  drivers/ntb/ntb_edma.c                        |  628 ++++++
> > > >  drivers/ntb/ntb_edma.h                        |  128 ++
> > > >  .../{ntb_transport.c => ntb_transport_core.c} | 1829 ++++++++++++++---
> > > >  drivers/ntb/test/ntb_perf.c                   |    4 +-
> > > >  drivers/ntb/test/ntb_tool.c                   |    6 +-
> > > >  .../pci/controller/dwc/pcie-designware-ep.c   |  287 ++-
> > > >  drivers/pci/controller/dwc/pcie-designware.h  |    7 +
> > > >  drivers/pci/endpoint/functions/pci-epf-vntb.c |  229 ++-
> > > >  drivers/pci/endpoint/pci-epc-core.c           |   44 +
> > > >  include/linux/ntb.h                           |   39 +-
> > > >  include/linux/ntb_transport.h                 |   21 +
> > > >  include/linux/pci-epc.h                       |   11 +
> > > >  29 files changed, 3415 insertions(+), 523 deletions(-)
> > > >  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> > > >  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> > > >  create mode 100644 drivers/ntb/ntb_edma.c
> > > >  create mode 100644 drivers/ntb/ntb_edma.h
> > > >  rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (59%)
> > > >
> > > > --
> > > > 2.48.1
> > > >
