Message-ID: <20260118135440.1958279-1-den@valinux.co.jp>
Date: Sun, 18 Jan 2026 22:54:02 +0900
From: Koichiro Den <den@...inux.co.jp>
To: Frank.Li@....com,
	dave.jiang@...el.com,
	cassel@...nel.org,
	mani@...nel.org,
	kwilczynski@...nel.org,
	kishon@...nel.org,
	bhelgaas@...gle.com,
	geert+renesas@...der.be,
	robh@...nel.org,
	vkoul@...nel.org,
	jdmason@...zu.us,
	allenbh@...il.com,
	jingoohan1@...il.com,
	lpieralisi@...nel.org
Cc: linux-pci@...r.kernel.org,
	linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-renesas-soc@...r.kernel.org,
	devicetree@...r.kernel.org,
	dmaengine@...r.kernel.org,
	iommu@...ts.linux.dev,
	ntb@...ts.linux.dev,
	netdev@...r.kernel.org,
	linux-kselftest@...r.kernel.org,
	arnd@...db.de,
	gregkh@...uxfoundation.org,
	joro@...tes.org,
	will@...nel.org,
	robin.murphy@....com,
	magnus.damm@...il.com,
	krzk+dt@...nel.org,
	conor+dt@...nel.org,
	corbet@....net,
	skhan@...uxfoundation.org,
	andriy.shevchenko@...ux.intel.com,
	jbrunet@...libre.com,
	utkarsh02t@...il.com
Subject: [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA

Hi,

This is RFC v4 of the NTB/PCI/dmaengine series that introduces an
optional NTB transport variant where payload data is moved by a PCI
embedded-DMA engine (eDMA) residing on the endpoint side.

The primary target is Synopsys DesignWare PCIe endpoint controllers that
integrate a DesignWare eDMA instance (dw-edma). In the remote
embedded-DMA mode, payload is transferred by DMA directly between the
two systems' memory, and NTB Memory Windows are used primarily for
control/metadata and for exposing the endpoint eDMA resources (register
window + linked-list rings) to the host.

Compared to the existing cpu/dma memcpy-based implementation, this
approach avoids window-backed payload rings and the associated extra
copies, and it is less sensitive to scarce MW space. This also enables
scaling out to multiple queue pairs, which is particularly beneficial
for ntb_netdev. On R-Car S4, preliminary iperf3 results show a 10-20x
throughput improvement. Latency improvements are also observed.

RFC history:
  RFC v3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
  RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
  RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/

Parts of the RFC v3 series have already been split out and posted separately
(see "Kernel base / dependencies" section below). However, feedback on
the remaining parts led to substantial restructuring and code changes,
so I am sending an RFC v4 as a refreshed version of the full series.

RFC v4 is still a large, cross-subsystem series. At this RFC stage,
I am sending the full picture in a single set to make it easier to
review the overall direction and architecture. Once the direction is
agreed upon and no further large restructuring appears necessary, I will stop
posting new RFC-tagged revisions and continue development on
separate threads, split by sub-topic.

Many thanks for all the reviews and feedback from multiple perspectives.


Software architecture overview (RFC v4)
=======================================

A major change in RFC v4 is the software layering and module split.

The existing memcpy-based transport and the new remote embedded-DMA
transport are implemented as two independent NTB client drivers on top
of a shared core library:

                       +--------------------+
                       | ntb_transport_core |
                       +--------------------+
                           ^            ^
                           |            |
        ntb_transport -----+            +----- ntb_transport_edma
       (cpu/dma memcpy)                   (remote embedded DMA transfer)
                                                       |
                                                       v
                                                 +-----------+
                                                 |  ntb_edma |
                                                 +-----------+
                                                       ^
                                                       |
                                               +----------------+
                                               |                |
                                          ntb_dw_edma         [...]

Key points:
  * ntb_transport_core provides the queue-pair abstraction used by upper
    layer clients (e.g. ntb_netdev).
  * ntb_transport is the legacy shared-memory transport client (CPU/DMA
    memcpy).
  * ntb_transport_edma is the remote embedded-DMA transport client.
  * ntb_transport_edma relies on an ntb_edma backend registry.
    This RFC provides an initial DesignWare backend (ntb_dw_edma).
  * Transport selection is per-NTB device via the standard
    driver_override mechanism. To enable that, this RFC adds
    driver_override support to ntb_bus. This allows mixing transports
    across multiple NTB ports and provides an explicit fallback path to
    the legacy transport.

So, if ntb_transport / ntb_transport_edma are built as loadable modules,
you can run modprobe ntb_transport as before and the original cpu/dma
memcpy-based implementation will be active. If they are built-in, which of
ntb_transport and ntb_transport_edma is bound by default depends on
initcall order. For how to switch the driver, see Patch 34
("Documentation: driver-api: ntb: Document remote embedded-DMA transport").
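
For orientation only, the usual driver_override rebind sequence can be
sketched as below. This is not taken from Patch 34, which remains the
authoritative description. The sketch assumes that the NTB bus appears as
/sys/bus/ntb, that the per-device driver_override attribute added by this
series follows the same convention as on other buses, and that the client
driver names match the module names. The device name in the usage note is
a placeholder.

```python
from pathlib import Path


def override_steps(dev, driver, sysfs_root="/sys"):
    """Return the ordered (path, value) sysfs writes that rebind dev to driver.

    Assumed layout (not confirmed by the cover letter):
      <root>/bus/ntb/devices/<dev>/driver_override
      <root>/bus/ntb/devices/<dev>/driver/unbind
      <root>/bus/ntb/drivers_probe
    """
    bus = Path(sysfs_root) / "bus" / "ntb"
    dev_dir = bus / "devices" / dev

    # 1. Pin the device to the requested driver name.
    steps = [(dev_dir / "driver_override", driver)]

    # 2. If a driver currently owns the device, detach it first.
    current = dev_dir / "driver"
    if current.exists():
        steps.append((current / "unbind", dev))

    # 3. Ask the bus to probe the device again; only the driver named in
    #    driver_override is expected to match.
    steps.append((bus / "drivers_probe", dev))
    return steps


def apply_steps(steps):
    """Perform the writes (needs root on a real system)."""
    for path, value in steps:
        path.write_text(value + "\n")
```

On a real system this would be called as
apply_steps(override_steps("<ntb-device>", "ntb_transport_edma")), and with
"ntb_transport" instead to fall back to the legacy transport. Keeping the
plan separate from its execution makes it possible to print and review the
writes before touching sysfs.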


Data flow overview (remote embedded-DMA transport)
==================================================

At a high level:
  * One MW is reserved as an "eDMA window". The endpoint exposes the
    eDMA register block plus LL descriptor rings through that window, so
    the peer can ioremap it and drive DMA reads remotely.
  * Remaining MWs carry only small control-plane rings used to exchange
    buffer addresses and completion information.
  * For RC->EP traffic, the RC drives endpoint DMA read channels through
    the peer-visible eDMA window.
  * For EP->RC traffic, the endpoint uses its local DMA write channels.

The following figures illustrate the data flow when ntb_netdev sits on
top of the transport:

     Figure 1. RC->EP traffic via ntb_netdev + ntb_transport_edma
                   backed by ntb_edma/ntb_dw_edma

             EP                                   RC
          phys addr                            phys addr
            space                                space
             +-+                                  +-+
             | |                                  | |
             | |                ||                | |
             +-+-----.          ||                | |
    EDMA REG | |      \     [A] ||                | |
             +-+----.  '---+-+  ||                | |
             | |     \     | |<---------[0-a]----------
             +-+-----------| |<----------[2]----------.
     EDMA LL | |           | |  ||                | | :
             | |           | |  ||                | | :
             +-+-----------+-+  ||  [B]           | | :
             | |                ||  ++            | | :
          ---------[0-b]----------->||----------------'
             | |            ++  ||  ||            | |
             | |            ||  ||  ++            | |
             | |            ||<----------[4]-----------
             | |            ++  ||                | |
             | |           [C]  ||                | |
          .--|#|<------------------------[3]------|#|<-.
          :  |#|                ||                |#|  :
         [5] | |                ||                | | [1]
          :  | |                ||                | |  :
          '->|#|                                  |#|--'
             |#|                                  |#|
             | |                                  | |

     Figure 2. EP->RC traffic via ntb_netdev + ntb_transport_edma
                  backed by ntb_edma/ntb_dw_edma

             EP                                   RC
          phys addr                            phys addr
            space                                space
             +-+                                  +-+
             | |                                  | |
             | |                ||                | |
             +-+                ||                | |
    EDMA REG | |                ||                | |
             +-+                ||                | |
    ^        | |                ||                | |
    :        +-+                ||                | |
    : EDMA LL| |                ||                | |
    :        | |                ||                | |
    :        +-+                ||  [C]           | |
    :        | |                ||  ++            | |
    :     -----------[4]----------->||            | |
    :        | |            ++  ||  ||            | |
    :        | |            ||  ||  ++            | |
    '----------------[2]-----||<--------[0-b]-----------
             | |            ++  ||                | |
             | |           [B]  ||                | |
          .->|#|--------[3]---------------------->|#|--.
          :  |#|                ||                |#|  :
         [1] | |                ||                | | [5]
          :  | |                ||                | |  :
          '--|#|                                  |#|<-'
             |#|                                  |#|
             | |                                  | |

    0-a. configure remote embedded DMA (program endpoint DMA registers)
    0-b. DMA-map and publish destination address (DAR)
    1.   network stack builds skb (copy from application/user memory)
    2.   consume DAR, DMA-map source address (SAR) and kick DMA transfer
    3.   DMA transfer (payload moves between RC/EP memory)
    4.   consume completion (commit)
    5.   network stack delivers data to application/user memory

    [A]: Dedicated MW that aggregates DMA regs and LL (peer ioremaps it)
    [B]: Control-plane ring buffer for "produce"
    [C]: Control-plane ring buffer for "consume"
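
To make the ordering of steps 0-b through 5 concrete, here is a toy
user-space model of one direction of traffic. It is only a sketch: the Peer
class, the transfer() function and the fixed buffer size are hypothetical,
"DMA" is a plain byte copy, and the two deques stand in for the
control-plane rings [B] and [C]. It does not model the eDMA register/LL
window [A], interrupts, or ring wrap-around.

```python
from collections import deque


class Peer:
    """One side of the link: flat 'physical memory' plus a bump allocator."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.next_free = 0

    def alloc(self, length):
        if self.next_free + length > len(self.mem):
            raise MemoryError("out of simulated memory")
        addr = self.next_free
        self.next_free += length
        return addr


def transfer(sender, receiver, payloads, buf_len=2048):
    """Move payloads from sender memory to receiver memory, step by step."""
    produce = deque()  # [B]: destination addresses published by the receiver
    consume = deque()  # [C]: completions reported back to the receiver
    delivered = []

    # 0-b. The receiver maps buffers and publishes their addresses (DAR).
    for _ in payloads:
        produce.append(receiver.alloc(buf_len))

    for data in payloads:
        if len(data) > buf_len:
            raise ValueError("payload larger than receive buffer")

        # 1. The sender's stack builds the packet in its own memory (SAR).
        sar = sender.alloc(len(data))
        sender.mem[sar:sar + len(data)] = data

        # 2. The sender consumes a DAR and kicks the transfer.
        dar = produce.popleft()

        # 3. Payload moves directly between the two memories, with no
        #    intermediate window-backed payload ring.
        receiver.mem[dar:dar + len(data)] = sender.mem[sar:sar + len(data)]

        # 4. A completion entry is posted for the receiver.
        consume.append((dar, len(data)))

    # 4./5. The receiver consumes completions and delivers the data upward.
    while consume:
        dar, length = consume.popleft()
        delivered.append(bytes(receiver.mem[dar:dar + length]))
    return delivered
```

In this model the rings carry only addresses and lengths, which mirrors the
point that the remaining MWs hold small control-plane rings rather than
payload.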


Kernel base / dependencies
==========================

This series is based on:

  - next-20260114 (commit b775e489bec7)

plus the following seven unmerged patch series or standalone patches:

  - [PATCH v4 0/7] PCI: endpoint/NTB: Harden vNTB resource management
    https://lore.kernel.org/all/20251202072348.2752371-1-den@valinux.co.jp/

  - [PATCH v2 0/2] NTB: ntb_transport: debugfs cleanups
    https://lore.kernel.org/all/20260107042458.1987818-1-den@valinux.co.jp/

  - [PATCH v3 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
    https://lore.kernel.org/all/20260105-dma_prep_config-v3-0-a8480362fd42@nxp.com/

  - [PATCH v8 0/5] PCI: endpoint: BAR subrange mapping support
    https://lore.kernel.org/all/20260115084928.55701-1-den@valinux.co.jp/

  - [PATCH] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
    https://lore.kernel.org/all/20260105075606.1253697-1-den@valinux.co.jp/

  - [PATCH] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts
    https://lore.kernel.org/all/20260105075904.1254012-1-den@valinux.co.jp/

  - [PATCH v2 01/11] dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
    https://lore.kernel.org/imx/20260109-edma_ll-v2-1-5c0b27b2c664@nxp.com/
    (only this single commit is cherry-picked from the series)


Patch layout
============

  1. dw-edma / DesignWare EP helpers needed for remote embedded-DMA (export
     register/LL windows, IRQ routing control, etc.)

     Patch 01 : dmaengine: dw-edma: Export helper to get integrated register window
     Patch 02 : dmaengine: dw-edma: Add per-channel interrupt routing control
     Patch 03 : dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
     Patch 04 : dmaengine: dw-edma: Add notify-only channels support
     Patch 05 : dmaengine: dw-edma: Add a helper to query linked-list region

  2. NTB EPF/core + vNTB prep (mwN_offset + versioning, MSI vector
     management, new ntb_dev_ops helpers, driver_override, vntb glue)

     Patch 06 : NTB: epf: Add mwN_offset support and config region versioning
     Patch 07 : NTB: epf: Reserve a subset of MSI vectors for non-NTB users
     Patch 08 : NTB: epf: Provide db_vector_count/db_vector_mask callbacks
     Patch 09 : NTB: core: Add mw_set_trans_ranges() for subrange programming
     Patch 10 : NTB: core: Add .get_private_data() to ntb_dev_ops
     Patch 11 : NTB: core: Add .get_dma_dev() to ntb_dev_ops
     Patch 12 : NTB: core: Add driver_override support for NTB devices
     Patch 13 : PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
     Patch 14 : PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
     Patch 15 : PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()

  3. ntb_transport refactor/modularization and backend infrastructure

     Patch 16 : NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
     Patch 17 : NTB: ntb_transport: Dynamically determine qp count
     Patch 18 : NTB: ntb_transport: Use ntb_get_dma_dev()
     Patch 19 : NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
     Patch 20 : NTB: ntb_transport: Move internal types to ntb_transport_internal.h
     Patch 21 : NTB: ntb_transport: Export common helpers for modularization
     Patch 22 : NTB: ntb_transport: Split core library and default NTB client
     Patch 23 : NTB: ntb_transport: Add transport backend infrastructure
     Patch 24 : NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation

  4. ntb_edma backend registry + DesignWare backend + transport client

     Patch 25 : NTB: hw: Add remote eDMA backend registry and DesignWare backend
     Patch 26 : NTB: ntb_transport: Add remote embedded-DMA transport client

  5. ntb_netdev multi-queue support

     Patch 27 : ntb_netdev: Multi-queue support

  6. Renesas R-Car S4 enablement (IOMMU, DTs, quirks)

     Patch 28 : iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
     Patch 29 : iommu: ipmmu-vmsa: Add support for reserved regions
     Patch 30 : arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
     Patch 31 : NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
     Patch 32 : NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car

  7. Documentation updates

     Patch 33 : Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
     Patch 34 : Documentation: driver-api: ntb: Document remote embedded-DMA transport

  8. pci-epf-test / pci_endpoint_test / kselftest coverage for remote eDMA

     Patch 35 : PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
     Patch 36 : PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
     Patch 37 : misc: pci_endpoint_test: Add remote eDMA transfer test mode
     Patch 38 : selftests: pci_endpoint: Add remote eDMA transfer coverage


Tested on
=========

* 2x Renesas R-Car S4 Spider (RC<->EP connected with OCuLink cable)
* Kernel base as described above


Performance notes
=================

The primary motivation remains improving throughput/latency for ntb_transport
users (typically ntb_netdev). On R-Car S4, the earlier prototype (RFC v3)
showed roughly 10-20x throughput improvement in preliminary iperf3 tests and
lower ping RTT. I have not yet re-measured after the v4 refactor and
module split.


Changelog
=========

RFCv3->RFCv4 changes:
  - Major refactor of the transport layering:
    - Introduce ntb_transport_core as a shared library module.
    - Split the legacy shared-memory transport client (ntb_transport) and the
      remote embedded-DMA transport client (ntb_transport_edma).
    - Add driver_override support for ntb_bus and use it for per-port transport
      selection.
  - Introduce a vendor-agnostic remote embedded-DMA backend registry (ntb_edma)
    and add the initial DesignWare backend (ntb_dw_edma).
  - Rebase to next-20260114 and move several prerequisite/fixup patchsets into
    separate threads (listed above), including BAR subrange mapping support and
    dw-edma fixes.
  - Add PCI endpoint test coverage for the remote embedded-DMA path:
    - extend pci-epf-test / pci_endpoint_test
    - add a kselftest variant to exercise remote-eDMA transfers
    Note: to keep the changes as small as possible, I added a few #ifdefs
    in the main test code. Feedback on whether/how/to what extent this
    should be split into separate modules would be appreciated.
  - Expand documentation (Documentation/driver-api/ntb.rst) to describe transport
    variants, the new module structure, and the remote embedded-DMA data flow.
  - Addressed other feedback from the RFC v3 thread.

RFCv2->RFCv3 changes:
  - Architecture
    - Have EP side use its local write channels, while leaving RC side to
      use remote read channels.
    - Improved abstraction and encapsulation of HW-specific details.
  - Added control/config region versioning for the vNTB/EPF control region
    so that mismatched RC/EP kernels fail early instead of silently using an
    incompatible layout.
  - Reworked BAR subrange / multi-region mapping support:
    - Dropped the v2 approach that added new inbound mapping ops in the EPC
      core.
    - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
      support BAR subrange inbound mapping via Address Match Mode IB iATU.
    - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
      when offsets are used.
  - Changed .get_pci_epc() to .get_private_data()
  - Dropped two commits from RFC v2 that should be submitted separately:
    (1) ntb_transport debugfs seq_file conversion
    (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
  - Added documentation updates.
  - Addressed assorted review nits from the RFC v2 thread (naming/structure).

RFCv1->RFCv2 changes:
  - Architecture
    - Drop the generic interrupt backend + DW eDMA test-interrupt backend
      approach and instead adopt the remote eDMA-backed ntb_transport mode
      proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
      mapping (Address Match Mode) infrastructure from RFC v1 is largely
      kept, with only minor refinements and code motion where necessary
      to fit the new transport-mode design.
  - For Patch 01
    - Rework the array_index_nospec() conversion to address review
      comments on "[RFC PATCH 01/25]".

Thank you for reviewing,


Koichiro Den (38):
  dmaengine: dw-edma: Export helper to get integrated register window
  dmaengine: dw-edma: Add per-channel interrupt routing control
  dmaengine: dw-edma: Poll completion when local IRQ handling is
    disabled
  dmaengine: dw-edma: Add notify-only channels support
  dmaengine: dw-edma: Add a helper to query linked-list region
  NTB: epf: Add mwN_offset support and config region versioning
  NTB: epf: Reserve a subset of MSI vectors for non-NTB users
  NTB: epf: Provide db_vector_count/db_vector_mask callbacks
  NTB: core: Add mw_set_trans_ranges() for subrange programming
  NTB: core: Add .get_private_data() to ntb_dev_ops
  NTB: core: Add .get_dma_dev() to ntb_dev_ops
  NTB: core: Add driver_override support for NTB devices
  PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
  PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
  PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
  NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
  NTB: ntb_transport: Dynamically determine qp count
  NTB: ntb_transport: Use ntb_get_dma_dev()
  NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
  NTB: ntb_transport: Move internal types to ntb_transport_internal.h
  NTB: ntb_transport: Export common helpers for modularization
  NTB: ntb_transport: Split core library and default NTB client
  NTB: ntb_transport: Add transport backend infrastructure
  NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
  NTB: hw: Add remote eDMA backend registry and DesignWare backend
  NTB: ntb_transport: Add remote embedded-DMA transport client
  ntb_netdev: Multi-queue support
  iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
  iommu: ipmmu-vmsa: Add support for reserved regions
  arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
    eDMA
  NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
  NTB: epf: Add an additional memory window (MW2) barno mapping on
    Renesas R-Car
  Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
    usage
  Documentation: driver-api: ntb: Document remote embedded-DMA transport
  PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
  PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
  misc: pci_endpoint_test: Add remote eDMA transfer test mode
  selftests: pci_endpoint: Add remote eDMA transfer coverage

 Documentation/PCI/endpoint/pci-vntb-howto.rst |   19 +-
 Documentation/driver-api/ntb.rst              |  193 ++
 arch/arm64/boot/dts/renesas/Makefile          |    2 +
 .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   37 +
 .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
 drivers/dma/dw-edma/dw-edma-core.c            |  207 +-
 drivers/dma/dw-edma/dw-edma-core.h            |   10 +
 drivers/dma/dw-edma/dw-edma-v0-core.c         |   26 +-
 drivers/iommu/ipmmu-vmsa.c                    |    7 +-
 drivers/misc/pci_endpoint_test.c              |  633 +++++
 drivers/net/ntb_netdev.c                      |  341 ++-
 drivers/ntb/Kconfig                           |   13 +
 drivers/ntb/Makefile                          |    2 +
 drivers/ntb/core.c                            |   68 +
 drivers/ntb/hw/Kconfig                        |    1 +
 drivers/ntb/hw/Makefile                       |    1 +
 drivers/ntb/hw/edma/Kconfig                   |   28 +
 drivers/ntb/hw/edma/Makefile                  |    5 +
 drivers/ntb/hw/edma/backend.c                 |   87 +
 drivers/ntb/hw/edma/backend.h                 |  102 +
 drivers/ntb/hw/edma/ntb_dw_edma.c             |  977 +++++++
 drivers/ntb/hw/epf/ntb_hw_epf.c               |  199 +-
 drivers/ntb/ntb_transport.c                   | 2458 +---------------
 drivers/ntb/ntb_transport_core.c              | 2523 +++++++++++++++++
 drivers/ntb/ntb_transport_edma.c              | 1110 ++++++++
 drivers/ntb/ntb_transport_internal.h          |  261 ++
 drivers/pci/controller/dwc/pcie-designware.c  |   26 +
 drivers/pci/endpoint/functions/pci-epf-test.c |  497 +++-
 drivers/pci/endpoint/functions/pci-epf-vntb.c |  380 ++-
 include/linux/dma/edma.h                      |  106 +
 include/linux/ntb.h                           |   88 +
 include/uapi/linux/pcitest.h                  |    3 +-
 .../pci_endpoint/pci_endpoint_test.c          |   17 +
 33 files changed, 7855 insertions(+), 2624 deletions(-)
 create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
 create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
 create mode 100644 drivers/ntb/hw/edma/Kconfig
 create mode 100644 drivers/ntb/hw/edma/Makefile
 create mode 100644 drivers/ntb/hw/edma/backend.c
 create mode 100644 drivers/ntb/hw/edma/backend.h
 create mode 100644 drivers/ntb/hw/edma/ntb_dw_edma.c
 create mode 100644 drivers/ntb/ntb_transport_core.c
 create mode 100644 drivers/ntb/ntb_transport_edma.c
 create mode 100644 drivers/ntb/ntb_transport_internal.h

-- 
2.51.0

