Message-ID: <43d957e0-f52b-4ba8-aa87-cfb8472b8b67@infradead.org>
Date: Sat, 10 May 2025 19:18:27 -0700
From: Randy Dunlap <rdunlap@...radead.org>
To: Gregory Price <gourry@...rry.net>, linux-cxl@...r.kernel.org
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, dave@...olabs.net, jonathan.cameron@...wei.com,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 ira.weiny@...el.com, dan.j.williams@...el.com, corbet@....net
Subject: Re: [RFC PATCH v2 10/18] cxl: docs/linux/dax-driver documentation



On 4/30/25 11:10 AM, Gregory Price wrote:
> Add documentation on how the CXL driver interacts with the DAX driver.
> 
> Signed-off-by: Gregory Price <gourry@...rry.net>
> ---
>  Documentation/driver-api/cxl/index.rst        |   1 +
>  .../driver-api/cxl/linux/cxl-driver.rst       | 115 ++++++++++++++++--
>  .../driver-api/cxl/linux/dax-driver.rst       |  43 +++++++
>  3 files changed, 149 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/driver-api/cxl/linux/dax-driver.rst
> 

> diff --git a/Documentation/driver-api/cxl/linux/cxl-driver.rst b/Documentation/driver-api/cxl/linux/cxl-driver.rst
> index 486baf8551aa..1a354ea1cda4 100644
> --- a/Documentation/driver-api/cxl/linux/cxl-driver.rst
> +++ b/Documentation/driver-api/cxl/linux/cxl-driver.rst
> @@ -34,6 +34,32 @@ into a single memory region. The memory region has been converted to dax. ::
>      decoder1.0   decoder5.0  endpoint5   port1  region0
>      decoder2.0   decoder5.1  endpoint6   port2  root0
>  
> +
> +.. kernel-render:: DOT
> +   :alt: Digraph of CXL fabric describing host-bridge interleaving
> +   :caption: Diagraph of CXL fabric with a host-bridge interleave memory region
> +
> +   digraph foo {
> +     "root0" -> "port1";
> +     "root0" -> "port3";
> +     "root0" -> "decoder0.0";
> +     "port1" -> "endpoint5";
> +     "port3" -> "endpoint6";
> +     "port1" -> "decoder1.0";
> +     "port3" -> "decoder3.0";
> +     "endpoint5" -> "decoder5.0";
> +     "endpoint6" -> "decoder6.0";
> +     "decoder0.0" -> "region0";
> +     "decoder0.0" -> "decoder1.0";
> +     "decoder0.0" -> "decoder3.0";
> +     "decoder1.0" -> "decoder5.0";
> +     "decoder3.0" -> "decoder6.0";
> +     "decoder5.0" -> "region0";
> +     "decoder6.0" -> "region0";
> +     "region0" -> "dax_region0";
> +     "dax_region0" -> "dax0.0";
> +   }
> +
>  For this section we'll explore the devices present in this configuration, but
>  we'll explore more configurations in-depth in example configurations below.
>  
> @@ -41,7 +67,7 @@ Base Devices
>  ------------
>  Most devices in a CXL fabric are a `port` of some kind (because each
>  device mostly routes request from one device to the next, rather than
> -provide a manageable service).
> +provide a direct service).
>  
>  Root
>  ~~~~
> @@ -53,6 +79,8 @@ The Root contains links to:
>  
>  * `Host Bridge Ports` defined by ACPI CEDT CHBS.
>  
> +* `Downstream Ports` typically connected to `Host Bridge Ports`

Add ending '.' for consistency.

> +
>  * `Root Decoders` defined by ACPI CEDT CFMWS.
>  
>  ::
> @@ -150,6 +178,27 @@ device configuration data. ::
>      driver    label_storage_size  pmem         serial
>      firmware  numa_node           ram          subsystem
>  
> +A Memory Device is a discrete base object that is not a port.  While it the
> +physical device it belongs to may host an `endpoint`, this relationship is

I have some parsing trouble with the sentence above. Maybe s/it the/the/.

> +not captured in sysfs.
> +
> +Port Relationships
> +~~~~~~~~~~~~~~~~~~
> +In our example described above, there are four host bridges attached to the
> +root, and two of the host bridges have one endpoint attached.
> +
> +.. kernel-render:: DOT
> +   :alt: Digraph of CXL fabric describing host-bridge interleaving
> +   :caption: Diagraph of CXL fabric with a host-bridge interleave memory region
> +
> +   digraph foo {
> +     "root0"    -> "port1";
> +     "root0"    -> "port2";
> +     "root0"    -> "port3";
> +     "root0"    -> "port4";
> +     "port1" -> "endpoint5";
> +     "port3" -> "endpoint6";
> +   }
>  
>  Decoders
>  --------
> @@ -322,6 +371,29 @@ settings (granularity and ways must be the same).
>  Endpoint decoders are created during :code:`cxl_endpoint_port_probe` in the
>  :code:`cxl_port` driver, and is created based on a PCI device's DVSEC registers.
>  
> +Decoder Relationships
> +~~~~~~~~~~~~~~~~~~~~~
> +In our example described above, there is one root decoder which routes memory
> +accesses over two host bridges.  Each host bridge has a decoder which routes
> +access to their singular endpoint targets.  Each endpoint has an decoder which

                                                                 a decoder

> +translates HPA to DPA and services the memory request.
> +
> +The driver validates relationships between ports by decoder programming, so
> +we can think of decoders being related in a similarly hierarchical fashion to
> +ports.
> +
> +.. kernel-render:: DOT
> +   :alt: Digraph of hierarchical relationship between root, switch, and endpoint decoders.
> +   :caption: Diagraph of CXL root, switch, and endpoint decoders.
> +
> +   digraph foo {
> +     "root0"    -> "decoder0.0";
> +     "decoder0.0" -> "decoder1.0";
> +     "decoder0.0" -> "decoder3.0";
> +     "decoder1.0" -> "decoder5.0";
> +     "decoder3.0" -> "decoder6.0";
> +   }
> +
>  Regions
>  -------
>  
> @@ -348,6 +420,17 @@ The interleave settings in a `Memory Region` describe the configuration of the
>  `Interleave Set` - and are what can be expected to be seen in the endpoint
>  interleave settings.
>  
> +.. kernel-render:: DOT
> +   :alt: Digraph of CXL memory region relationships between root and endpoint decoders.
> +   :caption: Regions are created based on root decoder configurations. Endpoint decoders
> +             must be programmed with the same interleave settings as the region.
> +
> +   digraph foo {
> +     "root0"    -> "decoder0.0";
> +     "decoder0.0" -> "region0";
> +     "region0" -> "decoder5.0";
> +     "region0" -> "decoder6.0";
> +   }
>  
>  DAX Region
>  ~~~~~~~~~~
> @@ -360,7 +443,6 @@ for more details. ::
>      dax0.0      devtype  modalias   uevent
>      dax_region  driver   subsystem
>  
> -
>  Mailbox Interfaces
>  ------------------
>  A mailbox command interface for each device is exposed in ::
> @@ -418,17 +500,30 @@ the relationships between a decoder and it's parent.
>  
>  For example, in a `Cross-Link First` interleave setup with 16 endpoints
>  attached to 4 host bridges, linux expects the following ways/granularity
> -across the root, host bridge, and endpoints respectively. ::
> +across the root, host bridge, and endpoints respectively.
> +
> +.. flat-table:: 4x4 cross-link first interleave settings
> +
> +  * - decoder
> +    - ways
> +    - granularity
>  
> -                   ways   granularity
> -  root              4        256
> -  host bridge       4       1024
> -  endpoint         16        256
> +  * - root
> +    - 4
> +    - 256
> +
> +  * - host bridge
> +    - 4
> +    - 1024
> +
> +  * - endpoint
> +    - 16
> +    - 256
>  
>  At the root, every a given access will be routed to the
>  :code:`((HPA / 256) % 4)th` target host bridge. Within a host bridge, every
> -:code:`((HPA / 1024) % 4)th` target endpoint.  Each endpoint will translate
> -the access based on the entire 16 device interleave set.
> +:code:`((HPA / 1024) % 4)th` target endpoint.  Each endpoint translates based
> +on the entire 16 device interleave set.
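
A small worked example might help readers follow the arithmetic here. A
minimal sketch in Python, assuming the HPA is taken as an offset from the
region base and using the ways/granularity values from the table above; the
function names are illustrative only and are not kernel APIs.

```python
def interleave_target(hpa_offset: int, ways: int, granularity: int) -> int:
    """Index of the target selected at one decoder level: (HPA / gran) % ways."""
    return (hpa_offset // granularity) % ways


def route(hpa_offset: int) -> tuple[int, int]:
    """Return (host bridge index, endpoint index within that host bridge)
    for the 4x4 cross-link first example: root 4 ways @ 256,
    host bridge 4 ways @ 1024."""
    host_bridge = interleave_target(hpa_offset, ways=4, granularity=256)
    endpoint = interleave_target(hpa_offset, ways=4, granularity=1024)
    return host_bridge, endpoint


# Consecutive 256-byte chunks rotate across host bridges first, and the
# endpoint index advances once every 1024 bytes.
for chunk in range(6):
    print(chunk * 256, route(chunk * 256))
```

Under these assumptions, 16 consecutive 256-byte chunks each land on a
different (host bridge, endpoint) pair, which matches the 16-way endpoint
setting.
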
>  
>  Unbalanced interleave sets are not supported - decoders at a similar point
>  in the hierarchy (e.g. all host bridge decoders) must have the same ways and
> @@ -467,7 +562,7 @@ In this example, the CFMWS defines two discrete non-interleaved 4GB regions
>  for each host bridge, and one interleaved 8GB region that targets both. This
>  would result in 3 root decoders presenting in the root. ::
>  
> -  # ls /sys/bus/cxl/devices/root0
> +  # ls /sys/bus/cxl/devices/root0/decoder*
>      decoder0.0  decoder0.1  decoder0.2
>  
>    # cat /sys/bus/cxl/devices/decoder0.0/target_list start size
> diff --git a/Documentation/driver-api/cxl/linux/dax-driver.rst b/Documentation/driver-api/cxl/linux/dax-driver.rst
> new file mode 100644
> index 000000000000..5063d2b675b4
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/linux/dax-driver.rst
> @@ -0,0 +1,43 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================
> +DAX Driver Operation
> +====================
> +The `Direct Access Device` driver was originally designed to provide a
> +memory-like access mechanism to memory-like block-devices.  It was
> +extended to support CXL Memory Devices, which provide user-configured
> +memory devices.
> +
> +The CXL subsystem depends on the DAX subsystem to generate either:

                                                  to either:

> +
> +- A file-like interface to userland via :code:`/dev/daxN.Y`, or

   - Generate a file-like interface ...

> +- Engaging the memory-hotplug interface to add CXL memory to page allocator.

   - Engage the ...

> +
> +The DAX subsystem exposes this ability through the `cxl_dax_region` driver.
> +A `dax_region` provides the translation between a CXL `memory_region` and
> +a `DAX Device`.
> +
> +DAX Device
> +==========
> +A `DAX Device` is a file-like interface exposed in :code:`/dev/daxN.Y`. A
> +memory region exposed via dax device can be accessed via userland software
> +via the :code:`mmap()` system-call.  The result is direct mappings to the
> +CXL capacity in the task's page tables.
> +
> +Users wishing to manually handle allocation of CXL memory should use this
> +interface.
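
A short userland example could make this section more concrete. A minimal
sketch in Python, assuming a device node such as /dev/dax0.0 exists and is
accessible; the helper name map_dax is hypothetical, and a real device-dax
node typically requires the mapping length to be a multiple of the device's
alignment, which this sketch does not check.

```python
import mmap
import os


def map_dax(path: str, length: int) -> mmap.mmap:
    """Map `length` bytes of the file at `path` as a shared read/write mapping.

    Hypothetical helper for illustration. The caller is responsible for
    choosing a length the device accepts.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # mmap duplicates the descriptor, so it is safe to close ours after.
        return mmap.mmap(
            fd,
            length,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
    finally:
        os.close(fd)


# Example use (not run here; the path and size are assumptions):
#   region = map_dax("/dev/dax0.0", 2 * 1024 * 1024)
#   region[0:8] = b"\x00" * 8   # loads and stores go directly to the mapping
#   region.close()
```
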
> +
> +kmem conversion
> +===============
> +The :code:`dax_kmem` driver converts a `DAX Device` into a series of `hotplug
> +memory blocks` managed by :code:`kernel/memory-hotplug.c`.  This capacity
> +will be exposed to the kernel page allocator in the user-selected memory
> +zone.
> +
> +The :code:`memmap_on_memory` setting (both global and DAX device local) dictate

                                                                           dictates

> +where the kernell will allocate the :code:`struct folio` descriptors for this

             kernel

> +memory will come from.  If :code:`memmap_on_memory` is set, memory hotplug
> +will set aside a portion of the memory block capacity to allocate folios.  If
> +unset, the memory is allocated via a normal :code:`GFP_KERNEL` allocation -
> +and as a result will most likely land on the local NUM node of the cpu executing

s/cpu/CPU/ preferably.

> +the hotplug operation.

-- 
~Randy

