Message-ID: <5f958920-4cc3-42aa-9553-74b3b0a96751@infradead.org>
Date: Sat, 10 May 2025 19:28:37 -0700
From: Randy Dunlap <rdunlap@...radead.org>
To: Gregory Price <gourry@...rry.net>, linux-cxl@...r.kernel.org
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, dave@...olabs.net, jonathan.cameron@...wei.com,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 ira.weiny@...el.com, dan.j.williams@...el.com, corbet@....net
Subject: Re: [RFC PATCH v2 13/18] cxl: docs/allocation/page-allocator



On 4/30/25 11:10 AM, Gregory Price wrote:
> Document some interesting interactions that occur when exposing CXL
> memory capacity to page allocator.
> 
> Signed-off-by: Gregory Price <gourry@...rry.net>
> ---
>  .../cxl/allocation/page-allocator.rst         | 86 +++++++++++++++++++
>  Documentation/driver-api/cxl/index.rst        |  1 +
>  2 files changed, 87 insertions(+)
>  create mode 100644 Documentation/driver-api/cxl/allocation/page-allocator.rst
> 
> diff --git a/Documentation/driver-api/cxl/allocation/page-allocator.rst b/Documentation/driver-api/cxl/allocation/page-allocator.rst
> new file mode 100644
> index 000000000000..f5b21d3eb63f
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/allocation/page-allocator.rst
> @@ -0,0 +1,86 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================
> +The Page Allocator
> +==================
> +
> +The kernel page allocator services all general page allocation requests, such
> +as :code:`kmalloc`.  CXL configuration steps affect the behavior of the page
> +allocator based on the selected `Memory Zone` and `NUMA node` the capacity is
> +placed in.
> +
> +This section mostly focuses on how these configurations affect the page
> +allocator (as of Linux v6.15) rather than the overall page allocator behavior.
> +
> +NUMA nodes and mempolicy
> +========================
> +Unless a task explicitly registers a mempolicy, the default memory policy
> +of the linux kernel is to allocate memory from the `local NUMA node` first,
> +and fall back to other nodes only if the local node is pressured.
> +
> +Generally, we expect to see local DRAM and CXL memory on separate NUMA nodes,
> +with the CXL memory being non-local.  Technically, however, it is possible
> +for a compute node to have no local DRAM, and for CXL memory to be the
> +`local` capacity for that compute node.
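> +
> +As a rough sketch (the node numbers here are hypothetical - node 0 being
> +local DRAM and node 1 being CXL), a task can override this default by
> +registering its own mempolicy, for example binding all of its allocations
> +to the DRAM node with :code:`set_mempolicy()`: ::
> +
> +  #include <numaif.h>   /* set_mempolicy(), MPOL_BIND - link with -lnuma */
> +  #include <stdlib.h>
> +
> +  int main(void)
> +  {
> +      /* Hypothetical layout: node 0 = local DRAM, node 1 = CXL. */
> +      unsigned long nodemask = 1UL << 0;   /* allow node 0 only */
> +
> +      /* Restrict this task's future page allocations to node 0. */
> +      if (set_mempolicy(MPOL_BIND, &nodemask, 8 * sizeof(nodemask)))
> +          return 1;
> +
> +      /* Pages faulted in for this buffer must now come from node 0. */
> +      void *buf = malloc(1 << 20);
> +      free(buf);
> +      return 0;
> +  }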
> +
> +
> +Memory Zones
> +============
> +CXL capacity may be onlined in :code:`ZONE_NORMAL` or :code:`ZONE_MOVABLE`.
> +
> +As of v6.15, the page allocator attempts to allocate from the highest
> +available and compatible ZONE for an allocation from the local node first.
> +
> +An example of a `zone incompatibility` is attempting to service an allocation
> +marked :code:`GFP_KERNEL` from :code:`ZONE_MOVABLE`.  Kernel allocations are
> +typically not migratable, and as a result can only be serviced from
> +:code:`ZONE_NORMAL` or lower.
> +
> +To simplify this, the page allocator will prefer :code:`ZONE_MOVABLE` over
> +:code:`ZONE_NORMAL` by default, but if :code:`ZONE_MOVABLE` is depleted, it
> +will fallback to allocate from :code:`ZONE_NORMAL`.
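> +
> +For illustration (this is an in-kernel sketch, not CXL driver code), the
> +:code:`gfp` flags attached to a request are what determine which zones are
> +eligible to service it: ::
> +
> +  #include <linux/gfp.h>
> +
> +  static void zone_example(void)
> +  {
> +      /*
> +       * Kernel allocations are not migratable, so this request can only
> +       * be serviced from ZONE_NORMAL or lower, never from ZONE_MOVABLE.
> +       */
> +      struct page *kpage = alloc_pages(GFP_KERNEL, 0);
> +
> +      /*
> +       * Userspace-style allocations carry __GFP_MOVABLE, so the allocator
> +       * prefers ZONE_MOVABLE and falls back to ZONE_NORMAL if depleted.
> +       */
> +      struct page *upage = alloc_pages(GFP_HIGHUSER_MOVABLE, 0);
> +
> +      if (kpage)
> +          __free_pages(kpage, 0);
> +      if (upage)
> +          __free_pages(upage, 0);
> +  }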
> +
> +
> +Zone and Node Quirks
> +====================
> +Lets consider a configuration where the local DRAM capacity is largely onlined

   Let's

> +into :code:`ZONE_NORMAL`, with no :code:`ZONE_MOVABLE` capacity present. The
> +CXL capacity has the opposite configuration - all onlined in
> +:code:`ZONE_MOVABLE`.
> +
> +Under the default allocation policy, the page allocator will completely skip
> +:code:`ZONE_MOVABLE` has a valid allocation target.  This is because, as of

                        as

> +Linux v6.15, the page allocator does approximately the following: ::
> +
> +  for (each zone in local_node):
> +
> +    for (each node in fallback_order):
> +
> +      attempt_allocation(gfp_flags);
> +
> +Because the local node does not have :code:`ZONE_MOVABLE`, the CXL node is
> +functionally unreachable for direct allocation.  As a result, the only way
> +for CXL capacity to be used is via `demotion` in the reclaim path.
> +
> +This configuration also means that if the DRAM node has :code:`ZONE_MOVABLE`
> +capacity - when that capacity is depleted, the page allocator will actually
> +prefer CXL :code:`ZONE_MOVABLE` pages over DRAM :code:`ZONE_NORMAL` pages.
> +
> +We may wish to invert these configurations in future Linux versions.
> +
> +If `demotion` and `swap` are disabled, Linux will begin to cause OOM crashes
> +when the DRAM nodes are depleted. This will be covered amore in depth in the

                                                          more

> +reclaim section.
> +
> +
> +CGroups and CPUSets
> +===================
> +Finally, assuming CXL memory is reachable via the page allocator (i.e. onlined
> +in :code:`ZONE_NORMAL`), the :code:`cpusets.mems_allowed` may be used by
> +containers to limit the accessibility of certain NUMA nodes for tasks in that
> +container.  Users may wish to utilize this in multi-tenant systems where some
> +tasks prefer not to use slower memory.
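> +
> +For example, a container runtime might pin a tenant's tasks to the DRAM node
> +through the cgroup-v2 :code:`cpuset.mems` file (the cgroup name and node
> +numbers below are hypothetical): ::
> +
> +  #include <fcntl.h>
> +  #include <string.h>
> +  #include <unistd.h>
> +
> +  int main(void)
> +  {
> +      /* Hypothetical cgroup "tenant0"; node 0 is assumed to be DRAM. */
> +      const char *path = "/sys/fs/cgroup/tenant0/cpuset.mems";
> +      const char *dram_nodes = "0";
> +
> +      int fd = open(path, O_WRONLY);
> +      if (fd < 0)
> +          return 1;
> +
> +      /* Tasks in this cgroup may now only allocate from node 0. */
> +      ssize_t ret = write(fd, dram_nodes, strlen(dram_nodes));
> +      close(fd);
> +      return ret < 0;
> +  }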
> +
> +In the reclaim section we'll discuss some limitations of this interface to
> +prevent demotions of shared data to CXL memory (if demotions are enabled).
> +

>  .. only::  subproject and html

-- 
~Randy

