Message-ID: <2abca2ae-d53a-4324-b74f-0f189b41f1ae@infradead.org>
Date: Sat, 10 May 2025 19:31:17 -0700
From: Randy Dunlap <rdunlap@...radead.org>
To: Gregory Price <gourry@...rry.net>, linux-cxl@...r.kernel.org
Cc: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, dave@...olabs.net, jonathan.cameron@...wei.com,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 ira.weiny@...el.com, dan.j.williams@...el.com, corbet@....net
Subject: Re: [RFC PATCH v2 14/18] cxl: docs/allocation/reclaim



On 4/30/25 11:10 AM, Gregory Price wrote:
> Document a bit about how reclaim interacts with various CXL
> configurations.
> 
> Signed-off-by: Gregory Price <gourry@...rry.net>
> ---
>  .../driver-api/cxl/allocation/reclaim.rst     | 51 +++++++++++++++++++
>  Documentation/driver-api/cxl/index.rst        |  1 +
>  2 files changed, 52 insertions(+)
>  create mode 100644 Documentation/driver-api/cxl/allocation/reclaim.rst
> 
> diff --git a/Documentation/driver-api/cxl/allocation/reclaim.rst b/Documentation/driver-api/cxl/allocation/reclaim.rst
> new file mode 100644
> index 000000000000..f37c8b1cc3bd
> --- /dev/null
> +++ b/Documentation/driver-api/cxl/allocation/reclaim.rst
> @@ -0,0 +1,51 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=======
> +Reclaim
> +=======
> +Another way CXL memory can be utilized *indirectly* is via the reclaim system
> +in :code:`mm/vmscan.c`.  Reclaim is engaged when memory capacity on the system
> +becomes pressured based on global and cgroup-local `watermark` settings.
> +
> +In this section we won't discuss the `watermark` configurations, just how CXL
> +memory can be consumed by various pieces of reclaim system.
> +
> +Demotion
> +========
> +By default, the reclaim system will prefer swap (or zswap) when reclaiming
> +memory.  Enabling :code:`kernel/mm/numa/demotion_enabled` will cause vmscan
> +to opportunistically prefer distant NUMA nodes to swap or zswap, if capacity
> +is available.
> +
> +Demotion engages the :code:`mm/memory_tier.c` component to determine the
> +next demotion node.  The next demotion node is based on the :code:`HMAT`
> +or :code:`CDAT` performance data.
> +
> +cpusets.mems_allowed quirk
> +--------------------------
> +In Linux v6.15 and below, demotion does not respect :code:`cpusets.mems_allowed`
> +when migrating pages.  As a result, if demotion is enabled, vmscan cannot
> +guarantee isolation of a container's memory from nodes not set in mems_allowed.
> +
> +In Linux v6.XX and up, demotion does attempt to respect
> +:code:`cpusets.mems_allowed`; however, certain classes of shared memory
> +originally instantiated by another cgroup (such as common libraries - e.g.
> +libc) may still be demoted.  As a result, the mems_allowed interface still
> +cannot provide perfect isolation from the remote nodes.
> +
> +ZSwap and Node Preference
> +=========================
> +In Linux v6.15 and below, ZSwap allocates memory from the local node of the
> +processor for the new pages being compressed.  Since pages being compressed
> +are typically cold, the result is a cold page becomes promoted - only to
> +be later demoted as it ages off the LRU.
> +
> +In Linux v6.XX, ZSwap tries to prefer the node of the page being compressed
> +as the allocation target for the compression page.  This helps prevernt

                                                                  prevent

> +thrashing.
> +
> +Demotion with ZSwap
> +===================
> +When enabling both Demotion and ZSwap, you create a situation where ZSwap
> +will prefer the slowest form of CXL memory by default until that tier of
> +memory is exausted.

             exhausted.


-- 
~Randy

