Message-ID: <20251208062943.68824-1-sj@kernel.org>
Date: Sun,  7 Dec 2025 22:29:04 -0800
From: SeongJae Park <sj@...nel.org>
To: 
Cc: SeongJae Park <sj@...nel.org>,
	"Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Hildenbrand <david@...nel.org>,
	Jann Horn <jannh@...gle.com>,
	Jonathan Corbet <corbet@....net>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...e.com>,
	Mike Rapoport <rppt@...nel.org>,
	Pedro Falcato <pfalcato@...e.de>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	damon@...ts.linux.dev,
	linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: [RFC PATCH v3 00/37] mm/damon: introduce per-CPUs/threads/write/read monitoring

Extend DAMON for monitoring accesses generated by given CPUs/threads
and/or for writes.  This is intended for general NUMA-aware page
migration, cache-aware scheduling, and live migration target VM
selection.  This lengthy patch series does that in three parts.

The first part extends DAMON API to let any kernel component report
their observed access events to DAMON.

The second part adds a hacky change to change_protection() and the page
fault handler for reporting page faults on DAMON-specified sampling
pages to DAMON, using the report API that is implemented by the first
part.  Please read the "Hacks on NUMA Hinting Faults" section below for
my apology and clarification about why I'm doing this.

The third part builds on the page fault based sampling to let DAMON
monitor accesses generated by specific CPUs and/or threads, or only
writes.

Note that this RFC, especially the hack of the page fault handler, is
not aiming to be upstreamed as-is.  It is rather shared to give an
example of the ideas that will be discussed in a session [0] of the
special purpose memory microconf at LPC'25, and to provide a stable
interface for early testers.

Background
----------

Existing DAMON operations set implementations, namely paddr, vaddr, and
fvaddr, use Accessed bits of page tables as the main source of the
access information.  Accessed bits have some restrictions.  For example,
they cannot tell which CPU, GPU or thread made the access, whether the
access was a read or a write, and which part of the mapped entity was
really accessed.

Depending on the use case, the limitations can be problematic.  Because
the issue stems from the nature of the page table Accessed bit, utilizing
access information from different sources can mitigate the issue.  Page
faults, memory access instructions sampling interrupts, system calls, or
any information from other kernel space friends such as subsystems or
device drivers of CXL or GPUs could be examples of the different
sources.

DAMON separates its core and operation set layers for easy extension.
The core layer handles high level work such as access information
sampling target setup and region-based overhead/accuracy control.  The
operation set layer executes the low level (sampling-purpose) access
information handling.  DAMON API callers can also implement and use
their own operation set.  That is one of the ways to extend DAMON to use
the different sources.  The core layer features will still be available
with the new sources, without additional changes.

Nevertheless, the current interface between the core and the operation
set layers is optimized for the Accessed bits case.  Specifically, the
interface asks the operation set if a given part of memory has been
accessed or not in a given time period (last sampling interval).  That
is easy to answer for the Accessed bit use case, since the information
is stored in page tables.  The operation set can simply read the current
value of the Accessed bit.

For some sources other than Accessed bits, such as page faults or
instruction sampling interrupts, the operation set may need to collect
and keep the access information in its internal memory until the core
layer asks for the access information.  The information can be dropped
only after the question is answered.

Implementing such operation set internal memory management would not be
trivial.  It could also result in multiple similar operation set
implementations, each having its own internal memory management code
that is unnecessarily duplicated.
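
To illustrate the burden described above, below is a minimal user-space
sketch of the kind of buffering an operation set would need if it had to
answer "was this range accessed?" from event-based sources.  All names
(sketch_event, sketch_event_buf, and the helpers) are hypothetical and
not part of DAMON or this series.  A real implementation would also need
locking and a policy for buffer overflow, which is where the complexity
comes from.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event record; not a DAMON structure. */
struct sketch_event {
	unsigned long addr;
};

#define SKETCH_BUF_LEN 4

/* Buffer that must keep events until the core layer asks about them. */
struct sketch_event_buf {
	struct sketch_event events[SKETCH_BUF_LEN];
	size_t nr;		/* number of buffered events */
	size_t nr_dropped;	/* events lost because the buffer was full */
};

/* Called by the event source (e.g., a fault handler) at any time. */
static void sketch_buf_add(struct sketch_event_buf *buf, unsigned long addr)
{
	if (buf->nr == SKETCH_BUF_LEN) {
		buf->nr_dropped++;
		return;
	}
	buf->events[buf->nr++].addr = addr;
}

/* Answer the core layer's question: was [start, end) accessed? */
static bool sketch_buf_accessed(const struct sketch_event_buf *buf,
		unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < buf->nr; i++) {
		if (buf->events[i].addr >= start && buf->events[i].addr < end)
			return true;
	}
	return false;
}

/* Events can be dropped only after the core layer has been answered. */
static void sketch_buf_reset(struct sketch_event_buf *buf)
{
	buf->nr = 0;
}
```

Every event-based operation set would need its own variant of this
buffer, which is the duplication this series tries to avoid.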

Core Layer Changes for Reporting-based Monitoring
-------------------------------------------------

Avoid such possible duplicated efforts by updating the DAMON core layer
to support real time access reporting.  The updated interface allows
operations set implementations to report their information to the core
layer, on their preferred schedule.  The DAMON core layer will handle
the reports by managing metadata and updating the final monitoring
results (DAMON regions) accordingly.
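
The push-style flow can be pictured with the following user-space
sketch.  It is not the code of this series.  The structures and function
names (sketch_access_report, sketch_region, sketch_report_access() and
sketch_end_sampling_interval()) are hypothetical stand-ins, and the
assumption that a region counts at most one access per sampling interval
is mine.

```c
#include <stddef.h>

/* Hypothetical report from any access source; fields are assumptions. */
struct sketch_access_report {
	unsigned long addr;	/* start address of the observed access */
	unsigned long size;	/* bytes covered by the access */
};

/* Hypothetical monitoring result for the address range [start, end). */
struct sketch_region {
	unsigned long start;
	unsigned long end;
	unsigned int nr_accesses;	/* intervals that saw an access */
	int accessed_now;		/* report seen in this interval? */
};

/* Reporter side: may be called whenever the source observes an access. */
static void sketch_report_access(struct sketch_region *regions,
		size_t nr_regions, const struct sketch_access_report *report)
{
	unsigned long report_end = report->addr + report->size;
	size_t i;

	for (i = 0; i < nr_regions; i++) {
		if (report->addr < regions[i].end &&
				report_end > regions[i].start)
			regions[i].accessed_now = 1;
	}
}

/* Core side: fold the reports into the results once per interval. */
static void sketch_end_sampling_interval(struct sketch_region *regions,
		size_t nr_regions)
{
	size_t i;

	for (i = 0; i < nr_regions; i++) {
		if (regions[i].accessed_now)
			regions[i].nr_accesses++;
		regions[i].accessed_now = 0;
	}
}
```

With this shape, the reporter keeps no state of its own.  The core layer
owns the metadata and decides when the reports become part of the
monitoring results.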

For flexible control of the reports from different access check
primitives (or, sources), add a new data structure to DAMON core API,
namely damon_sample_control.  The data structure can be used for
selectively using the low level access check primitives (e.g., page
table Accessed bit and page fault events), and for filtering generated
samples based on additional information on the samples, including the
access-generator CPU and/or thread, and whether the access was for write
or read.
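
As a rough illustration of the primitive selection half of that idea,
below is a user-space sketch.  The layout is an assumption made for
illustration only.  The sketch_ names and fields are hypothetical and do
not reflect the actual definition of damon_sample_control in this
series.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two primitives mentioned above. */
enum sketch_primitive {
	SKETCH_PRIMITIVE_ACCESSED_BIT,
	SKETCH_PRIMITIVE_PAGE_FAULT,
};

/* Hypothetical control block: which primitives may generate samples. */
struct sketch_sample_control {
	bool use_accessed_bit;
	bool use_page_fault;
};

/* Return whether samples from the given primitive should be used. */
static bool sketch_primitive_enabled(const struct sketch_sample_control *ctrl,
		enum sketch_primitive primitive)
{
	switch (primitive) {
	case SKETCH_PRIMITIVE_ACCESSED_BIT:
		return ctrl->use_accessed_bit;
	case SKETCH_PRIMITIVE_PAGE_FAULT:
		return ctrl->use_page_fault;
	}
	return false;
}
```

A caller that wants page fault based monitoring only would set
use_page_fault and clear use_accessed_bit in this sketch.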

Hacks on NUMA Hinting Faults
----------------------------

Hack NUMA hinting faults code in change_protection() and page fault
handler, to make the first DAMON access reporter.  Update
change_protection() to install the NUMA hinting faults-purpose
protection on arbitrary pages, and do the protection install for
DAMON-desired access check sample pages.  Update NUMA hinting faults
handling code to report the information to DAMON, when NUMA balancing is
turned off.

This is never an upstreamable design and implementation.  Actually,
concerns about this were raised on the previous version of this series.
Unfortunately I had no time to address those.  As a result, this version
is not addressing any of the concerns.  Please forgive me for polluting
your inbox with this immature patch.  But please know that I'm not
ignoring the previous concerns.  I'm sharing it as-is though, to get
feedback on DAMON-side changes first.  I will establish discussions with
all stakeholders including NUMA balancing and MM core maintainers, after
the discussion of the DAMON-side changes has progressed further.

Per-CPUs/threads/write/read Monitoring
--------------------------------------

Extend the data structure for access check samples filtering,
damon_sample_control, for filtering reported data access sample results
based on the source CPUs/threads of the access, and whether the access
was for write.  Expose the damon_sample_control to DAMON sysfs
interface, so that DAMON ABI users can also utilize the features.
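
The filtering idea can be sketched in user space as below.  This is not
the series' implementation.  The sketch_ names are hypothetical.  The
"matching" and "allow" fields borrow their names from the sysfs files
listed in the patch titles, but their exact semantics here (first
applicable filter decides, and unfiltered reports are kept) are my
assumptions.  The CPU mask is simplified to a 64-bit word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_filter_type {
	SKETCH_FILTER_CPUMASK,
	SKETCH_FILTER_THREADS,
	SKETCH_FILTER_WRITE,
};

/* Hypothetical access report carrying the extra information. */
struct sketch_report {
	unsigned int cpu;	/* CPU that made the access */
	int tid;		/* thread that made the access */
	bool is_write;		/* whether the access was a write */
};

struct sketch_filter {
	enum sketch_filter_type type;
	bool matching;		/* apply to matching (or non-matching) reports */
	bool allow;		/* keep (or drop) the reports it applies to */
	uint64_t cpumask;	/* bit N set means CPU N; CPUs 0-63 only */
	const int *tids;	/* thread ids for SKETCH_FILTER_THREADS */
	size_t nr_tids;
};

/* Does the report satisfy the filter's type-specific condition? */
static bool sketch_filter_matches(const struct sketch_filter *filter,
		const struct sketch_report *report)
{
	size_t i;

	switch (filter->type) {
	case SKETCH_FILTER_CPUMASK:
		return report->cpu < 64 &&
			((filter->cpumask >> report->cpu) & 1);
	case SKETCH_FILTER_THREADS:
		for (i = 0; i < filter->nr_tids; i++) {
			if (filter->tids[i] == report->tid)
				return true;
		}
		return false;
	case SKETCH_FILTER_WRITE:
		return report->is_write;
	}
	return false;
}

/* The first filter that applies decides; otherwise keep the report. */
static bool sketch_report_allowed(const struct sketch_filter *filters,
		size_t nr_filters, const struct sketch_report *report)
{
	size_t i;

	for (i = 0; i < nr_filters; i++) {
		if (sketch_filter_matches(&filters[i], report) ==
				filters[i].matching)
			return filters[i].allow;
	}
	return true;
}
```

For example, write-only monitoring in this sketch is a single filter of
type SKETCH_FILTER_WRITE with matching=false and allow=false, which
drops every report that is not a write.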

Expected Users: NUMA Page Migrations, VM Live Migration and Scheduling
----------------------------------------------------------------------

We have ongoing public/private discussions of expected use cases of this
patch series.  We expect the per-CPUs monitoring can be useful for
NUMA-aware page migrations.  AWS has shown their interest in using
write-only monitoring for finding the best live migration target VM.
Some folks showed interest in per-threads monitoring for L3 cache
utilization-aware threads scheduling.

Also I believe this can be extended to not only CPUs but any access
generating entities including GPU-like accelerators, which expose their
memory as NUMA nodes in some setups.  With that, I think we could make a
holistic and efficient access-aware NUMA page migration system.

Patches Sequence
----------------

The first twelve patches (patches 1-12) are for the first (extending
DAMON for reporting-based access monitoring) and second (adding the
gross hack for stealing NUMA_HINT_FAULT on page fault handling) parts.
As a result, they make DAMON able to do page fault event based
monitoring.

The following thirteen patches (patches 13-25) are for implementing
per-CPUs access monitoring.  They implement the framework for doing
access reports filtering based on additional information such as
access-origin CPU, and implement it for the CPU information.

The next seven patches (patches 26-32) are for implementing per-threads
access monitoring.  They extend the access reports filtering for the
threads based information.

The final five patches (patches 33-37) are for implementing
read/write-only monitoring.  They again extend the access reports
filtering for the purpose.

Plan for Dropping RFC
---------------------

This RFC has pretty immature and dirty hacks.  It is never upstreamable
as-is.  I'm sharing this, though, for the following reasons.

Firstly, to discuss the overall idea and DAMON-side design.  The idea
has been floating around for a long time, and recently became more
specific with the damon_report_access() API plan [1] that was discussed
at LSFMMBPF'25.  We will also discuss this, focusing on the NUMA-aware
page migration use case, at the special purpose memory management
microconf at LPC'25.

Secondly, some people started testing the early version of the
implementation on my damon/next tree.  The implementation is hacky,
having only an experimental interface with no documentation at all.
This RFC is for giving a more stable interface and documentation to such
early testers.

I expect final upstreaming of this series will take a long time.  The
NUMA hinting fault part hack is the most challenging in my opinion.  And
this version is not addressing any of the concerns about it that were
raised on the previous version.  Please know that I'm not ignoring the
concerns, but only have capacity limitations at the moment.  I will
establish discussions with all stakeholders including maintainers of
NUMA balancing and MM core, by LSFMMBPF'26.  This can be upstreamed only
after we are well aligned with all stakeholders.

Revision History
----------------

Changes from RFC v2
(https://lore.kernel.org/20250727201813.53858-1-sj@kernel.org)
- Use damon_sample_control instead of new ops (paddr_fault)
- Implement per-CPUs, per-threads, and write-only monitoring.

Changes from RFC v1
(https://lore.kernel.org/20250629201443.52569-1-sj@kernel.org)
- Fixup report reading logic for access absence accounting.
- Implement page faults based operations set (paddr_fault).

[0] https://lpc.events/event/19/contributions/2066/
[1] https://lwn.net/Articles/1016525/

SeongJae Park (37):
  mm/damon/core: implement damon_report_access()
  mm/damon: define struct damon_sample_control
  mm/damon/core: commit damon_sample_control
  mm/damon/core: implement damon_report_page_fault()
  mm/{mprotect,memory}: (no upstream-aimed hack) implement MM_CP_DAMON
  mm/damon/paddr: support page fault access check primitive
  mm/damon/core: apply access reports to high level snapshot
  mm/damon/sysfs: implement monitoring_attrs/sample/ dir
  mm/damon/sysfs: implement sample/primitives/ dir
  mm/damon/sysfs: connect primitives directory with core
  Docs/mm/damon/design: document page fault sampling primitive
  Docs/admin-guide/mm/damon/usage: document sample primitives dir
  mm/damon: extend damon_access_report for origin CPU reporting
  mm/damon/core: report access origin cpu of page faults
  mm/damon: implement sample filter data structure for cpus-only
    monitoring
  mm/damon/core: implement damon_sample_filter manipulations
  mm/damon/core: commit damon_sample_filters
  mm/damon/core: apply sample filter to access reports
  mm/damon/sysfs: implement sample/filters/ directory
  mm/damon/sysfs: implement sample filter directory
  mm/damon/sysfs: implement type, matching, allow files under sample
    filter dir
  mm/damon/sysfs: implement cpumask file under sample filter dir
  mm/damon/sysfs: connect sample filters with core layer
  Docs/mm/damon/design: document sample filters
  Docs/admin-guide/mm/damon/usage: document sample filters dir
  mm/damon: extend damon_access_report for access-origin thread info
  mm/damon/core: report access-generated thread id of the fault event
  mm/damon: extend damon_sample_filter for threads
  mm/damon/core: support threads type sample filter
  mm/damon/sysfs: support thread based access sample filtering
  Docs/mm/damon/design: document threads type sample filter
  Docs/admin-guide/mm/damon/usage: document tids_arr file
  mm/damon: support reporting write access
  mm/damon/core: report whether the page fault was for writing
  mm/damon/core: support write access sample filter
  mm/damon/sysfs: support write-type access sample filter
  Docs/mm/damon/design: document write access sample filter type

 Documentation/admin-guide/mm/damon/usage.rst |  43 +-
 Documentation/mm/damon/design.rst            |  76 +++
 include/linux/damon.h                        | 133 ++++
 include/linux/mm.h                           |   1 +
 mm/damon/core.c                              | 339 +++++++++-
 mm/damon/paddr.c                             |  66 +-
 mm/damon/sysfs.c                             | 622 +++++++++++++++++++
 mm/memory.c                                  |  60 +-
 mm/mprotect.c                                |   5 +
 9 files changed, 1338 insertions(+), 7 deletions(-)


base-commit: 120d322d058f56f6cb92115b5a589ee9b4f07664
-- 
2.47.3
