Message-ID: <cover.1762531708.git.lorenzo.stoakes@oracle.com>
Date: Fri,  7 Nov 2025 16:11:45 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Jonathan Corbet <corbet@....net>, David Hildenbrand <david@...hat.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
        Zi Yan <ziy@...dia.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Lance Yang <lance.yang@...ux.dev>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, Andrei Vagin <avagin@...il.com>
Subject: [PATCH v3 0/8] introduce VM_MAYBE_GUARD and make it sticky

Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.

This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.

This series addresses this issue by introducing the VM_MAYBE_GUARD flag,
which fulfils this role, and by updating the smaps logic to display an
entry for VMAs which have it set.

The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.
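As a rough illustration of how a userland tool such as CRIU might consume this, the sketch below scans one smaps "VmFlags:" line for the flag's mnemonic. It assumes the mnemonic is "gu" (check Documentation/filesystems/proc.rst in the applied series), and vmflags_has() and classify_vma() are hypothetical helpers. Following the semantics above, an absent mnemonic is definitive, while a present one only means guard regions may exist.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of inspecting one VMA's smaps entry. */
enum guard_state {
	GUARD_NONE,	/* flag absent: definitely no guard regions */
	GUARD_MAYBE,	/* flag present: guard regions MAY be present */
};

/*
 * Return true if the "VmFlags:" line contains `mn` as a whole
 * whitespace-separated token.
 */
static bool vmflags_has(const char *line, const char *mn)
{
	const char *p = strchr(line, ':');
	size_t mnlen = strlen(mn);

	if (!p || mnlen == 0)
		return false;
	for (p++; *p; ) {
		size_t len;

		/* Skip separators, including a trailing newline. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		len = strcspn(p, " \t\n");
		if (len == mnlen && strncmp(p, mn, len) == 0)
			return true;
		p += len;
	}
	return false;
}

/* ASSUMPTION: VM_MAYBE_GUARD is shown as "gu" in smaps VmFlags. */
static enum guard_state classify_vma(const char *vmflags_line)
{
	return vmflags_has(vmflags_line, "gu") ? GUARD_MAYBE : GUARD_NONE;
}
```

A caller would pass each "VmFlags:" line read from /proc/$pid/smaps and only inspect /proc/$pid/pagemap for VMAs classified as GUARD_MAYBE.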

It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.

To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is, flags which:

a. If set in one VMA and not in another, still permit those VMAs to be
   merged (if otherwise compatible).

b. When those VMAs are merged, must be set in the resultant VMA.
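To make rules (a) and (b) concrete, here is a small model of the flag side of the merge decision. The F_* names and bit values are hypothetical stand-ins rather than the kernel's VM_* definitions, and the real merge logic in mm/vma.c checks far more than flags.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vmflags_t;

/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define F_READ		(1ULL << 0)
#define F_WRITE		(1ULL << 1)
#define F_MAYBE_GUARD	(1ULL << 2)

/* Hypothetical sticky mask: flags which must not block a merge. */
#define F_STICKY	F_MAYBE_GUARD

/* (a) Differences in sticky bits are ignored when checking compatibility. */
static bool flags_mergeable(vmflags_t a, vmflags_t b)
{
	return ((a ^ b) & ~F_STICKY) == 0;
}

/* (b) If either VMA had a sticky flag set, the merged VMA keeps it. */
static vmflags_t merged_flags(vmflags_t a, vmflags_t b)
{
	return a | (b & F_STICKY);
}
```

Because flags_mergeable() guarantees that the non-sticky bits already match, OR-ing in the other VMA's sticky bits yields the union of both sets of flags.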

The VMA logic is updated to propagate these flags correctly.

Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.

We introduce a new flag alias, VM_COPY_ON_FORK, which specifies
VM_MAYBE_GUARD along with VM_PFNMAP, VM_MIXEDMAP and VM_UFFD_WP, and update
vma_needs_copy() to check explicitly for these flags and to copy page
tables if any of them is set, which resolves this issue.
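A simplified model of the resulting fork-time decision is sketched below. The F_* bits, struct vma_model and needs_copy() are hypothetical; the flag membership follows the v3 changelog. The sketch also collapses the kernel's separate source and destination VMA checks into a single VMA.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vmflags_t;

/* Hypothetical bit values for illustration; not the kernel's. */
#define F_PFNMAP	(1ULL << 0)
#define F_MIXEDMAP	(1ULL << 1)
#define F_UFFD_WP	(1ULL << 2)
#define F_MAYBE_GUARD	(1ULL << 3)

/* Mirrors the VM_COPY_ON_FORK membership described in the v3 changelog. */
#define F_COPY_ON_FORK	(F_MAYBE_GUARD | F_PFNMAP | F_MIXEDMAP | F_UFFD_WP)

struct vma_model {
	vmflags_t flags;
	bool has_anon_vma;
};

/* Simplified model: must fork() copy this VMA's page tables? */
static bool needs_copy(const struct vma_model *src)
{
	/* Explicit flag check, so guard regions no longer need an anon_vma. */
	if (src->flags & F_COPY_ON_FORK)
		return true;
	/* A VMA with an anon_vma may hold anonymous pages, which must be copied. */
	return src->has_anon_vma;
}
```

In this model, a file-backed VMA with guard regions now reaches the copy path through the flag check alone, without an anon_vma being allocated for it.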

Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.

The only flag we allow so far is VM_MAYBE_GUARD, and we carefully ensure
that permitting it to be set in this way does not introduce any races.

This allows guard region installation to remain a read-locked operation,
avoiding the overhead of obtaining a write lock.
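The idea behind allow-listed atomic flags can be modelled with C11 atomics, as sketched below. This is not the kernel's implementation, which uses its own bit operations. The allow-list, struct vma_model, and model_set_flag()/model_test_flag() are hypothetical names. As the changelog notes, holding the mmap/VMA read lock excludes write-lock holders, so the only concurrent modifications left are other atomic sets, and a single atomic OR is sufficient.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's. */
#define F_WRITE		(1ULL << 1)
#define F_MAYBE_GUARD	(1ULL << 2)

/* Hypothetical allow-list of flags settable under a read lock only. */
#define F_ATOMIC_ALLOWED	F_MAYBE_GUARD

struct vma_model {
	_Atomic uint64_t flags;
};

/* Atomically set an allow-listed flag; refuse anything else. */
static bool model_set_flag(struct vma_model *vma, uint64_t flag)
{
	if (flag & ~F_ATOMIC_ALLOWED)
		return false;
	/* Idempotent RMW: concurrent setters of the same bit cannot conflict. */
	atomic_fetch_or_explicit(&vma->flags, flag, memory_order_relaxed);
	return true;
}

static bool model_test_flag(struct vma_model *vma, uint64_t flag)
{
	return atomic_load_explicit(&vma->flags, memory_order_relaxed) & flag;
}
```

Relaxed ordering is a choice made for this sketch. Any ordering guarantees the real code needs would come from the surrounding locks, such as the page table lock mentioned in the v3 changelog.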

Finally, we introduce extensive VMA userland tests to assert that the sticky
VMA logic behaves correctly, as well as guard region self tests to assert
that smaps visibility is correctly implemented.


v3:
* Propagated tags, thanks Vlastimil & Pedro! :)
* Fixed doc nit as per Pedro.
* Added vma_flag_test_atomic() in preparation for fixing
  retract_page_tables() (see below). We make this not require any locks, as
  we serialise on the page table lock in retract_page_tables().
* Split the atomic flag enablement and the actual setting of the flag on
  guard install into two separate commits, to clearly separate the VMA flag
  implementation details from the enablement of this feature.
* Mentioned setting anon_vma for anonymous mappings in commit message as
  per Vlastimil.
* Fixed an issue with retract_page_tables(), which madvise(...,
  MADV_COLLAPSE) relies upon to skip file-backed VMAs that have the UFFD WP
  VMA flag set or have vma->anon_vma set (i.e. MAP_PRIVATE file-backed
  VMAs). Since guard region installation no longer establishes an anon_vma
  for file-backed VMAs, this check was updated to also test for
  VM_MAYBE_GUARD.
* Introduced MADV_COLLAPSE self test to assert that the behaviour is
  correct. I first reproduced the issue locally and then adapted the test
  to assert that this no longer occurs.
* Mentioned KCSAN permissiveness in commit message as per Pedro.
* Mentioned mmap/VMA read lock excluding mmap/VMA write lock and thus
  avoiding meaningful RMW races in commit message as per Vlastimil.
* Mentioned previous unconditional vma->anon_vma installation on guard
  region installation as per Vlastimil.
* Reordered patches such that the sticky VMA functionality is implemented
  before VM_MAYBE_GUARD is set upon guard region installation, so merging is
  never compromised partway through the series. This renders Vlastimil's
  request to mention this in a commit message unnecessary.
* Separated out sticky and copy on fork patches as per Pedro.
* Added VM_PFNMAP, VM_MIXEDMAP, VM_UFFD_WP to VM_COPY_ON_FORK to make
  things more consistent and clean.
* Added an explanation to the copy on fork patch of why VM_STICKY flags
  should generally also be VM_COPY_ON_FORK flags.
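The retract_page_tables() fix in the v3 changelog adds one more skip condition. Below is a simplified, hypothetical model of that condition; the F_* bits, struct vma_model and can_retract() are illustrative only. According to the changelog, the real check reads the flag via vma_flag_test_atomic() and serialises on the page table lock rather than the mmap lock.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vmflags_t;

/* Hypothetical bit values for illustration; not the kernel's. */
#define F_UFFD_WP	(1ULL << 0)
#define F_MAYBE_GUARD	(1ULL << 1)

struct vma_model {
	vmflags_t flags;
	bool has_anon_vma;
};

/* Simplified model: may this file-backed VMA's page table be retracted? */
static bool can_retract(const struct vma_model *vma)
{
	/* MAP_PRIVATE file-backed VMA, possibly holding CoW'd anon pages. */
	if (vma->has_anon_vma)
		return false;
	/* Userfaultfd write-protect state lives in the page tables. */
	if (vma->flags & F_UFFD_WP)
		return false;
	/* New in v3: guard region markers may also live in the page tables. */
	if (vma->flags & F_MAYBE_GUARD)
		return false;
	return true;
}
```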

v2:
* Separated out userland VMA tests for sticky behaviour as per Suren.
* Added the concept of atomic writable VMA flags as per Pedro and Vlastimil.
* Made VM_MAYBE_GUARD an atomic writable flag so we don't have to take a VMA
  write lock in madvise() as per Pedro and Vlastimil.
https://lore.kernel.org/all/cover.1762422915.git.lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/all/cover.1761756437.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (8):
  mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
  mm: add atomic VMA flags and set VM_MAYBE_GUARD as such
  mm: implement sticky VMA flags
  mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one
  mm: set the VM_MAYBE_GUARD flag on guard region install
  tools/testing/vma: add VMA sticky userland tests
  tools/testing/selftests/mm: add MADV_COLLAPSE test case
  tools/testing/selftests/mm: add smaps visibility guard region test

 Documentation/filesystems/proc.rst         |   5 +-
 fs/proc/task_mmu.c                         |   1 +
 include/linux/mm.h                         | 102 ++++++++++++
 include/trace/events/mmflags.h             |   1 +
 mm/khugepaged.c                            |  72 +++++---
 mm/madvise.c                               |  22 ++-
 mm/memory.c                                |  14 +-
 mm/vma.c                                   |  22 +--
 tools/testing/selftests/mm/guard-regions.c | 185 +++++++++++++++++++++
 tools/testing/selftests/mm/vm_util.c       |   5 +
 tools/testing/selftests/mm/vm_util.h       |   1 +
 tools/testing/vma/vma.c                    |  89 ++++++++--
 tools/testing/vma/vma_internal.h           |  56 +++++++
 13 files changed, 511 insertions(+), 64 deletions(-)

--
2.51.0
