Message-ID: <cover.1741185865.git.lorenzo.stoakes@oracle.com>
Date: Wed,  5 Mar 2025 14:55:06 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
        Paul Moore <paul@...l-moore.com>,
        Stephen Smalley <stephen.smalley.work@...il.com>,
        Ondrej Mosnacek <omosnace@...hat.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        selinux@...r.kernel.org
Subject: [RFC PATCH 0/2] mm: introduce anon_vma flags, reduce kernel allocs

VMA resources are scarce. This is a data structure whose weight we wish to
reduce (certainly as slab allocations are unreclaimable and - for now -
unmigratable).

So adding additional fields is generally unviable, and VMA flags are
equally contended and prevent VMA merge, further increasing overhead.

We can however make use of the time-honoured kernel tradition of grabbing
bits where we can.

Since we can rely upon anon_vma allocations being at least system
word-aligned, we have a handful of low bits in the vma->anon_vma pointer
available to use as flags.
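
As a rough userspace illustration of the idea (not the code in this series),
the sketch below packs flag bits into the low bits of a word-aligned pointer
and masks them off again. All names here (demo_anon_vma, DEMO_FLAG_MASK,
DEMO_FLAG_A, demo_pack() and friends) are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct anon_vma. */
struct demo_anon_vma {
	long refcount;
};

/* Bits that are always clear in a pointer aligned to sizeof(long). */
#define DEMO_FLAG_MASK ((uintptr_t)sizeof(long) - 1)
#define DEMO_FLAG_A    ((uintptr_t)1 << 0)

/* Combine an aligned pointer and flag bits into one word. */
static inline uintptr_t demo_pack(struct demo_anon_vma *ptr, uintptr_t flags)
{
	assert(((uintptr_t)ptr & DEMO_FLAG_MASK) == 0);	/* pointer is aligned */
	assert((flags & ~DEMO_FLAG_MASK) == 0);		/* flags fit in spare bits */
	return (uintptr_t)ptr | flags;
}

/* Recover the pointer by masking off the flag bits. */
static inline struct demo_anon_vma *demo_ptr(uintptr_t word)
{
	return (struct demo_anon_vma *)(word & ~DEMO_FLAG_MASK);
}

/* Recover only the flag bits. */
static inline uintptr_t demo_flags(uintptr_t word)
{
	return word & DEMO_FLAG_MASK;
}
```

Note that a flag can be set even when the pointer part is NULL, which is
what allows a VMA to carry state without any anon_vma allocated.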

This series establishes that mechanism and immediately uses it to solve a
problem encountered as part of the guard region feature
(MADV_GUARD_INSTALL, MADV_GUARD_REMOVE).

We absolutely must preserve guard regions over fork; however, it turns out
the only reasonable means of doing so is to establish an anon_vma even if
the VMA is unfaulted.

This creates unnecessary overhead, a problem exacerbated by the extension
of this functionality to file-backed regions, where such-allocated memory
may never be utilised or freed until the end of the VMA's lifetime.

We can avoid this if we have a means of indicating to fork that we wish to
copy page tables without having to have this overhead.

Having flags available in vma->anon_vma allows us to do so - we can
therefore introduce a flag, ANON_VMA_UNFAULTED, which indicates that this
is the case.

We introduce wrapper functions to mask off these bits, and nearly every
part of the kernel behaves precisely the same as a result, with only the
desired change in behaviour in the forking logic.
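
To make the shape of this concrete, here is a simplified userspace sketch,
assuming a single flag in bit 0. The names (demo_vma, demo_vma_anon_vma(),
demo_fork_must_copy_page_tables()) are hypothetical and are not the helpers
added by the patches; the real fork logic considers other conditions too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct anon_vma; the long gives word alignment. */
struct demo_anon_vma {
	long unused;
};

/* The mask below assumes objects are at least sizeof(long)-aligned. */
static_assert(_Alignof(struct demo_anon_vma) >= sizeof(long),
	      "need spare low bits in the pointer");

/* Hypothetical VMA: one word holds the pointer plus flag bits. */
struct demo_vma {
	uintptr_t anon_vma_word;
};

#define DEMO_ANON_VMA_UNFAULTED ((uintptr_t)1)
#define DEMO_ANON_VMA_FLAGS     ((uintptr_t)sizeof(long) - 1)

/* Wrapper: callers see the plain pointer with the flag bits masked off. */
static inline struct demo_anon_vma *demo_vma_anon_vma(const struct demo_vma *vma)
{
	return (struct demo_anon_vma *)(vma->anon_vma_word & ~DEMO_ANON_VMA_FLAGS);
}

/* Flag test, used only where behaviour is meant to differ. */
static inline bool demo_vma_unfaulted_flag(const struct demo_vma *vma)
{
	return (vma->anon_vma_word & DEMO_ANON_VMA_UNFAULTED) != 0;
}

/* Fork-side decision: copy page tables for a real anon_vma or for the flag. */
static inline bool demo_fork_must_copy_page_tables(const struct demo_vma *vma)
{
	return demo_vma_anon_vma(vma) != NULL || demo_vma_unfaulted_flag(vma);
}
```

Code that only ever calls the pointer wrapper sees NULL for a flagged but
unfaulted VMA, exactly as it would have before the flag existed.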

On fault, or any operation that actually requires an established anon_vma,
the ANON_VMA_UNFAULTED flag is cleared and replaced by an actual anon_vma.

An additional advantage of having this mechanism is that we can also remove
this flag, should no 'real' anon_vma be established, and the user is
executing MADV_GUARD_REMOVE on the whole VMA, meaning we can prevent future
unneeded page table operations.
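
The state transitions described above can be sketched in userspace as
follows. This is an illustration under stated assumptions rather than the
series' implementation: the helper names are hypothetical, calloc() stands
in for the kernel's anon_vma allocation, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct anon_vma and the VMA's anon_vma field. */
struct demo_anon_vma { long unused; };
struct demo_vma { uintptr_t anon_vma_word; };

#define DEMO_ANON_VMA_UNFAULTED ((uintptr_t)1)
#define DEMO_ANON_VMA_FLAGS     ((uintptr_t)sizeof(long) - 1)

/* Pointer part of the word, with the flag bits masked off. */
static struct demo_anon_vma *demo_real_anon_vma(const struct demo_vma *vma)
{
	return (struct demo_anon_vma *)(vma->anon_vma_word & ~DEMO_ANON_VMA_FLAGS);
}

/* Guard install on an unfaulted VMA: set the flag, allocate nothing. */
static void demo_guard_install(struct demo_vma *vma)
{
	if (!demo_real_anon_vma(vma))
		vma->anon_vma_word |= DEMO_ANON_VMA_UNFAULTED;
}

/* Fault, or anything needing a real anon_vma: replace the flag with one. */
static int demo_prepare_anon_vma(struct demo_vma *vma)
{
	struct demo_anon_vma *av;

	if (demo_real_anon_vma(vma))
		return 0;
	av = calloc(1, sizeof(*av));
	if (!av)
		return -1;
	assert(((uintptr_t)av & DEMO_ANON_VMA_FLAGS) == 0);
	vma->anon_vma_word = (uintptr_t)av;	/* overwriting clears the flag */
	return 0;
}

/* Guard removal over the whole VMA: drop the flag if nothing real exists. */
static void demo_guard_remove_all(struct demo_vma *vma)
{
	if (!demo_real_anon_vma(vma))
		vma->anon_vma_word &= ~DEMO_ANON_VMA_UNFAULTED;
}
```

In this model a VMA that only ever had guard regions installed and then
fully removed returns to a zero word, so later forks have no reason to copy
its page tables.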

A benefit of this change, aside from saving kernel memory allocations, is
that THP page collapse is no longer impacted if we apply guard regions then
remove them in their entirety from a VMA, as otherwise the immediate
collapse of aligned page tables in retract_page_tables() cannot proceed.

Lorenzo Stoakes (2):
  mm: introduce anon_vma flags and use wrapper functions
  mm/madvise: utilise anon_vma unfaulted flag on guard region install

 fs/coredump.c                    |  2 +-
 include/linux/mm_types.h         | 67 ++++++++++++++++++++-
 include/linux/rmap.h             |  4 +-
 kernel/fork.c                    |  4 +-
 mm/debug.c                       |  6 +-
 mm/huge_memory.c                 |  4 +-
 mm/khugepaged.c                  | 12 ++--
 mm/ksm.c                         | 16 +++---
 mm/madvise.c                     | 49 ++++++++++------
 mm/memory.c                      |  6 +-
 mm/mmap.c                        |  2 +-
 mm/mprotect.c                    |  2 +-
 mm/mremap.c                      |  8 +--
 mm/rmap.c                        | 42 +++++++-------
 mm/swapfile.c                    |  2 +-
 mm/userfaultfd.c                 |  2 +-
 mm/vma.c                         | 99 +++++++++++++++++++++++++-------
 mm/vma.h                         |  6 +-
 security/selinux/hooks.c         |  2 +-
 tools/testing/vma/vma.c          | 95 +++++++++++++++---------------
 tools/testing/vma/vma_internal.h | 78 ++++++++++++++++++++++---
 21 files changed, 358 insertions(+), 150 deletions(-)

--
2.48.1
