lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e4985328-dbfa-4c47-9cf9-12aa89ba9798@suse.cz>
Date: Fri, 18 Oct 2024 18:10:37 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Matthew Wilcox <willy@...radead.org>, "Paul E . McKenney"
 <paulmck@...nel.org>, Jann Horn <jannh@...gle.com>,
 David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, Muchun Song <muchun.song@...ux.dev>,
 Richard Henderson <richard.henderson@...aro.org>,
 Ivan Kokshaysky <ink@...assic.park.msu.ru>, Matt Turner
 <mattst88@...il.com>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
 "James E . J . Bottomley" <James.Bottomley@...senPartnership.com>,
 Helge Deller <deller@....de>, Chris Zankel <chris@...kel.net>,
 Max Filippov <jcmvbkbc@...il.com>, Arnd Bergmann <arnd@...db.de>,
 linux-alpha@...r.kernel.org, linux-mips@...r.kernel.org,
 linux-parisc@...r.kernel.org, linux-arch@...r.kernel.org,
 Shuah Khan <shuah@...nel.org>, Christian Brauner <brauner@...nel.org>,
 linux-kselftest@...r.kernel.org, Sidhartha Kumar
 <sidhartha.kumar@...cle.com>, Jeff Xu <jeffxu@...omium.org>,
 Christoph Hellwig <hch@...radead.org>, Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH 0/4] implement lightweight guard pages

+CC linux-api (also should on future revisions)

On 10/17/24 22:42, Lorenzo Stoakes wrote:
> Userland library functions such as allocators and threading implementations
> often require regions of memory to act as 'guard pages' - mappings which,
> when accessed, result in a fatal signal being sent to the accessing
> process.
> 
> The current means by which these are implemented is via a PROT_NONE mmap()
> mapping, which provides the required semantics however incur an overhead of
> a VMA for each such region.
> 
> With a great many processes and threads, this can rapidly add up and incur
> a significant memory penalty. It also has the added problem of preventing
> merges that might otherwise be permitted.
> 
> This series takes a different approach - an idea suggested by Vlasimil
> Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
> provenance becomes a little tricky to ascertain after this - please forgive
> any omissions!)  - rather than locating the guard pages at the VMA layer,
> instead placing them in page tables mapping the required ranges.
> 
> Early testing of the prototype version of this code suggests a 5 times
> speed up in memory mapping invocations (in conjunction with use of
> process_madvise()) and a 13% reduction in VMAs on an entirely idle android
> system and unoptimised code.
> 
> We expect with optimisation and a loaded system with a larger number of
> guard pages this could significantly increase, but in any case these
> numbers are encouraging.
> 
> This way, rather than having separate VMAs specifying which parts of a
> range are guard pages, instead we have a VMA spanning the entire range of
> memory a user is permitted to access and including ranges which are to be
> 'guarded'.
> 
> After mapping this, a user can specify which parts of the range should
> result in a fatal signal when accessed.
> 
> By restricting the ability to specify guard pages to memory mapped by
> existing VMAs, we can rely on the mappings being torn down when the
> mappings are ultimately unmapped and everything works simply as if the
> memory were not faulted in, from the point of view of the containing VMAs.
> 
> This mechanism in effect poisons memory ranges similar to hardware memory
> poisoning, only it is an entirely software-controlled form of poisoning.
> 
> Any poisoned region of memory is also able to 'unpoisoned', that is, to
> have its poison markers removed.
> 
> The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
> which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
> poisoning.
> 
> Poisoning can be performed across multiple VMAs and any existing mappings
> will be cleared, that is zapped, before installing the poisoned page table
> mappings.
> 
> There is no concept of 'nested' poisoning, multiple attempts to poison a
> range will, after the first poisoning, have no effect.
> 
> Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
> memory, so a user can safely unpoison a range of memory and clear only
> poison page table mappings leaving the rest intact.
> 
> The actual mechanism by which the page table entries are specified makes
> use of existing logic - PTE markers, which are used for the userfaultfd
> UFFDIO_POISON mechanism.
> 
> Unfortunately PTE_MARKER_POISONED is not suited for the guard page
> mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
> handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
> logic to handle it.
> 
> We also extend the generic page walk mechanism to allow for installation of
> PTEs (carefully restricted to memory management logic only to prevent
> unwanted abuse).
> 
> We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
> remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
> specified for a VMA which implies a total removal of memory
> characteristics).
> 
> It's important to note that the guard page implementation is emphatically
> NOT a security feature, so a user can remove the poisoning if they wish. We
> simply implement it in such a way as to provide the least surprising
> behaviour.
> 
> An extensive set of self-tests are provided which ensure behaviour is as
> expected and additionally self-documents expected behaviour of poisoned
> ranges.
> 
> Suggested-by: Vlastimil Babka <vbabka@...e.cz>

Please fix the domain typo (also in patch 3 :)

Thanks for implementing this,
Vlastimil

> Suggested-by: Jann Horn <jannh@...gle.com>
> Suggested-by: David Hildenbrand <david@...hat.com>
> 
> v1
> * Un-RFC'd as appears no major objections to approach but rather debate on
>   implementation.
> * Fixed issue with arches which need mmu_context.h and
>   tlbfush.h. header imports in pagewalker logic to be able to use
>   update_mmu_cache() as reported by the kernel test bot.
> * Added comments in page walker logic to clarify who can use
>   ops->install_pte and why as well as adding a check_ops_valid() helper
>   function, as suggested by Christoph.
> * Pass false in full parameter in pte_clear_not_present_full() as suggested
>   by Jann.
> * Stopped erroneously requiring a write lock for the poison operation as
>   suggested by Jann and Suren.
> * Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
>   consistent with how this is used elsewhere in the kernel as suggested by
>   Jann.
> * Avoid returning -EAGAIN if we are raced on page faults, just keep looping
>   and duck out if a fatal signal is pending or a conditional reschedule is
>   needed, as suggested by Jann.
> * Avoid needlessly splitting huge PUDs and PMDs by specifying
>   ACTION_CONTINUE, as suggested by Jann.
> 
> RFC
> https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
> 
> Lorenzo Stoakes (4):
>   mm: pagewalk: add the ability to install PTEs
>   mm: add PTE_MARKER_GUARD PTE marker
>   mm: madvise: implement lightweight guard page mechanism
>   selftests/mm: add self tests for guard page feature
> 
>  arch/alpha/include/uapi/asm/mman.h       |    3 +
>  arch/mips/include/uapi/asm/mman.h        |    3 +
>  arch/parisc/include/uapi/asm/mman.h      |    3 +
>  arch/xtensa/include/uapi/asm/mman.h      |    3 +
>  include/linux/mm_inline.h                |    2 +-
>  include/linux/pagewalk.h                 |   18 +-
>  include/linux/swapops.h                  |   26 +-
>  include/uapi/asm-generic/mman-common.h   |    3 +
>  mm/hugetlb.c                             |    3 +
>  mm/internal.h                            |    6 +
>  mm/madvise.c                             |  168 ++++
>  mm/memory.c                              |   18 +-
>  mm/mprotect.c                            |    3 +-
>  mm/mseal.c                               |    1 +
>  mm/pagewalk.c                            |  200 ++--
>  tools/testing/selftests/mm/.gitignore    |    1 +
>  tools/testing/selftests/mm/Makefile      |    1 +
>  tools/testing/selftests/mm/guard-pages.c | 1168 ++++++++++++++++++++++
>  18 files changed, 1564 insertions(+), 66 deletions(-)
>  create mode 100644 tools/testing/selftests/mm/guard-pages.c
> 
> --
> 2.46.2


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ