lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240531191304.it.853-kees@kernel.org>
Date: Fri, 31 May 2024 12:14:52 -0700
From: Kees Cook <kees@...nel.org>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Kees Cook <kees@...nel.org>,
	"GONG, Ruiqi" <gongruiqi@...weicloud.com>,
	Christoph Lameter <cl@...ux.com>,
	Pekka Enberg <penberg@...nel.org>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	jvoisin <julien.voisin@...tri.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	Hyeonggon Yoo <42.hyeyoo@...il.com>,
	Xiu Jianfeng <xiujianfeng@...wei.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Kent Overstreet <kent.overstreet@...ux.dev>,
	Jann Horn <jannh@...gle.com>,
	Matteo Rizzo <matteorizzo@...gle.com>,
	Thomas Graf <tgraf@...g.ch>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	linux-hardening@...r.kernel.org,
	netdev@...r.kernel.org
Subject: [PATCH v4 0/6] slab: Introduce dedicated bucket allocator

Hi,

 v4:
  - Rebase to v6.10-rc1
  - Add CONFIG_SLAB_BUCKETS to turn off the feature
 v3: https://lore.kernel.org/lkml/20240424213019.make.366-kees@kernel.org/
 v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@kernel.org/
 v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@kernel.org/

For the cover letter, I'm repeating commit log for patch 4 here, which has
additional clarifications and rationale since v2:

    Dedicated caches are available for fixed size allocations via
    kmem_cache_alloc(), but for dynamically sized allocations there is only
    the global kmalloc API's set of buckets available. This means it isn't
    possible to separate specific sets of dynamically sized allocations into
    a separate collection of caches.
    
    This leads to a use-after-free exploitation weakness in the Linux
    kernel since many heap memory spraying/grooming attacks depend on using
    userspace-controllable dynamically sized allocations to collide with
    fixed size allocations that end up in same cache.
    
    While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
    against these kinds of "type confusion" attacks, including for fixed
    same-size heap objects, we can create a complementary deterministic
    defense for dynamically sized allocations that are directly user
    controlled. Addressing these cases is limited in scope, so isolation these
    kinds of interfaces will not become an unbounded game of whack-a-mole. For
    example, pass through memdup_user(), making isolation there very
    effective.
    
    In order to isolate user-controllable sized allocations from system
    allocations, introduce kmem_buckets_create(), which behaves like
    kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like
    kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for
    where caller tracking is needed. Introduce kmem_buckets_valloc() for
    cases where vmalloc callback is needed.
    
    Allows for confining allocations to a dedicated set of sized caches
    (which have the same layout as the kmalloc caches).
    
    This can also be used in the future to extend codetag allocation
    annotations to implement per-caller allocation cache isolation[1] even
    for dynamic allocations.
    
    Memory allocation pinning[2] is still needed to plug the Use-After-Free
    cross-allocator weakness, but that is an existing and separate issue
    which is complementary to this improvement. Development continues for
    that feature via the SLAB_VIRTUAL[3] series (which could also provide
    guard pages -- another complementary improvement).
    
    Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
    Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
    Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@google.com/ [3]

After the core implementation are 2 patches that cover the most heavily
abused "repeat offenders" used in exploits. Repeating those details here:

    The msg subsystem is a common target for exploiting[1][2][3][4][5][6]
    use-after-free type confusion flaws in the kernel for both read and
    write primitives. Avoid having a user-controlled size cache share the
    global kmalloc allocator by using a separate set of kmalloc buckets.
    
    Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
    Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
    Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
    Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
    Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
    Link: https://zplin.me/papers/ELOISE.pdf [6]
    Link: https://syst3mfailure.io/wall-of-perdition/ [7]

    Both memdup_user() and vmemdup_user() handle allocations that are
    regularly used for exploiting use-after-free type confusion flaws in
    the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
    respectively).
    
    Since both are designed for contents coming from userspace, it allows
    for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
    buckets so these allocations do not share caches with the global kmalloc
    buckets.
    
    Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
    Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
    Link: https://etenal.me/archives/1336 [3]
    Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]

Thanks!

-Kees


Kees Cook (6):
  mm/slab: Introduce kmem_buckets typedef
  mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
  mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets
    argument
  mm/slab: Introduce kmem_buckets_create() and family
  ipc, msg: Use dedicated slab buckets for alloc_msg()
  mm/util: Use dedicated slab buckets for memdup_user()

 include/linux/slab.h | 70 ++++++++++++++++++++++++++++-------
 ipc/msgutil.c        | 13 ++++++-
 lib/rhashtable.c     |  2 +-
 mm/Kconfig           | 15 ++++++++
 mm/slab.h            |  6 ++-
 mm/slab_common.c     | 87 ++++++++++++++++++++++++++++++++++++++++++--
 mm/slub.c            | 34 ++++++++++++-----
 mm/util.c            | 29 +++++++++++----
 8 files changed, 217 insertions(+), 39 deletions(-)

-- 
2.34.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ