Message-ID: <CAJuCfpEw1fsuQjDcS23yZ-WE+WoA0oKArs=Q=G14Bh-509AYgA@mail.gmail.com>
Date: Thu, 12 Sep 2024 08:58:48 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Kees Cook <kees@...nel.org>
Cc: Vlastimil Babka <vbabka@...e.cz>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	Christoph Lameter <cl@...ux.com>, Pekka Enberg <penberg@...nel.org>, David Rientjes <rientjes@...gle.com>, 
	Joonsoo Kim <iamjoonsoo.kim@....com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org, 
	"GONG, Ruiqi" <gongruiqi@...weicloud.com>, Jann Horn <jannh@...gle.com>, 
	Matteo Rizzo <matteorizzo@...gle.com>, jvoisin <julien.voisin@...tri.org>, 
	Xiu Jianfeng <xiujianfeng@...wei.com>, linux-kernel@...r.kernel.org, 
	linux-hardening@...r.kernel.org
Subject: Re: [PATCH 5/5] slab: Allocate and use per-call-site caches

On Wed, Sep 11, 2024 at 3:30 PM Kees Cook <kees@...nel.org> wrote:
>
> On Thu, Aug 29, 2024 at 10:03:56AM -0700, Suren Baghdasaryan wrote:
> > On Fri, Aug 9, 2024 at 12:33 AM Kees Cook <kees@...nel.org> wrote:
> > >
> > > Use separate per-call-site kmem_cache or kmem_buckets. These are
> > > allocated on demand to avoid wasting memory for unused caches.
> > >
> > > A few caches need to be allocated very early to support allocating the
> > > caches themselves: kstrdup(), kvasprintf(), and pcpu_mem_zalloc(). Any
> > > GFP_ATOMIC allocations are currently left to be allocated from
> > > KMALLOC_NORMAL.
> > >
> > > With a distro config, /proc/slabinfo grows from ~400 entries to ~2200.
> > >
> > > Since this feature (CONFIG_SLAB_PER_SITE) is redundant to
> > > > CONFIG_RANDOM_KMALLOC_CACHES, mark it as incompatible. Add Kconfig help
> > > text that compares the features.
> > >
> > > Improvements needed:
> > > - Retain call site gfp flags in alloc_tag meta field to:
> > >   - pre-allocate all GFP_ATOMIC caches (since their caches cannot
> > >     be allocated on demand unless we want them to be GFP_ATOMIC
> > >     themselves...)
> >
> > I'm currently working on a feature to identify allocations with
> > __GFP_ACCOUNT known at compile time (similar to how you handle the
> > size in the previous patch). Might be something you can reuse/extend.
>
> Great, yes! I'd love to check it out.
>
> > >   - Separate MEMCG allocations as well
> >
> > Do you mean allocations with __GFP_ACCOUNT or something else?
>
> I do, yes.
>
> > > +static void alloc_tag_site_init_early(struct codetag *ct)
> > > +{
> > > +       /* Explicitly initialize the caches needed to initialize caches. */
> > > +       if (strcmp(ct->function, "kstrdup") == 0 ||
> > > +           strcmp(ct->function, "kvasprintf") == 0 ||
> > > +           strcmp(ct->function, "pcpu_mem_zalloc") == 0)
> >
> > I hope we can find a better way to distinguish these allocations.
> > Maybe have a specialized hook for them, like alloc_hooks_early() which
> > sets a bit inside ct->flags to distinguish them?
>
> That might be possible. I'll see how that ends up looking. I don't want
> to even further fragment the alloc_hooks_... variants.
>
> >
> > > +               alloc_tag_site_init(ct, false);
> > > +
> > > +       /* TODO: pre-allocate GFP_ATOMIC caches here. */
> >
> > You could pre-allocate GFP_ATOMIC caches during
> > alloc_tag_module_load() only if gfp_flags are known at compile time I
> > think. I guess for the dynamic case choose_slab() will fall back to
> > kmalloc_slab()?
>
> Right, yes. I'd do it like the size checking: if we know at compile
> time, we can depend on it, otherwise it's a run-time fallback.
>
> >
> > > @@ -175,8 +258,21 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
> > >
> > >                 if (WARN(counter.bytes,
> > >                          "%s:%u module %s func:%s has %llu allocated at module unload",
> > > -                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > > +                        ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) {
> > >                         module_unused = false;
> > > +               }
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +               else if (tag->meta.sized) {
> > > +                       /* Remove the allocated caches, if possible. */
> > > +                       void *p = READ_ONCE(tag->meta.cache);
> > > +
> > > +                       WRITE_ONCE(tag->meta.cache, NULL);
> >
> > I'm guessing you are not using try_cmpxchg() the same way you did in
> > alloc_tag_site_init() because a race with any other user is impossible
> > at the module unload time? If so, a comment mentioning that would be
> > good.
>
> Correct. It should not be possible. But yes, I will add a comment.
>
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 855c63c3270d..4f01cb6dd32e 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -302,7 +302,20 @@ config SLAB_PER_SITE
> > >         default SLAB_FREELIST_HARDENED
> > >         select SLAB_BUCKETS
> > >         help
> > > -         Track sizes of kmalloc() call sites.
> > > +         As a defense against shared-cache "type confusion" use-after-free
> > > +         attacks, every kmalloc()-family call allocates from a separate
> > > +         kmem_cache (or when dynamically sized, kmem_buckets). Attackers
> > > +         will no longer be able to groom malicious objects via similarly
> > > +         sized allocations that share the same cache as the target object.
> > > +
> > > +         This increases the "at rest" kmalloc slab memory usage by
> > > +         roughly 5x (around 7MiB), and adds the potential for greater
> > > +         long-term memory fragmentation. However, some workloads
> > > +         actually see performance improvements when single allocation
> > > +         sites are hot.
> >
> > I hope you provide the performance and overhead data in the cover
> > letter when you post v1.
>
> That's my plan. It's always odd choosing workloads, but we do seem to
> have a few 'regular' benchmarks (hackbench, kernel builds, etc). Is
> there anything in particular you'd want to see?

I have a stress test implemented as a loadable module to benchmark
slab and page allocation times (just a tight loop and timing it). I
can clean it up a bit and share it with you.
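
To illustrate the "tight loop and timing it" shape, here is a minimal userspace sketch. It is not the module mentioned above: it times malloc()/free() rather than kmalloc()/kfree(), and the names now_ns() and bench_alloc() are made up for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Time `iters` allocate/free pairs of `size` bytes.
 * Stores the total elapsed nanoseconds in *elapsed_ns.
 * Returns 0 on success, -1 if an allocation fails.
 * Userspace stand-in: a kernel module would use kmalloc()/kfree().
 */
static int bench_alloc(size_t size, unsigned long iters, uint64_t *elapsed_ns)
{
	uint64_t start = now_ns();
	unsigned long i;

	for (i = 0; i < iters; i++) {
		/* volatile keeps the compiler from eliding the pair */
		volatile char *p = malloc(size);

		if (!p)
			return -1;
		p[0] = 1;
		free((void *)p);
	}

	*elapsed_ns = now_ns() - start;
	return 0;
}
```

Calling bench_alloc(64, 1000000, &ns) and dividing ns by the iteration count gives a rough per-pair cost. Comparing that figure with and without the feature enabled is the kind of number I have in mind.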

>
> > > +static __always_inline
> > > +struct kmem_cache *choose_slab(size_t size, kmem_buckets *b, gfp_t flags,
> > > +                              unsigned long caller)
> > > +{
> > > +#ifdef CONFIG_SLAB_PER_SITE
> > > +       struct alloc_tag *tag = current->alloc_tag;
> > > +
> > > +       if (!b && tag && tag->meta.sized &&
> > > +           kmalloc_type(flags, caller) == KMALLOC_NORMAL &&
> > > +           (flags & GFP_ATOMIC) != GFP_ATOMIC) {
> >
> > What if allocation is GFP_ATOMIC but a previous allocation from the
> > same location (same tag) happened without GFP_ATOMIC and
> > tag->meta.cache was allocated. Why not use that existing cache?
> > Same if the tag->meta.cache was pre-allocated.
>
> Maybe I was being too conservative in my understanding -- I thought that
> I couldn't use those caches on the chance that they may already be full?
> Or is that always the risk, and GFP_ATOMIC deals with that? If it would
> be considered safe to attempt the allocation from the existing cache, then
> yeah, I can adjust this check.

Well, you fall back to kmalloc_slab() which also might be full. So,
how would using an existing cache be different?
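
Roughly what I have in mind, as a simplified userspace model rather than actual kernel code. The types and names here (struct cache, struct site_tag, choose_cache(), site_cache_create()) are stand-ins invented for this sketch, not the ones from your patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for kernel types; all names are illustrative only. */
struct cache {
	const char *name;
};

struct site_tag {
	struct cache *cache;	/* per-site cache, NULL until created */
};

static struct cache shared_cache = { "kmalloc-shared" };
static struct cache site_cache_storage = { "per-site" };

/*
 * Models on-demand cache creation. In this model, creation is
 * something an atomic caller is not allowed to do.
 */
static struct cache *site_cache_create(struct site_tag *tag)
{
	tag->cache = &site_cache_storage;
	return tag->cache;
}

static struct cache *choose_cache(struct site_tag *tag, bool atomic)
{
	if (!tag)
		return &shared_cache;

	/* An existing per-site cache is reused even by atomic callers. */
	if (tag->cache)
		return tag->cache;

	/* Atomic callers cannot create the cache, so fall back. */
	if (atomic)
		return &shared_cache;

	return site_cache_create(tag);
}
```

The atomic check only gates creation of the cache, not its use. An atomic caller hitting a site whose cache already exists (created earlier or pre-allocated) gets the per-site cache, and only falls back to the shared one when there is nothing to reuse.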

>
> Thanks for looking these over!
>
> -Kees
>
> --
> Kees Cook
