Message-ID: <81a71d95-fa22-a423-b94a-4bc35e1454f9@huaweicloud.com>
Date:   Sun, 25 Jun 2023 19:25:25 +0800
From:   "GONG, Ruiqi" <gongruiqi@...weicloud.com>
To:     Kees Cook <keescook@...omium.org>, Vlastimil Babka <vbabka@...e.cz>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        David Rientjes <rientjes@...gle.com>,
        Pekka Enberg <penberg@...nel.org>,
        Christoph Lameter <cl@...ux.com>, Tejun Heo <tj@...nel.org>,
        Dennis Zhou <dennis@...nel.org>,
        Alexander Potapenko <glider@...gle.com>,
        Marco Elver <elver@...gle.com>, Jann Horn <jannh@...gle.com>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Alexander Lobakin <aleksander.lobakin@...el.com>,
        Pedro Falcato <pedro.falcato@...il.com>,
        Paul Moore <paul@...l-moore.com>,
        James Morris <jmorris@...ei.org>,
        "Serge E . Hallyn" <serge@...lyn.com>,
        Wang Weiyang <wangweiyang2@...wei.com>,
        Xiu Jianfeng <xiujianfeng@...wei.com>, linux-mm@...ck.org,
        linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org,
        gongruiqi1@...wei.com
Subject: Re: [PATCH v3 1/1] Randomized slab caches for kmalloc()



On 2023/06/22 2:21, Kees Cook wrote:
> On Fri, Jun 16, 2023 at 07:18:43PM +0800, GONG, Ruiqi wrote:
>> [...]
>>
>> Signed-off-by: GONG, Ruiqi <gongruiqi@...weicloud.com>
>> Co-developed-by: Xiu Jianfeng <xiujianfeng@...wei.com>
>> Signed-off-by: Xiu Jianfeng <xiujianfeng@...wei.com>
> 
> I think this looks really good. Thanks for the respin! Some
> nits/comments/questions below, but I think this can land and get
> incrementally improved. Please consider it:
> 
> Reviewed-by: Kees Cook <keescook@...omium.org>
> 

Thanks, Kees!

>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 791f7453a04f..b7a5387f0dad 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -19,6 +19,9 @@
>>  #include <linux/workqueue.h>
>>  #include <linux/percpu-refcount.h>
>>  
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +#include <linux/hash.h>
>> +#endif
> 
> I think this can just be included unconditionally, yes?
> 

True. Will change it.

>> [...]
>> +extern unsigned long random_kmalloc_seed;
>> +
>> +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
>>  {
>>  	/*
>>  	 * The most common case is KMALLOC_NORMAL, so test for it
>>  	 * with a single branch for all the relevant flags.
>>  	 */
>>  	if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +		return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
>> +						      CONFIG_RANDOM_KMALLOC_CACHES_BITS);
>> +#else
>>  		return KMALLOC_NORMAL;
>> +#endif
> 
> The commit log talks about having no runtime lookup, but that's not
> entirely true, given this routine. And xor and a hash_64... I wonder how
> expensive this is compared to some kind of constant expression that
> could be computed at build time... (the xor should stay, but that's
> "cheap").
> 

To be precise, the random selection is currently static for the lifetime of
a single boot, but it changes across reboots. I've added the following
paragraph to the commit log to explain this, and will expand it a bit in
the next version:

"Meanwhile, the static random selection is further enhanced with a
per-boot random seed, which prevents the attacker from finding a usable
kmalloc that happens to pick the same cache with the vulnerable
subsystem/module by analyzing the open source code."

As for build-time hashing, I think it could theoretically be achieved, as
long as we had a compile-time random number generator. However, AFAIK
compilers have no support for this at the moment, and I could only find a
few discussions about it (in the C++ community).
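
For illustration, here is a rough userspace sketch (not kernel code) of the
selection logic: a per-boot random seed is XORed with the caller address and
hashed down to CONFIG_RANDOM_KMALLOC_CACHES_BITS bits to pick one of the
cache copies. hash_64() below is a stand-in reimplementation of the kernel
helper from <linux/hash.h>, and the seed source and caller addresses are
made up:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RANDOM_KMALLOC_CACHES_BITS 4            /* up to 16 copies, as in the patch */

static uint64_t random_kmalloc_seed;            /* set once per "boot" */

/* stand-in for the kernel's hash_64() */
static unsigned int hash_64(uint64_t val, unsigned int bits)
{
        return (unsigned int)((val * 0x61C8864680B583EBULL) >> (64 - bits));
}

/* models kmalloc_type() for the KMALLOC_NORMAL case */
static unsigned int pick_cache_copy(uint64_t caller)
{
        /* static per call site within one boot, different across boots */
        return hash_64(caller ^ random_kmalloc_seed, RANDOM_KMALLOC_CACHES_BITS);
}

int main(void)
{
        srand((unsigned int)time(NULL));
        random_kmalloc_seed = ((uint64_t)rand() << 32) | (unsigned int)rand();

        uint64_t caller_a = 0xffffffff81234567ULL;      /* made-up call sites */
        uint64_t caller_b = 0xffffffff8167abcdULL;

        printf("caller A -> copy %u\n", pick_cache_copy(caller_a));
        printf("caller B -> copy %u\n", pick_cache_copy(caller_b));
        return 0;
}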

>>  
>>  	/*
>>  	 * At least one of the flags has to be set. Their priorities in
>> @@ -577,7 +589,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>>  
>>  		index = kmalloc_index(size);
>>  		return kmalloc_trace(
>> -				kmalloc_caches[kmalloc_type(flags)][index],
>> +				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>>  				flags, size);
>>  	}
>>  	return __kmalloc(size, flags);
>> @@ -593,7 +605,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>>  
>>  		index = kmalloc_index(size);
>>  		return kmalloc_node_trace(
>> -				kmalloc_caches[kmalloc_type(flags)][index],
>> +				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>>  				flags, node, size);
>>  	}
>>  	return __kmalloc_node(size, flags, node);
> 
> The use of _RET_IP_ is generally fine here, but I wonder about some of
> the allocation wrappers (like devm_kmalloc(), etc). I think those aren't
> being bucketed correctly? Have you checked that?
> 

Yes, I checked the distribution of used slab caches by booting the
kernel in QEMU, and /proc/slabinfo shows that allocations are in general
evenly spread among the cache copies.

I think in most cases, hashing on _RET_IP_ can effectively divert
allocations from different subsystems/modules into different caches. For
example, calling devm_kmalloc() from different places will acquire slab
objects from different cache copies:

xxx_func() {
  devm_kmalloc() {
------------  always inlined alloc_dr()  ---------------
    __kmalloc_node_track_caller(..., _RET_IP_)
--------------------------------------------------------
  }
  next inst. of devm_kmalloc()     // _RET_IP_ points here
}
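
As a rough userspace model (not kernel code) of why this works: since
devm_kmalloc() itself is not inlined into its callers while alloc_dr() is
inlined into it, _RET_IP_ ends up being the call site in the driver.
__builtin_return_address(0) below plays the role of _RET_IP_, and the
function name is made up:

#include <stdio.h>

/* Stands in for devm_kmalloc() with alloc_dr() inlined into it: the
 * address it records is the call site of its (non-inlined) self. */
static __attribute__((noinline)) void *my_devm_kmalloc(void)
{
        void *ret_ip = __builtin_return_address(0);     /* plays _RET_IP_ */
        printf("recorded caller: %p\n", ret_ip);
        return NULL;
}

int main(void)
{
        my_devm_kmalloc();      /* call site 1: prints one address */
        my_devm_kmalloc();      /* call site 2: prints a different address,
                                   so it would hash into a different copy */
        return 0;
}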

There are cases like sk_alloc(), where the wrapping is deep and all
struct sock objects end up gathered into just a few caches:

sk_alloc() {
  sk_prot_alloc() {
------------  always inlined kmalloc()  ----------------
    kmalloc_trace(... kmalloc_type(flags, _RET_IP_) ...) // A
    __kmalloc(...) {
      __do_kmalloc_node(..., _RET_IP_)                   // B
    }
--------------------------------------------------------
    next inst. of kmalloc()          // B: _RET_IP_ points here
  }
  next inst. of sk_prot_alloc()      // A: _RET_IP_ points here
}

But it's still better than nothing. Currently _RET_IP_ is the best
option I can think of, and in general it works.
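
To make that limitation concrete, here is a similar userspace sketch (not
kernel code, names made up): when the allocation goes through a deep,
non-inlined wrapper (playing the role of sk_prot_alloc()), every outer
caller records the same _RET_IP_, so they all land in the same cache copy:

#include <stdio.h>

static __attribute__((noinline)) void *inner_kmalloc(void)
{
        /* plays __kmalloc()/__do_kmalloc_node(): records its immediate caller */
        printf("recorded caller: %p\n", __builtin_return_address(0));
        return NULL;
}

/* plays sk_prot_alloc(): one fixed call site into inner_kmalloc() */
static __attribute__((noinline)) void *deep_alloc(void)
{
        void *p = inner_kmalloc();
        printf("deep_alloc done\n");    /* keeps the call above from being a tail call */
        return p;
}

void tcp_path(void)  { deep_alloc(); }  /* made-up outer callers */
void unix_path(void) { deep_alloc(); }

int main(void)
{
        tcp_path();     /* both print the same address ...         */
        unix_path();    /* ... so both land in the same cache copy */
        return 0;
}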

>> [...]
>> @@ -776,12 +781,44 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>>  #define KMALLOC_RCL_NAME(sz)
>>  #endif
>>  
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
>> +#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
>> +#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 1
>> +#define KMA_RAND_1(sz)                  .name[KMALLOC_RANDOM_START +  0] = "kmalloc-random-01-" #sz,
> 
> I wonder if this name is getting too long? Should "random" be "rnd" ?
> *shrug*
> 

Okay. Will do.

>> [...]
>> +#define KMA_RAND_16(sz) KMA_RAND_15(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,
> 
> And if we wanted to save another character, this could be numbered 0-f,
> but I defer these aesthetics to Vlastimil. :)

Same with me ;)

> 
> -Kees
> 
