[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202110180817.A2C3AE34@keescook>
Date: Mon, 18 Oct 2021 08:19:23 -0700
From: Kees Cook <keescook@...omium.org>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
KVM list <kvm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Sean Christopherson <seanjc@...gle.com>,
Willy Tarreau <w@....eu>,
syzbot+e0de2333cbf95ea473e8@...kaller.appspotmail.com
Subject: Re: [PATCH] mm: allow huge kvmalloc() calls if they're accounted to
memcg
On Sun, Oct 17, 2021 at 01:17:56AM +0200, Paolo Bonzini wrote:
> On 16/10/21 20:10, Linus Torvalds wrote:
> > That said, I also do wonder if we could possibly change "kvcalloc()"
> > to avoid the warning. The reason I didn't like your patch is that
> > kvmalloc_node() only takes a "size_t", and the overflow condition
> > there is that "MAX_INT".
> >
> > But the "kvcalloc()" case that takes a "number of elements and size"
> > should _conceptually_ warn not when the total size overflows, but when
> > either number or the element size overflows.
>
> That makes sense, but the number could still overflow in KVM's case; the
> size is small, just 8, it's the count that's humongous. In general,
> users of kvcalloc of kvmalloc_array *should* not be doing
> multiplications (that's the whole point of the functions), and that
> lowers a lot the risk of overflows, but the safest way is to provide
> a variant that does not warn. See the (compile-tested only) patch
> below.
>
> Pulling the WARN in the inline function is a bit ugly. For kvcalloc()
> and kvmalloc_array(), one of the two is almost always constant, but
> it is unlikely that the compiler eliminates both. The impact on a
> localyesconfig build seems to be minimal though (about 150 bytes
> larger out of 20 megabytes of code).
>
> Paolo
>
> ---------------- 8< -----------------
> From: Paolo Bonzini <pbonzini@...hat.com>
> Subject: [PATCH] mm: add kvmalloc variants that do not to warn
>
> Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> restricted memory allocation with 'kvmalloc()' to sizes that fit
> in an 'int', to protect against trivial integer conversion issues.
> However, the WARN triggers with KVM when it allocates ancillary page
> data, whose size essentially depends on whatever userspace has passed to
> the KVM_SET_USER_MEMORY_REGION ioctl. The warnings are quickly found by
> syzkaller, but they can also happen with huge but real-world VMs.
> The largest allocation that KVM can do is 8 bytes per page of guest
> memory, meaning a 1 TiB memslot will cause a warning even outside fuzzing.
> In fact, Google already has VMs that create 1.5 TiB memslots (12 TiB of
> total guest memory spread across 8 virtual NUMA nodes).
>
> For kvcalloc() and kvmalloc_array(), Linus suggested warning if either
> the number or the size are big. However, this would only move the
> goalpost for KVM's warning without fully avoiding it. Therefore,
> provide a "double underscore" version of kvcalloc(), kvmalloc_array()
> and kvmalloc_node() that omits the check.
>
> Cc: Willy Tarreau <w@....eu>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
Ah, so memcg wasn't doing sanity checks?
Is there a cheap way to resolve the question "does this much memory
exist"? The "__" versions end up lacking context for why they're "__"
versions. I.e. do we want something more descriptive, like
__huge_kvmalloc_node() or __unbounded_kvmalloc_node()?
-Kees
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..92aba7327bd8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -799,7 +799,15 @@ static inline int is_vmalloc_or_module_addr(const void *x)
> }
> #endif
> -extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
> +extern void *__kvmalloc_node(size_t size, gfp_t flags, int node);
> +static inline void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> + /* Don't even allow crazy sizes */
> + if (WARN_ON(size > INT_MAX))
> + return NULL;
> + return __kvmalloc_node(size, flags, node);
> +}
> +
> static inline void *kvmalloc(size_t size, gfp_t flags)
> {
> return kvmalloc_node(size, flags, NUMA_NO_NODE);
> @@ -813,14 +821,31 @@ static inline void *kvzalloc(size_t size, gfp_t flags)
> return kvmalloc(size, flags | __GFP_ZERO);
> }
> -static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> +static inline void *__kvmalloc_array(size_t n, size_t size, gfp_t flags)
> {
> size_t bytes;
> if (unlikely(check_mul_overflow(n, size, &bytes)))
> return NULL;
> - return kvmalloc(bytes, flags);
> + return __kvmalloc_node(bytes, flags, NUMA_NO_NODE);
> +}
> +
> +static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> +{
> + /*
> + * Don't allow crazy sizes here, either. For 64-bit,
> + * this also lets the compiler avoid the overflow check.
> + */
> + if (WARN_ON(size > INT_MAX || n > INT_MAX))
> + return NULL;
> +
> + return __kvmalloc_array(n, size, flags);
> +}
> +
> +static inline void *__kvcalloc(size_t n, size_t size, gfp_t flags)
> +{
> + return __kvmalloc_array(n, size, flags | __GFP_ZERO);
> }
> static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
> diff --git a/mm/util.c b/mm/util.c
> index 499b6b5767ed..0406709d8097 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -558,7 +558,7 @@ EXPORT_SYMBOL(vm_mmap);
> *
> * Return: pointer to the allocated memory of %NULL in case of failure
> */
> -void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +void *__kvmalloc_node(size_t size, gfp_t flags, int node)
> {
> gfp_t kmalloc_flags = flags;
> void *ret;
> @@ -593,14 +593,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> if (ret || size <= PAGE_SIZE)
> return ret;
> - /* Don't even allow crazy sizes */
> - if (WARN_ON_ONCE(size > INT_MAX))
> - return NULL;
> -
> return __vmalloc_node(size, 1, flags, node,
> __builtin_return_address(0));
> }
> -EXPORT_SYMBOL(kvmalloc_node);
> +EXPORT_SYMBOL(__kvmalloc_node);
> /**
> * kvfree() - Free memory.
>
> > So I would also accept a patch that just changes how "kvcalloc()"
> > works (or how "kvmalloc_array()" works).
> >
> > It's a bit annoying how we've ended up losing that "n/size"
> > information by the time we hit kvmalloc().
> >
> > Linus
> >
>
--
Kees Cook
Powered by blists - more mailing lists