Message-ID: <202110180809.66AE562E97@keescook>
Date: Mon, 18 Oct 2021 08:09:59 -0700
From: Kees Cook <keescook@...omium.org>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: torvalds@...ux-foundation.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, seanjc@...gle.com,
Willy Tarreau <w@....eu>,
syzbot+e0de2333cbf95ea473e8@...kaller.appspotmail.com
Subject: Re: [PATCH] mm: allow huge kvmalloc() calls if they're accounted to
memcg
On Sat, Oct 16, 2021 at 02:51:30AM -0400, Paolo Bonzini wrote:
> Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> restricted memory allocation with 'kvmalloc()' to sizes that fit
> in an 'int', to protect against trivial integer conversion issues.
>
> However, the WARN triggers with KVM, when it allocates ancillary page
> data whose size essentially depends on whatever userspace has passed to
> the KVM_SET_USER_MEMORY_REGION ioctl. The warnings are easily raised by
> syzkaller, but the largest allocation that KVM can do is 8 bytes per page
> of guest memory; therefore, a 1 TiB memslot will cause a warning even
> outside fuzzing, and those allocations are known to happen in the wild.
> Google, for example, already has VMs that create 1.5 TiB memslots (12 TiB of
> total guest memory spread across 8 virtual NUMA nodes).
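
The arithmetic behind the 1 TiB figure can be checked with a small
user-space sketch. It assumes 4 KiB base pages (as on x86-64) and a 32-bit
int; the 8 bytes per page is the upper bound quoted above. All sketch_*
names are hypothetical and exist only for this illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096ULL
/* From the commit message: at most 8 bytes of ancillary data per guest page. */
#define SKETCH_BYTES_PER_PAGE 8ULL

/* Size of the per-page ancillary array for a memslot of the given size. */
static uint64_t sketch_metadata_bytes(uint64_t memslot_bytes)
{
	return (memslot_bytes / SKETCH_PAGE_SIZE) * SKETCH_BYTES_PER_PAGE;
}

/* Nonzero if that array would trip the size > INT_MAX check. */
static int sketch_exceeds_int_max(uint64_t memslot_bytes)
{
	return sketch_metadata_bytes(memslot_bytes) > (uint64_t)INT_MAX;
}
```

Under these assumptions a 1 TiB memslot has 2^28 pages and needs 2^31
bytes, which is exactly one byte more than INT_MAX; a 1.5 TiB memslot
needs 3 GiB.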
>
> Use memcg accounting as evidence that the crazy large allocations are
> expected---in which case, it is indeed a good idea to have them
> properly accounted---and exempt them from the warning.

Will memcg always have a "sane" upper bound? If so, yeah, this seems a
better solution than dropping the WARN completely. :)

Reviewed-by: Kees Cook <keescook@...omium.org>

-Kees

>
> Cc: Willy Tarreau <w@....eu>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Reported-by: syzbot+e0de2333cbf95ea473e8@...kaller.appspotmail.com
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> ---
> Linus, what do you think of this? It is a bit of a hack,
> but the reasoning in the commit message does make at least
> some sense.
>
> The alternative would be to just use __vmalloc in KVM, and add
> __vcalloc too. The two underscores would suggest that something
> "different" is going on, but I wonder what you prefer between
> this and having a __vcalloc with 2-3 uses in the whole source.
>
> mm/util.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 499b6b5767ed..31fca4a999c6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -593,8 +593,12 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> if (ret || size <= PAGE_SIZE)
> return ret;
>
> - /* Don't even allow crazy sizes */
> - if (WARN_ON_ONCE(size > INT_MAX))
> + /*
> + * Don't even allow crazy sizes unless memcg accounting is
> + * requested. We take that as a sign that huge allocations
> + * are indeed expected.
> + */
> + if (likely(!(flags & __GFP_ACCOUNT)) && WARN_ON_ONCE(size > INT_MAX))
> return NULL;
>
> return __vmalloc_node(size, 1, flags, node,
> --
> 2.27.0
>
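
For reference, the decision made by the patched condition can be modelled
in user space roughly as follows. This is only a sketch of the predicate,
not kernel code: SKETCH_GFP_ACCOUNT is a hypothetical stand-in for
__GFP_ACCOUNT, and the warning side effect of WARN_ON_ONCE() is left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __GFP_ACCOUNT; the real value differs. */
#define SKETCH_GFP_ACCOUNT 0x1u

/*
 * Returns true when the vmalloc fallback would be refused (NULL returned).
 * Because of && short-circuiting in the patch, the size check (and hence
 * the warning) is only reached when the allocation is not memcg-accounted.
 */
static bool sketch_rejects(size_t size, unsigned int flags)
{
	bool accounted = (flags & SKETCH_GFP_ACCOUNT) != 0;

	return !accounted && size > (size_t)INT_MAX;
}
```

So an unaccounted request above INT_MAX is still refused, while the same
size with the accounting flag set goes on to the vmalloc path.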
--
Kees Cook