linux-kernel - Re: [PATCH] mm: allow huge kvmalloc() calls if they're accounted to memcg

Message-ID: <202110180809.66AE562E97@keescook>
Date:   Mon, 18 Oct 2021 08:09:59 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     torvalds@...ux-foundation.org, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, seanjc@...gle.com,
        Willy Tarreau <w@....eu>,
        syzbot+e0de2333cbf95ea473e8@...kaller.appspotmail.com
Subject: Re: [PATCH] mm: allow huge kvmalloc() calls if they're accounted to
 memcg

On Sat, Oct 16, 2021 at 02:51:30AM -0400, Paolo Bonzini wrote:
> Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> restricted memory allocation with 'kvmalloc()' to sizes that fit
> in an 'int', to protect against trivial integer conversion issues.
> 
> However, the WARN triggers with KVM, when it allocates ancillary page
> data whose size essentially depends on whatever userspace has passed to
> the KVM_SET_USER_MEMORY_REGION ioctl.  The warnings are easily raised by
> syzkaller, but the largest allocation that KVM can do is 8 bytes per page
> of guest memory; therefore, a 1 TiB memslot will cause a warning even
> outside fuzzing, and those allocations are known to happen in the wild.
> Google, for example, already has VMs that create 1.5 TB memslots (12 TB of
> total guest memory spread across 8 virtual NUMA nodes).
> 
> Use memcg accounting as evidence that the crazy large allocations are
> expected---in which case, it is indeed a good idea to have them
> properly accounted---and exempt them from the warning.
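
The arithmetic behind the 1 TiB figure quoted above can be checked with a
small userspace sketch. It assumes 4 KiB guest pages, which the message does
not state (it gives only "8 bytes per page"), and all `sketch_` names are
hypothetical helpers for illustration, not kernel or KVM symbols.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages; the message only says "8 bytes per page". */
#define SKETCH_PAGE_SIZE      4096ULL
#define SKETCH_BYTES_PER_PAGE 8ULL

/* Size of the per-page ancillary data for a memslot of the given size. */
static uint64_t sketch_metadata_bytes(uint64_t memslot_bytes)
{
	/* Precondition for this sketch: memslot is a whole number of pages. */
	assert(memslot_bytes % SKETCH_PAGE_SIZE == 0);
	return (memslot_bytes / SKETCH_PAGE_SIZE) * SKETCH_BYTES_PER_PAGE;
}

/* True if the allocation would trip a "size > INT_MAX" check. */
static bool sketch_exceeds_int_max(uint64_t alloc_bytes)
{
	return alloc_bytes > (uint64_t)INT_MAX;
}
```

Under these assumptions a 1 TiB memslot has 2^28 pages and needs 2^31 bytes
of ancillary data, one byte more than a 32-bit INT_MAX, while a 512 GiB
memslot needs 2^30 bytes and stays under the limit.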

Will memcg always have a "sane" upper bound? If so, yeah, this seems a
better solution than dropping the WARN completely. :)

Reviewed-by: Kees Cook <keescook@...omium.org>

-Kees

> 
> Cc: Willy Tarreau <w@....eu>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Reported-by: syzbot+e0de2333cbf95ea473e8@...kaller.appspotmail.com
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> ---
> 	Linus, what do you think of this?  It is a bit of a hack,
> 	but the reasoning in the commit message does make at least
> 	some sense.
> 
> 	The alternative would be to just use __vmalloc in KVM, and add
> 	__vcalloc too.  The two underscores would suggest that something
> 	"different" is going on, but I wonder what you prefer between
> 	this and having a __vcalloc with 2-3 uses in the whole source.
> 
>  mm/util.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 499b6b5767ed..31fca4a999c6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -593,8 +593,12 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> -	/* Don't even allow crazy sizes */
> -	if (WARN_ON_ONCE(size > INT_MAX))
> +	/*
> +	 * Don't even allow crazy sizes unless memcg accounting is
> +	 * requested.  We take that as a sign that huge allocations
> +	 * are indeed expected.
> +	 */
> +	if (likely(!(flags & __GFP_ACCOUNT)) && WARN_ON_ONCE(size > INT_MAX))
>  		return NULL;
>  
>  	return __vmalloc_node(size, 1, flags, node,
> -- 
> 2.27.0
> 
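
The effect of the changed condition can be modelled in userspace as below.
This is only a sketch of the size check in the hunk above, not the kernel
code: `SKETCH_GFP_ACCOUNT` is a made-up stand-in bit for `__GFP_ACCOUNT`, the
`sketch_` functions are hypothetical, and the WARN side effect and the
`likely()` hint are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* The sketch assumes a 32-bit int, as on common Linux targets. */
static_assert(INT_MAX == 2147483647, "sketch assumes 32-bit int");

/* Hypothetical stand-in for __GFP_ACCOUNT; the real value differs. */
#define SKETCH_GFP_ACCOUNT 0x1u

/* Before the patch: any size above INT_MAX is refused. */
static bool sketch_refused_before(size_t size, unsigned int flags)
{
	(void)flags; /* flags play no role in the original check */
	return size > (size_t)INT_MAX;
}

/* After the patch: oversized requests pass if they are memcg-accounted. */
static bool sketch_refused_after(size_t size, unsigned int flags)
{
	return !(flags & SKETCH_GFP_ACCOUNT) && size > (size_t)INT_MAX;
}
```

In this model the only case whose outcome changes is an oversized request
with the accounting flag set: it was refused before and is passed on to the
vmalloc path after.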

-- 
Kees Cook