linux-kernel - Re: [PATCH] mm/vmalloc: do not output a spurious warning when huge vmalloc() fails

Date:   Tue, 6 Jun 2023 09:24:33 +0100
From:   Lorenzo Stoakes <lstoakes@...il.com>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, Michal Hocko <mhocko@...e.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Baoquan He <bhe@...hat.com>,
        Christoph Hellwig <hch@...radead.org>,
        Bagas Sanjaya <bagasdotme@...il.com>,
        Linux btrfs <linux-btrfs@...r.kernel.org>,
        Linux Regressions <regressions@...ts.linux.dev>,
        Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>, a1bert@...as.cz,
        Forza <forza@...nline.net>
Subject: Re: [PATCH] mm/vmalloc: do not output a spurious warning when huge
 vmalloc() fails

On Tue, Jun 06, 2023 at 10:17:02AM +0200, Uladzislau Rezki wrote:
> On Tue, Jun 06, 2023 at 09:13:24AM +0200, Vlastimil Babka wrote:
> >
> > On 6/5/23 22:11, Lorenzo Stoakes wrote:
> > > In __vmalloc_area_node() we always warn_alloc() when an allocation
> > > performed by vm_area_alloc_pages() fails unless it was due to a pending
> > > fatal signal.
> > >
> > > However, huge page allocations instigated either by vmalloc_huge() or
> > > __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
> > > kvmalloc_node()) always fall back to order-0 allocations if the huge page
> > > allocation fails.
> > >
> > > This renders the warning useless and noisy, especially as all callers
> > > appear to be aware that this may fall back. This has already resulted in at
> > > least one bug report from a user who was confused by this (see link).
> > >
> > > Therefore, simply update the code to only output this warning for order-0
> > > pages when no fatal signal is pending.
> > >
> > > Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@...il.com>
> >
> > I think there are more reports of the same thing from the btrfs context,
> > which appear to be a 6.3 regression
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=217466
> > Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@gmail.com/
> >
> I had a look at that report. btrfs complains because a high-order
> page (1 << 9) cannot be obtained. In the vmalloc code we do not fall
> back to the order-0 allocator when a high-order allocation is
> requested.

This isn't true; we _do_ fall back to order-0 (this is the basis of my patch), in
__vmalloc_node_range():

	/* Allocate physical pages and map them into vmalloc space. */
	ret = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
	if (!ret)
		goto fail;

...

fail:
	if (shift > PAGE_SHIFT) {
		shift = PAGE_SHIFT;
		align = real_align;
		size = real_size;
		goto again;
	}

The order is derived from shift, and __vmalloc_area_node() is only called from
__vmalloc_node_range().
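
To make that retry concrete, here is a minimal user-space sketch of the control
flow, not kernel code. Everything named sketch_* or SKETCH_* is hypothetical:
the shift values assume 4 KiB base pages and 2 MiB huge pages, and the toy
allocator simply refuses any order above a configurable limit.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */
#define SKETCH_PMD_SHIFT  21	/* assumption: 2 MiB huge pages */

/* Hypothetical toy allocator state; not a kernel structure. */
struct sketch_allocator {
	int max_order;		/* highest order that can succeed, -1 for none */
	unsigned int attempts;	/* how many allocation attempts were made */
};

/* Stand-in for __vmalloc_area_node(): the order is derived from shift. */
static bool sketch_area_alloc(struct sketch_allocator *a, unsigned int shift)
{
	int order = (int)(shift - SKETCH_PAGE_SHIFT);

	a->attempts++;
	return order <= a->max_order;
}

/*
 * Stand-in for the retry in __vmalloc_node_range(): if the attempt at a
 * huge shift fails, drop to the base page shift and try once more.
 * Returns the shift that succeeded, or 0 if even order-0 failed.
 */
static unsigned int sketch_node_range(struct sketch_allocator *a,
				      unsigned int shift)
{
	assert(shift >= SKETCH_PAGE_SHIFT);
again:
	if (sketch_area_alloc(a, shift))
		return shift;

	/* Mirrors the "fail:" label in the excerpt above. */
	if (shift > SKETCH_PAGE_SHIFT) {
		shift = SKETCH_PAGE_SHIFT;
		goto again;
	}
	return 0;
}
```

In this model, a SKETCH_PMD_SHIFT request against an allocator with
max_order = 0 makes two attempts and ends up at SKETCH_PAGE_SHIFT, so the
failure of the first attempt is not by itself an allocation failure.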

>
> I provided a patch to fall back when a high-order allocation fails. With
> the patch applied, a reproducer started to hit oopses in other places.
>
> IMO, we should fall back even for high-order requests, because it is
> highly likely that they cannot be satisfied.
>
> Any thoughts?
>
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..7a06452f7807 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>                         page = alloc_pages(alloc_gfp, order);
>                 else
>                         page = alloc_pages_node(nid, alloc_gfp, order);
> +
>                 if (unlikely(!page)) {
> -                       if (!nofail)
> -                               break;
> +                       if (nofail)
> +                               alloc_gfp |= __GFP_NOFAIL;
>
> -                       /* fall back to the zero order allocations */
> -                       alloc_gfp |= __GFP_NOFAIL;
> -                       order = 0;
> -                       continue;
> +                       /* Fall back to the zero order allocations. */
> +                       if (order || nofail) {
> +                               order = 0;
> +                               continue;
> +                       }
> +
> +                       break;
>                 }
>
>                 /*
> <snip>
>
>
>
> --
> Uladzislau Rezki

I saw that. It seems to duplicate what the original fallback code does (which
was originally designed to permit higher-order non-__GFP_NOFAIL allocations
before trying order-0 __GFP_NOFAIL).

I don't think it is really useful to change this as it confuses that logic and
duplicates something we already do.
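
As a rough illustration of the difference, here is a toy user-space model of
the inner loop in the quoted diff, before and after the proposed change. It is
a sketch under loud assumptions: the toy_* names are hypothetical, a single
page allocation stands in for the whole loop, and __GFP_NOFAIL is modelled as
"order-0 always succeeds".

```c
#include <stdbool.h>

/*
 * Hypothetical toy page allocator: orders above max_order fail, except
 * that an order-0 request with the modelled __GFP_NOFAIL always succeeds.
 */
static bool toy_alloc(unsigned int order, int max_order, bool gfp_nofail)
{
	if (gfp_nofail && order == 0)
		return true;
	return (int)order <= max_order;
}

/*
 * Modelled on the '-' lines of the quoted diff: only a nofail request
 * drops to order-0 inside the loop. Returns the order obtained, or -1.
 */
static int toy_inner_existing(unsigned int order, bool nofail, int max_order)
{
	bool gfp_nofail = false;

	for (;;) {
		if (toy_alloc(order, max_order, gfp_nofail))
			return (int)order;
		if (!nofail)
			break;
		/* fall back to the zero order allocations */
		gfp_nofail = true;
		order = 0;
	}
	return -1;
}

/*
 * Modelled on the '+' lines of the quoted diff: any high-order failure
 * drops to order-0 inside the loop. Returns the order obtained, or -1.
 */
static int toy_inner_proposed(unsigned int order, bool nofail, int max_order)
{
	bool gfp_nofail = false;

	for (;;) {
		if (toy_alloc(order, max_order, gfp_nofail))
			return (int)order;
		if (nofail)
			gfp_nofail = true;
		if (order || nofail) {
			order = 0;
			continue;
		}
		break;
	}
	return -1;
}
```

With max_order = 0 and no nofail, the existing variant gives up at order 9 and
leaves the order-0 retry to the caller, while the proposed variant performs
that retry itself.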

Honestly, though, I think this whole area needs some refactoring.