linux-kernel - Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210519195214.GA2343@pc638.lan>
Date:   Wed, 19 May 2021 21:52:14 +0200
From:   Uladzislau Rezki <urezki@...il.com>
To:     Mel Gorman <mgorman@...e.de>, Christoph Hellwig <hch@...radead.org>
Cc:     Uladzislau Rezki <urezki@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Hillf Danton <hdanton@...a.com>,
        Michal Hocko <mhocko@...e.com>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in
 __vmalloc_area_node()

> On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > > +	/*
> > > > +	 * If not enough pages were obtained to accomplish an
> > > > +	 * allocation request, free them via __vfree() if any.
> > > > +	 */
> > > > +	if (area->nr_pages != nr_small_pages) {
> > > > +		warn_alloc(gfp_mask, NULL,
> > > > +			"vmalloc size %lu allocation failure: "
> > > > +			"page order %u allocation failed",
> > > > +			area->nr_pages * PAGE_SIZE, page_order);
> > > > +		goto fail;
> > > > +	}
> > > 
> > > From reading __alloc_pages_bulk not allocating all pages is something
> > > that cn happen fairly easily.  Shouldn't we try to allocate the missing
> > > pages manually and/ore retry here?
> > > 
> >
> > It is a good point. The bulk-allocator, as i see, only tries to access
> > to pcp-list and falls-back to a single allocator once it fails, so the
> > array may not be fully populated.
> > 
> 
> Partially correct. It does allocate via the pcp-list but the pcp-list will
> be refilled if it's empty so if the bulk allocator returns fewer pages
> than requested, it may be due to hitting watermarks or the local zone is
> depleted. It does not take any special action to correct the situation
> or stall e.g.  wake kswapd, enter direct reclaim, allocate from a remote
> node etc.
> 
> If no pages were allocated, it'll try allocate at least one page via a
> single allocation request in case the bulk failure would push the zone
> over the watermark but 1 page does not. That path as a side-effect would
> also wake kswapd.
> 
OK. A single page allocator can enter a slow path i mean direct reclaim,
etc to adjust watermarks.

> > In that case probably it makes sense to manually populate it using
> > single page allocator.
> > 
> > Mel, could you please also comment on it?
> > 
> 
> It is by design because it's unknown if callers can recover or if so,
> how they want to recover and the primary intent behind the bulk allocator
> was speed. In the case of network, it only wants some pages quickly so as
> long as it gets 1, it makes progress. For the sunrpc user, it's willing
> to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
> should be as I do not have a good handle on workloads that are sensitive
> to vmalloc performance. The obvious option would be to loop and allocate
> single pages with alloc_pages_node understanding that the additional
> pages may take longer to allocate.
> 
I got it. At least we should fall-back for a single allocator, that is how
we used to allocate before(now it is for high-order pages). If it also fails
to obtain a page we are done.

Basically a single-page allocator is more permissive so it is a higher
chance to success. Therefore a fallback to it makes sense.

Thanks.

--
Vlad Rezki