Message-ID: <20241031002552.GB10193@nvidia.com>
Date: Wed, 30 Oct 2024 21:25:52 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: John Hubbard <jhubbard@...dia.com>
Cc: David Hildenbrand <david@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
linux-stable@...r.kernel.org,
Vivek Kasireddy <vivek.kasireddy@...el.com>,
Dave Airlie <airlied@...hat.com>, Gerd Hoffmann <kraxel@...hat.com>,
Matthew Wilcox <willy@...radead.org>, Peter Xu <peterx@...hat.com>,
Arnd Bergmann <arnd@...db.de>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Dongwon Kim <dongwon.kim@...el.com>,
Hugh Dickins <hughd@...gle.com>,
Junxiao Chang <junxiao.chang@...el.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH] mm/gup: restore the ability to pin more than 2GB at a
time
On Wed, Oct 30, 2024 at 05:17:25PM -0700, John Hubbard wrote:
> On 10/30/24 5:02 PM, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 11:34:49AM -0700, John Hubbard wrote:
> >
> > > From a very high level design perspective, it's not yet clear to me
> > > that there is either a "preferred" or "not recommended" aspect to
> > > pinning in batches vs. all at once here, as long as one stays
> > > below the type (int, long, unsigned...) limits of the API. Batching
> > > seems like what you do if the internal implementation is crippled
> > > and unable to meet its API requirements. So the fact that many
> > > callers do batching is sort of "tail wags dog".
> >
> > No. Everything needs to do batching, because nothing should be
> > storing a linear struct page array that enormous. That is going to
> > create vmalloc pressure, which is not desirable.
>
> Are we talking about the same allocation size here? It's not 2GB. It
> is enough folio pointers to cover 2GB of memory, so 4MB.
Is 2GB a hard limit? I was expecting this to be a range with upper
bounds in the 100s of GBs, like for RDMA.. Then it is 400MB, and yeah,
that is not great.
> That high level guidance makes sense, but here we are attempting only
> a 4MB physically contiguous allocation, and if larger than that, then
> it goes to vmalloc() which is merely virtually contiguous.
AFAIK any contiguous allocation beyond 4K basically doesn't work
reliably in a server environment due to fragmentation.
So you are always using vmalloc..
> I'm writing this because your adjectives make me suspect that you
> are referring to a 2GB allocation. But this is orders of magnitude
> smaller.
Even at 4MB I would wonder about splitting it into PAGE_SIZE chunks
instead of using vmalloc, but I don't know what it is being used for.
Jason