Message-ID: <20241031002552.GB10193@nvidia.com>
Date: Wed, 30 Oct 2024 21:25:52 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: John Hubbard <jhubbard@...dia.com>
Cc: David Hildenbrand <david@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
linux-stable@...r.kernel.org,
Vivek Kasireddy <vivek.kasireddy@...el.com>,
Dave Airlie <airlied@...hat.com>, Gerd Hoffmann <kraxel@...hat.com>,
Matthew Wilcox <willy@...radead.org>, Peter Xu <peterx@...hat.com>,
Arnd Bergmann <arnd@...db.de>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Dongwon Kim <dongwon.kim@...el.com>,
Hugh Dickins <hughd@...gle.com>,
Junxiao Chang <junxiao.chang@...el.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH] mm/gup: restore the ability to pin more than 2GB at a
time
On Wed, Oct 30, 2024 at 05:17:25PM -0700, John Hubbard wrote:
> On 10/30/24 5:02 PM, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 11:34:49AM -0700, John Hubbard wrote:
> >
> > > From a very high level design perspective, it's not yet clear to me
> > > that there is either a "preferred" or "not recommended" aspect to
> > > pinning in batches vs. all at once here, as long as one stays
> > > below the type (int, long, unsigned...) limits of the API. Batching
> > > seems like what you do if the internal implementation is crippled
> > > and unable to meet its API requirements. So the fact that many
> > > callers do batching is sort of "tail wags dog".
> >
> > No. Everything needs to do batching, because nothing should be
> > storing a linear struct page array that enormous. That is going to
> > create vmalloc pressure, which is not desirable.
>
> Are we talking about the same allocation size here? It's not 2GB. It
> is enough folio pointers to cover 2GB of memory, so 4MB.
Is 2GB a hard limit? I was expecting this to be a range with upper
bounds in the 100s of GBs, like for RDMA.. Then it is 400MB, and yeah,
that is not great.
> That high level guidance makes sense, but here we are attempting only
> a 4MB physically contiguous allocation, and if larger than that, then
> it goes to vmalloc() which is merely virtually contiguous.
AFAIK any contiguous allocation beyond 4K basically doesn't work
reliably in a server environment due to fragmentation.
So you are always using vmalloc..
> I'm writing this because your adjectives make me suspect that you
> are referring to a 2GB allocation. But this is orders of magnitude
> smaller.
Even at 4MB I would wonder about splitting it into PAGE_SIZE chunks
instead of using vmalloc, but I don't know what it is being used for.
Jason