Message-ID: <21ee9aff-a9d5-495c-9e5e-38e9d25b11cd@nvidia.com>
Date: Wed, 30 Oct 2024 17:17:25 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: David Hildenbrand <david@...hat.com>, Alistair Popple
 <apopple@...dia.com>, Christoph Hellwig <hch@...radead.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
 linux-stable@...r.kernel.org, Vivek Kasireddy <vivek.kasireddy@...el.com>,
 Dave Airlie <airlied@...hat.com>, Gerd Hoffmann <kraxel@...hat.com>,
 Matthew Wilcox <willy@...radead.org>, Peter Xu <peterx@...hat.com>,
 Arnd Bergmann <arnd@...db.de>, Daniel Vetter <daniel.vetter@...ll.ch>,
 Dongwon Kim <dongwon.kim@...el.com>, Hugh Dickins <hughd@...gle.com>,
 Junxiao Chang <junxiao.chang@...el.com>,
 Mike Kravetz <mike.kravetz@...cle.com>, Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH] mm/gup: restore the ability to pin more than 2GB at a
 time

On 10/30/24 5:02 PM, Jason Gunthorpe wrote:
> On Wed, Oct 30, 2024 at 11:34:49AM -0700, John Hubbard wrote:
> 
>>  From a very high level design perspective, it's not yet clear to me
>> that there is either a "preferred" or "not recommended" aspect to
>> pinning in batches vs. all at once here, as long as one stays
>> below the type (int, long, unsigned...) limits of the API. Batching
>> seems like what you do if the internal implementation is crippled
>> and unable to meet its API requirements. So the fact that many
>> callers do batching is sort of "tail wags dog".
> 
> No.. all things need to do batching because nothing should be storing
> a linear struct page array that is so enormous. That is going to
> create vmemap pressure that is not desirable.

Are we talking about the same allocation size here? It's not 2GB. It
is an array of enough folio pointers to cover 2GB of memory, which
(with 4KB pages and 8-byte pointers) comes to 4MB.

That's not really much pressure.
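
For reference, here is the arithmetic behind the 4MB figure as a small
userspace sketch. The helper name pin_array_bytes() is made up for
illustration and is not a kernel API. It assumes one pointer per page,
4KB pages, and 8-byte pointers (typical for x86-64):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): size in bytes of an array
 * holding one pointer per page, for a pinned range of bytes_to_pin.
 */
uint64_t pin_array_bytes(uint64_t bytes_to_pin, uint64_t page_size,
			 uint64_t ptr_size)
{
	assert(page_size != 0);

	/* Round up so a partial trailing page still gets a slot. */
	uint64_t npages = (bytes_to_pin + page_size - 1) / page_size;

	return npages * ptr_size;
}
```

With bytes_to_pin = 2GB, page_size = 4096 and ptr_size = 8, that is
524288 pages times 8 bytes, so 4194304 bytes, or 4MB.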

> 
> For instance rdma pins in batches and copies the pins into a scatter
> list and never has an allocation over PAGE_SIZE.
> 
> iommufd transfers them into a radix tree.
> 
> It is not so much that there is a limit, but that good kernel code
> just *shouldn't* be allocating gigantic contiguous memory arrays at
> all.

That high-level guidance makes sense, but here we are attempting at
most a 4MB physically contiguous allocation. Anything larger than that
goes to vmalloc(), which is merely virtually contiguous.

I'm writing this because your adjectives make me suspect that you
are referring to a 2GB allocation. But this is orders of magnitude
smaller.

thanks,
-- 
John Hubbard

