[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAG48ez2Sn=w=e9AoNC6giMHP=ndRa0aYNn5oO3VeYBFC8Pg60A@mail.gmail.com>
Date: Tue, 26 Nov 2024 21:43:24 +0100
From: Jann Horn <jannh@...gle.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Boqun Feng <boqun.feng@...il.com>, Alice Ryhl <aliceryhl@...gle.com>,
Abdiel Janulgue <abdiel.janulgue@...il.com>, rust-for-linux@...r.kernel.org,
Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <benno.lossin@...ton.me>, Andreas Hindborg <a.hindborg@...nel.org>,
Trevor Gross <tmgross@...ch.edu>, Danilo Krummrich <dakr@...nel.org>,
Wedson Almeida Filho <wedsonaf@...il.com>, Valentin Obst <kernel@...entinobst.de>,
open list <linux-kernel@...r.kernel.org>, Andrew Morton <akpm@...ux-foundation.org>,
"open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>, airlied@...hat.com
Subject: Re: [PATCH v3 0/2] rust: page: Add support for existing struct page mappings
On Tue, Nov 26, 2024 at 9:31 PM Jann Horn <jannh@...gle.com> wrote:
> On Wed, Nov 20, 2024 at 6:02 PM Matthew Wilcox <willy@...radead.org> wrote:
> > On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> > > On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > > > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@...radead.org> wrote:
> > > > > We don't have a fully formed destination yet, so I can't give you a
> > > > > definite answer to a lot of questions. Obviously I don't want to hold
> > > > > up the Rust project in any way, but I need to know that what we're trying
> > > > > to do will be expressible in Rust.
> > > > >
> > > > > Can we avoid referring to a page's refcount?
> > > >
> > > > I don't think this patch needs the refcount at all, and the previous
> > > > version did not expose it. This came out of the advice to use put_page
> > > > over free_page. Does this mean that we should switch to put_page but
> > > > not use get_page?
> >
> > Did I advise using put_page() over free_page()? I hope I didn't say
> > that. I don't see a reason why binder needs to refcount its pages (nor
> > use a mapcount on them), but I don't fully understand binder so maybe
> > it does need a refcount.
>
> I think that was me, at
> <https://lore.kernel.org/all/CAG48ez32zWt4mcfA+y2FnzzNmFe-0ns9XQgp=QYeFpRsdiCAnw@mail.gmail.com/>.
> Looking at the C binder version, binder_install_single_page() installs
> pages into userspace page tables in a VM_MIXEDMAP mapping using
> vm_insert_page(), and when you do that with pages from the page
> allocator, userspace can grab references to them through GUP-fast (and
> I think also through GUP). (See how vm_insert_page() and
> vm_get_page_prot() don't use pte_mkspecial(), which is pretty much the
> only thing that can stop GUP-fast on most architectures.)
>
> My understanding is that the combination VM_IO|VM_MIXEDMAP would stop
> normal GUP, but currently the only way to block GUP-fast is to use
> VM_PFNMAP. (Which, as far as I understand, is also why GPU drivers use
> VM_PFNMAP so much.) Maybe we should change that, so that VM_IO and/or
> VM_MIXEDMAP blocks GUP in the region and causes installed PTEs to be
> marked with pte_mkspecial()?
>
> I am not entirely sure about this stuff, but I was recently looking at
> net/packet/af_packet.c, and I tested that vmsplice() can grab
> references to the high-order compound pages that
> alloc_one_pg_vec_page() allocates with __get_free_pages(GFP_KERNEL |
> __GFP_COMP | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY, order),
> packet_mmap() inserts with vm_insert_page(), and free_pg_vec() drops
> with free_pages(). (But that all happens to actually work fine,
> free_pages() actually handles refcounted compound pages properly.)
And also, the C binder driver wants to free pages in its shrinker
callback, but those pages might still be mapped into userspace. Binder
tries to zap such userspace mappings, but it does that by absolute
virtual address instead of going through the rmap (see
binder_alloc_free_page()), so it will miss page mappings in VMAs that
have been mremap()'d (though legitimate userspace never does that with
binder VMAs) or are concurrently being torn down by munmap(); so
currently the thing that keeps this from falling apart is that if page
mappings are left over somewhere, the page refcount ensures that this
userspace-mapped page doesn't get freed.
(I think the C binder code does its job, but is not exactly a great
model for how to write a clean driver that integrates nicely with the
rest of the kernel.)
Powered by blists - more mailing lists