linux-kernel - Re: [PATCH 19/21] binder: perform page allocation outside of locks

Message-ID: <ZWmNpxPXZSxdmDE1@google.com>
Date:   Fri, 1 Dec 2023 07:39:19 +0000
From:   Carlos Llamas <cmllamas@...gle.com>
To:     Alice Ryhl <aliceryhl@...gle.com>
Cc:     Arve Hjønnevåg <arve@...roid.com>,
        Christian Brauner <brauner@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        kernel-team@...roid.com, linux-kernel@...r.kernel.org,
        Martijn Coenen <maco@...roid.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Todd Kjos <tkjos@...roid.com>
Subject: Re: [PATCH 19/21] binder: perform page allocation outside of locks

On Tue, Nov 07, 2023 at 09:08:43AM +0000, Alice Ryhl wrote:
> I would really like a comment on each function explaining that:
> 
>  * The binder_allocate_page_range function ensures that existing pages
>    will not be reclaimed by the shrinker.
>  * The binder_get_page_range function ensures that missing pages are
>    allocated and inserted.

OK, I think I'd rather go for better naming than compensate through
comments, so I came up with the following names:
 - binder_lru_freelist_{add,del}()
 - binder_install_buffer_pages()

There will be more details in the v2. The new names give a clear
separation of the scope of these functions.

> >  	mmap_write_lock(alloc->mm);
> > +	if (lru_page->page_ptr)
> > +		goto out;
> 
> Another comment that I'd like to see somewhere is one that says
> something along these lines:
> 
>     Multiple processes may call `binder_get_user_page_remote` on the
>     same page in parallel. When this happens, one of them will allocate
>     the page and insert it, and the other process will use the mmap
>     write lock to wait for the insertion to complete. This means that we
>     can't use a mmap read lock here.
> 

I've added a shorter version of this to v2, thanks.
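
To make the race concrete, here is a minimal userspace sketch of the
recheck-under-lock pattern from the quoted hunk. It is not the binder
code: a pthread mutex stands in for the mmap write lock, calloc() stands
in for alloc_page(), and struct page_slot_sketch and
sketch_install_page() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for the per-page bookkeeping. */
struct page_slot_sketch {
	pthread_mutex_t lock;	/* stand-in for the mmap write lock */
	void *page_ptr;		/* NULL until a page has been installed */
	int allocations;	/* how many pages were actually allocated */
};

/* Install a page in the slot unless a racing caller already did. */
static int sketch_install_page(struct page_slot_sketch *slot)
{
	void *page;
	int ret = 0;

	pthread_mutex_lock(&slot->lock);
	/* Someone else may have installed the page while we waited. */
	if (slot->page_ptr)
		goto out;

	page = calloc(1, SKETCH_PAGE_SIZE);	/* zeroed, like __GFP_ZERO */
	if (!page) {
		ret = -ENOMEM;
		goto out;
	}
	slot->allocations++;
	slot->page_ptr = page;
out:
	/* On success the slot always holds a page, ours or the winner's. */
	assert(ret != 0 || slot->page_ptr != NULL);
	pthread_mutex_unlock(&slot->lock);
	return ret;
}
```

Because the check happens under an exclusive lock, the second caller
only returns once the first one has finished installing the page, which
is the waiting behaviour described in the quoted comment.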

> > +	/* mark page insertion complete and safe to acquire */
> > +	smp_store_release(&lru_page->page_ptr, page);
> > [snip]
> > +		/* check if page insertion is marked complete by release */
> > +		if (smp_load_acquire(&page->page_ptr))
> > +			continue;
> 
> We already discussed this when I asked you to make this an acquire /
> release operation so that it isn't racy, but it could use a comment
> explaining its purpose.

I've wrapped these calls into inline functions with better names in v2.
The purpose should now be evident.
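
As a rough illustration of that kind of wrapper, below is a userspace
sketch using C11 atomics in place of smp_store_release() and
smp_load_acquire(). The struct and the helper names are hypothetical and
are not necessarily what v2 uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page bookkeeping. */
struct lru_page_sketch {
	_Atomic(void *) page_ptr;	/* NULL until insertion is complete */
};

/* Hypothetical helper: mark page insertion complete. */
static inline void sketch_set_installed_page(struct lru_page_sketch *lp,
					     void *page)
{
	assert(page != NULL);
	/* Release: writes that set up the page happen before the publish. */
	atomic_store_explicit(&lp->page_ptr, page, memory_order_release);
}

/* Hypothetical helper: return the page if installed, NULL otherwise. */
static inline void *sketch_get_installed_page(struct lru_page_sketch *lp)
{
	/* Acquire: pairs with the release store in the setter above. */
	return atomic_load_explicit(&lp->page_ptr, memory_order_acquire);
}
```

A reader that sees a non-NULL pointer from the getter is then guaranteed
to also see everything the installer wrote before calling the setter.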

> 
> >  	mmap_write_lock(alloc->mm);
> > +	if (lru_page->page_ptr)
> > +		goto out;
> > +
> >  	if (!alloc->vma) {
> >  		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> >  		ret = -ESRCH;
> >  		goto out;
> >  	}
> >  
> >  	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> >  	if (!page) {
> >  		pr_err("%d: failed to allocate page\n", alloc->pid);
> >  		ret = -ENOMEM;
> >  		goto out;
> >  	}
> 
> Maybe it would be worth to allocate the page before taking the mmap
> write lock? It has the disadvantage that you may have to immediately
> deallocate it if we trigger the `if (lru_page->page_ptr) goto out`
> branch, but that shouldn't happen that often, and it would reduce the
> amount of time we spend holding the mmap write lock.

If we sleep in alloc_page(), chances are that other tasks allocating
more pages at the same time would create more memory pressure. In some
cases this would be unnecessary (e.g. if it's the same page). I do think
this could happen often, since buffer requests tend to be < PAGE_SIZE
and adjacent too. I'll look into this in more detail and send a
follow-up patch if needed. Thanks!
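
For reference, the allocate-before-lock alternative being discussed
could look roughly like the userspace sketch below. It is not a proposed
patch: a pthread mutex stands in for the mmap write lock, calloc() for
alloc_page(), and all names (including the wasted counter) are
hypothetical. The counter makes the cost visible: every caller that
loses the race has allocated a page it must immediately free.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for the per-page bookkeeping. */
struct prealloc_slot_sketch {
	pthread_mutex_t lock;	/* stand-in for the mmap write lock */
	void *page_ptr;		/* NULL until a page has been installed */
	int wasted;		/* pages allocated and then thrown away */
};

/* Allocate first, then take the lock only to check and install. */
static int sketch_install_prealloc(struct prealloc_slot_sketch *slot)
{
	/* Allocation (which may sleep in the kernel) happens unlocked. */
	void *page = calloc(1, SKETCH_PAGE_SIZE);

	if (!page)
		return -ENOMEM;

	pthread_mutex_lock(&slot->lock);
	if (slot->page_ptr) {
		/* Lost the race: the page we allocated was unnecessary. */
		slot->wasted++;
		pthread_mutex_unlock(&slot->lock);
		free(page);
		return 0;
	}
	slot->page_ptr = page;
	assert(slot->page_ptr != NULL);
	pthread_mutex_unlock(&slot->lock);
	return 0;
}
```

The lock hold time shrinks to a pointer check and a store, at the price
of one extra allocation per losing caller.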

--
Carlos Llamas