Message-ID: <9005b167-db08-c967-463b-5e0e092cbb6c@suse.cz>
Date: Thu, 14 Apr 2022 19:15:02 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Yang Shi <shy828301@...il.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Jann Horn <jannh@...gle.com>, Michal Hocko <mhocko@...nel.org>,
Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
Roman Gushchin <guro@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Christoph Hellwig <hch@....de>,
Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
Liang Zhang <zhangliang5@...wei.com>,
Pedro Gomes <pedrodemargomes@...il.com>,
Oded Gabbay <oded.gabbay@...il.com>, linux-mm@...ck.org
Subject: Re: [PATCH v3 14/16] mm: support GUP-triggered unsharing of anonymous pages
On 3/29/22 18:04, David Hildenbrand wrote:
> Whenever GUP currently ends up taking a R/O pin on an anonymous page that
> might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault
> on the page table entry will end up replacing the mapped anonymous page
> due to COW, resulting in the GUP pin no longer being consistent with the
> page actually mapped into the page table.
>
> The possible ways to deal with this situation are:
> (1) Ignore and pin -- what we do right now.
> (2) Fail to pin -- which would be rather surprising to callers and
> could break user space.
> (3) Trigger unsharing and pin the now exclusive page -- reliable R/O
> pins.
>
> We want to implement 3) because it provides the clearest semantics and
> allows for checking in unpin_user_pages() and friends for possible BUGs:
> when trying to unpin a page that's no longer exclusive, clearly
> something went very wrong and might result in memory corruptions that
> might be hard to debug. So we better have a nice way to spot such
> issues.
>
> To implement 3), we need a way for GUP to trigger unsharing:
> FAULT_FLAG_UNSHARE. FAULT_FLAG_UNSHARE is only applicable to R/O mapped
> anonymous pages and resembles COW logic during a write fault. However, in
> contrast to a write fault, GUP-triggered unsharing will, for example, still
> maintain the write protection.
>
> Let's implement FAULT_FLAG_UNSHARE by hooking into the existing write fault
> handlers for all applicable anonymous page types: ordinary pages, THP and
> hugetlb.
>
> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that has been
> marked exclusive in the meantime by someone else, there is nothing to do.
> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that's not
> marked exclusive, it will try detecting if the process is the exclusive
> owner. If exclusive, it can be set exclusive similar to reuse logic
> during write faults via page_move_anon_rmap() and there is nothing
> else to do; otherwise, we either have to copy and map a fresh,
> anonymous exclusive page R/O (ordinary pages, hugetlb), or split the
> THP.
>
> This commit is heavily based on patches by Andrea.
>
> Co-developed-by: Andrea Arcangeli <aarcange@...hat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
> Signed-off-by: David Hildenbrand <david@...hat.com>
Acked-by: Vlastimil Babka <vbabka@...e.cz>
Modulo a nit and a suspected logical bug below.
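For readers following along, the unshare decision flow described in the commit
message can be sketched as a standalone userspace model. All struct fields and
helper names here are illustrative, not the kernel's actual API:

```c
#include <stdbool.h>

/* Illustrative model of FAULT_FLAG_UNSHARE handling on a R/O-mapped
 * anonymous page; field and function names are made up for this sketch. */
struct anon_page {
	int mapcount;		/* page table mappings of this page */
	bool anon_exclusive;	/* PageAnonExclusive() analogue */
};

enum unshare_action {
	UNSHARE_NOTHING,	/* already exclusive: nothing to do */
	UNSHARE_REUSE,		/* sole owner: mark exclusive, keep the page */
	UNSHARE_COPY,		/* shared: copy and map a fresh page R/O */
};

static enum unshare_action handle_unshare(struct anon_page *page)
{
	if (page->anon_exclusive)
		return UNSHARE_NOTHING;
	if (page->mapcount == 1) {
		/* resembles the reuse via page_move_anon_rmap() on write faults */
		page->anon_exclusive = true;
		return UNSHARE_REUSE;
	}
	return UNSHARE_COPY;
}
```

The real exclusivity check is more involved (it also has to consider the swap
cache and additional references); this only captures the three outcomes the
commit message lists.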
<snip>
> @@ -3072,6 +3082,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> * mmu page tables (such as kvm shadow page tables), we want the
> * new page to be mapped directly into the secondary page table.
> */
> + BUG_ON(unshare && pte_write(entry));
> set_pte_at_notify(mm, vmf->address, vmf->pte, entry);
> update_mmu_cache(vma, vmf->address, vmf->pte);
> if (old_page) {
> @@ -3121,7 +3132,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> free_swap_cache(old_page);
> put_page(old_page);
> }
> - return page_copied ? VM_FAULT_WRITE : 0;
> + return page_copied && !unshare ? VM_FAULT_WRITE : 0;
Could be just me, but I would prefer (page_copied && !unshare), as I rarely
see these operators combined like this and don't remember their relative
precedence well.
> oom_free_new:
> put_page(new_page);
> oom:
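As a quick sanity check of the precedence (a standalone userspace snippet;
VM_FAULT_WRITE is redefined locally just for the demo): the ternary operator
binds looser than &&, so both spellings return the same value, and the
parentheses are purely for readability.

```c
#include <stdbool.h>

#define VM_FAULT_WRITE 0x8	/* stand-in value, local to this demo */

/* Return value spelled as in the patch, without parentheses. */
static int ret_unparenthesized(bool page_copied, bool unshare)
{
	return page_copied && !unshare ? VM_FAULT_WRITE : 0;
}

/* Same expression with explicit parentheses around the condition. */
static int ret_parenthesized(bool page_copied, bool unshare)
{
	return (page_copied && !unshare) ? VM_FAULT_WRITE : 0;
}
```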
<snip>
> @@ -4515,8 +4550,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
> /* `inline' is required to avoid gcc 4.1.2 build error */
> static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
> {
> + const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
> +
> if (vma_is_anonymous(vmf->vma)) {
> - if (userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
> + if (unlikely(unshare) &&
Is this condition flipped? Should it be "likely(!unshare)", as the similar
code in do_wp_page() has it?
> + userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
> return handle_userfault(vmf, VM_UFFD_WP);
> return do_huge_pmd_wp_page(vmf);
> }
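To make the suspicion concrete, strip the branch-prediction hints (likely()
and unlikely() don't change the truth value) and compare the two spellings in
a standalone model; 'uffd_wp' stands in for userfaultfd_huge_pmd_wp():

```c
#include <stdbool.h>

/* Condition as written in the patch: unlikely(unshare) && uffd_wp */
static bool as_written(bool unshare, bool uffd_wp)
{
	return unshare && uffd_wp;
}

/* Suggested spelling, matching do_wp_page(): likely(!unshare) && uffd_wp */
static bool suggested(bool unshare, bool uffd_wp)
{
	return !unshare && uffd_wp;
}
```

With the patch as written, a plain write fault (unshare == false) on a
uffd-wp-protected PMD would skip handle_userfault() entirely, while an
unshare fault would take it.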
> @@ -4651,10 +4689,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
> goto unlock;
> }
> - if (vmf->flags & FAULT_FLAG_WRITE) {
> + if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
> if (!pte_write(entry))
> return do_wp_page(vmf);
> - entry = pte_mkdirty(entry);
> + else if (likely(vmf->flags & FAULT_FLAG_WRITE))
> + entry = pte_mkdirty(entry);
> }
> entry = pte_mkyoung(entry);
> if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
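For completeness, the flag handling in this last hunk can also be modeled
standalone (flag values, enum and function names are local to the sketch):
both WRITE and UNSHARE faults on a R/O pte go to do_wp_page(), but only a
WRITE fault on a writable pte dirties the pte; an UNSHARE fault on a writable
pte has nothing left to do.

```c
#include <stdbool.h>

#define FAULT_FLAG_WRITE	0x1	/* illustrative values */
#define FAULT_FLAG_UNSHARE	0x2

enum pte_action {
	ACT_NONE,	/* fall through to the young/access-flags path */
	ACT_WP_PAGE,	/* take the do_wp_page() path */
	ACT_MKDIRTY,	/* mark the pte dirty (write fault only) */
};

static enum pte_action pte_fault_action(unsigned int flags, bool pte_writable)
{
	if (flags & (FAULT_FLAG_WRITE | FAULT_FLAG_UNSHARE)) {
		if (!pte_writable)
			return ACT_WP_PAGE;
		if (flags & FAULT_FLAG_WRITE)
			return ACT_MKDIRTY;
	}
	return ACT_NONE;
}
```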