Message-ID: <20200824143010.GG24877@quack2.suse.cz>
Date: Mon, 24 Aug 2020 16:30:10 +0200
From: Jan Kara <jack@...e.cz>
To: Kirill Tkhai <ktkhai@...tuozzo.com>
Cc: Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
"Maya B . Gokhale" <gokhale2@...l.gov>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Marty Mcfadden <mcfadden8@...l.gov>,
Kirill Shutemov <kirill@...temov.name>,
Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
Jan Kara <jack@...e.cz>,
Andrea Arcangeli <aarcange@...hat.com>,
Christoph Hellwig <hch@....de>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
On Mon 24-08-20 11:36:22, Kirill Tkhai wrote:
> On 22.08.2020 02:49, Peter Xu wrote:
> > From: Linus Torvalds <torvalds@...ux-foundation.org>
> >
> > How about we just make sure we're the only possible valid user of the
> > page before we bother to reuse it?
> >
> > Simplify, simplify, simplify.
> >
> > And get rid of the nasty serialization on the page lock at the same time.
> >
> > Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> > [peterx: add subject prefix]
> > Signed-off-by: Peter Xu <peterx@...hat.com>
> > ---
> > mm/memory.c | 59 +++++++++++++++--------------------------------------
> > 1 file changed, 17 insertions(+), 42 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 602f4283122f..cb9006189d22 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2927,50 +2927,25 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> > * not dirty accountable.
> > */
> > if (PageAnon(vmf->page)) {
> > - int total_map_swapcount;
> > - if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
> > - page_count(vmf->page) != 1))
> > + struct page *page = vmf->page;
> > +
> > + /* PageKsm() doesn't necessarily raise the page refcount */
>
> No, this is wrong. PageKSM() always raises refcount.
OK, then I'm confused. The comment before get_ksm_page() states:
* get_ksm_page: checks if the page indicated by the stable node
* is still its ksm page, despite having held no reference to it.
* In which case we can trust the content of the page, and it
* returns the gotten page; but if the page has now been zapped,
* remove the stale node from the stable tree and return NULL.
...
* You would expect the stable_node to hold a reference to the ksm page.
* But if it increments the page's count, swapping out has to wait for
* ksmd to come around again before it can free the page, which may take
* seconds or even minutes: much too unresponsive. So instead we use a
* "keyhole reference": access to the ksm page from the stable node peeps
* out through its keyhole to see if that page still holds the right key,
* pointing back to this stable node.
So this all seems to indicate that KSM doesn't hold a proper page reference
and relies on anyone making the page writable to change page->mapping so
that KSM notices this and stops using the page... Am I missing something?
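
For illustration, here is a minimal userspace sketch of that "keyhole
reference" idea: the stable node holds no reference at all, and a lookup
only takes a temporary reference if the count is still non-zero and the
page still points back at the node. The toy_* names and types are invented
for the example; this is a simplified model, not the mm/ksm.c code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for struct page and the KSM stable node. */
struct toy_page {
	atomic_int refcount;	/* page_count() analogue */
	void *mapping;		/* points back to the stable node */
};

struct toy_stable_node {
	struct toy_page *kpage;	/* "keyhole": no reference is held */
};

/* get_page_unless_zero() analogue: take a reference only if the page
 * has not already been freed (count dropped to zero). */
static bool toy_get_page_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Simplified shape of the get_ksm_page() "keyhole" lookup: peek at the
 * page, take a temporary reference, then verify the page still points
 * back at this stable node; otherwise drop the reference and report it
 * stale. */
static struct toy_page *toy_get_ksm_page(struct toy_stable_node *node)
{
	struct toy_page *page = node->kpage;

	if (!page || !toy_get_page_unless_zero(page))
		return NULL;			/* page already gone */

	if (page->mapping != node) {		/* reused for something else */
		atomic_fetch_sub(&page->refcount, 1);
		return NULL;
	}
	return page;				/* caller now holds a ref */
}

int main(void)
{
	struct toy_page page = { .refcount = 1, .mapping = NULL };
	struct toy_stable_node node = { .kpage = &page };

	page.mapping = &node;
	printf("lookup: %s\n", toy_get_ksm_page(&node) ? "hit" : "stale");

	page.mapping = NULL;			/* page re-purposed */
	printf("lookup: %s\n", toy_get_ksm_page(&node) ? "hit" : "stale");
	return 0;
}
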
> There was another
> problem: KSM may raise refcount without lock_page(), and only then it
> takes the lock. See get_ksm_page(GET_KSM_PAGE_NOLOCK) for the details.
>
> So, reliable protection against parallel access requires freezing the page
> counter, which is done in reuse_ksm_page().
OK, this as well.
Honza
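
A similarly simplified userspace sketch of the counter-freeze idea behind
reuse_ksm_page()/page_ref_freeze(): atomically swap the expected count to
zero so that concurrent keyhole-style lookups fail, do the modification,
then publish the count again. The toy_* names are again invented for
illustration and are not kernel API.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy refcount standing in for struct page::_refcount. */
typedef atomic_int toy_refcount_t;

/* page_ref_freeze() analogue: succeed only if the count is exactly
 * 'expected', atomically replacing it with 0 so that concurrent
 * get_page_unless_zero()-style lookups fail while we modify the page. */
static bool toy_ref_freeze(toy_refcount_t *ref, int expected)
{
	return atomic_compare_exchange_strong(ref, &expected, 0);
}

/* page_ref_unfreeze() analogue: make the count visible again. */
static void toy_ref_unfreeze(toy_refcount_t *ref, int count)
{
	atomic_store(ref, count);
}

int main(void)
{
	toy_refcount_t ref = 1;

	if (toy_ref_freeze(&ref, 1)) {
		/* ... exclusive section: e.g. rewrite page->mapping ... */
		toy_ref_unfreeze(&ref, 1);
		printf("froze and unfroze the count\n");
	} else {
		printf("someone else holds a reference, bail out\n");
	}
	return 0;
}
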
>
> > + if (PageKsm(page) || page_count(page) != 1)
> > + goto copy;
> > + if (!trylock_page(page))
> > + goto copy;
> > + if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
> > + unlock_page(page);
> > goto copy;
> > - if (!trylock_page(vmf->page)) {
> > - get_page(vmf->page);
> > - pte_unmap_unlock(vmf->pte, vmf->ptl);
> > - lock_page(vmf->page);
> > - vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > - vmf->address, &vmf->ptl);
> > - if (!pte_same(*vmf->pte, vmf->orig_pte)) {
> > - update_mmu_tlb(vma, vmf->address, vmf->pte);
> > - unlock_page(vmf->page);
> > - pte_unmap_unlock(vmf->pte, vmf->ptl);
> > - put_page(vmf->page);
> > - return 0;
> > - }
> > - put_page(vmf->page);
> > - }
> > - if (PageKsm(vmf->page)) {
> > - bool reused = reuse_ksm_page(vmf->page, vmf->vma,
> > - vmf->address);
> > - unlock_page(vmf->page);
> > - if (!reused)
> > - goto copy;
> > - wp_page_reuse(vmf);
> > - return VM_FAULT_WRITE;
> > - }
> > - if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
> > - if (total_map_swapcount == 1) {
> > - /*
> > - * The page is all ours. Move it to
> > - * our anon_vma so the rmap code will
> > - * not search our parent or siblings.
> > - * Protected against the rmap code by
> > - * the page lock.
> > - */
> > - page_move_anon_rmap(vmf->page, vma);
> > - }
> > - unlock_page(vmf->page);
> > - wp_page_reuse(vmf);
> > - return VM_FAULT_WRITE;
> > }
> > - unlock_page(vmf->page);
> > + /*
> > + * Ok, we've got the only map reference, and the only
> > + * page count reference, and the page is locked,
> > + * it's dark out, and we're wearing sunglasses. Hit it.
> > + */
> > + wp_page_reuse(vmf);
> > + unlock_page(page);
> > + return VM_FAULT_WRITE;
> > } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> > (VM_WRITE|VM_SHARED))) {
> > return wp_page_shared(vmf);
> >
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR