Message-Id: <20100413192259.D113.A69D9226@jp.fujitsu.com>
Date: Tue, 13 Apr 2010 19:36:51 +0900 (JST)
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: kosaki.motohiro@...fujitsu.com, Rik van Riel <riel@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Minchan Kim <minchan.kim@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Nick Piggin <npiggin@...e.de>,
Andrea Arcangeli <aarcange@...hat.com>,
Hugh Dickins <hugh.dickins@...cali.co.uk>,
sgunderson@...foot.com
Subject: Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA
Hi Linus,
> On Sun, 11 Apr 2010, Rik van Riel wrote:
> >
> > Another thing I just thought of.
> >
> > The anon_vma struct will not be reused for something completely
> > different due to the SLAB_DESTROY_BY_RCU flag that the anon_vma_cachep
> > is created with.
>
> Rik, we _know_ it got re-used by something totally different. That's
> clearly the problem. The page->mapping pointer does _not_ point to an
> anon_vma any more. That's the problem here.
>
> What we need to figure out is how we have a page on the LRU list that is
> still marked as 'mapped' that has that stale mapping pointer.
>
> I can easily see how the stale mapping pointer happens for a non-mapped
> page. That part is trivial. Here's a simple case:
>
> - vmscan does that whole "isolate LRU pages", and one of them is a (at
> that time mapped) anonymous page. It's now not on any LRU lists at all.
>
> - vmscan ends up waiting for pageout and/or writeback while holding that
> list of pages.
>
> - in the meantime, the process that had the page exits or unmaps,
> unmapping the page and freeing the vma and the anon_vma.
>
> - vmscan eventually gets to the page, and does that page_referenced()
> dance. page->mapping points to something that is long long gone (as in
> "IO access lifetimes", so we're talking something that has been freed
> literally milliseconds ago, rather than any RCU delays)
>
> So I can see the stale page->mapping pointer happening. That part is even
> trivial. What I don't see is how the page would be still marked 'mapped'.
> Everything that actually frees the vma/anon_vmas should also have
> unmapped the page before that - even if it didn't _free_ the page.
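Linus's scenario can be checked against a toy model. The sketch below is a hypothetical, single-threaded userspace model, not kernel code. `struct model_page`, `struct model_anon_vma` and the `model_*` helpers are invented for illustration. `_mapcount == -1` stands for "not mapped", as in the kernel's page_mapped(), and setting `freed` stands in for handing the anon_vma back to its slab. The model walks steps 1 and 3 of the list above (step 2, the writeback wait, only widens the window). It shows page->mapping left stale on a page that is already unmapped, which is the state Linus calls expected. The bug being chased would be a state where `model_state_is_expected()` returns 0.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON 1UL

/* Hypothetical stand-ins; "freed" marks memory handed back to the slab. */
struct model_anon_vma {
	int freed;
};

struct model_page {
	uintptr_t mapping;	/* anon_vma pointer | PAGE_MAPPING_ANON */
	int _mapcount;		/* -1 means "not mapped" */
	int on_lru;
};

/* Set up a mapped anonymous page that sits on the LRU. */
static void model_fault_in(struct model_page *page, struct model_anon_vma *av)
{
	assert(((uintptr_t)av & PAGE_MAPPING_ANON) == 0);	/* tag bit free */
	page->mapping = (uintptr_t)av | PAGE_MAPPING_ANON;
	page->_mapcount = 0;					/* mapped once */
	page->on_lru = 1;
}

/* Step 1: vmscan isolates the page while it is still mapped. */
static void model_isolate(struct model_page *page)
{
	page->on_lru = 0;
}

/*
 * Step 3: the process exits or unmaps. The page is unmapped first, then
 * the anon_vma is freed; page->mapping is left alone, so it goes stale.
 */
static void model_exit(struct model_page *page, struct model_anon_vma *av)
{
	page->_mapcount--;
	av->freed = 1;
}

/* Linus's expectation: a freed anon_vma implies the page is unmapped. */
static int model_state_is_expected(const struct model_page *page,
				   const struct model_anon_vma *av)
{
	return !(av->freed && page->_mapcount >= 0);
}
```

After `model_fault_in()`, `model_isolate()` and `model_exit()`, the page is off the LRU, `_mapcount` is -1, and `mapping` still carries the old tagged pointer, yet the predicate holds.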
Sorry, I'm now lost as to what is being discussed in this crazy long thread.
IIUC, if the anon_vma that page->mapping points to was freed milliseconds
ago, check (1) below returns false and we never actually dereference
page->mapping.
Am I missing something?
===================================================================
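struct anon_vma *page_lock_anon_vma(struct page *page)
{
	struct anon_vma *anon_vma;
	unsigned long anon_mapping;

	rcu_read_lock();
	/* Read page->mapping exactly once; it may change under us. */
	anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
	/* Bail out unless this is a plain anonymous (non-KSM) page. */
	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
		goto out;
	if (!page_mapped(page)) /* (1) here */
		goto out;

	/* Strip the tag bit to recover the anon_vma pointer. */
	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
	spin_lock(&anon_vma->lock);
	/* Returns with both rcu_read_lock() and anon_vma->lock held. */
	return anon_vma;
out:
	rcu_read_unlock();
	return NULL;
}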
=================================================
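KOSAKI's point about check (1) can be illustrated with a similar hypothetical userspace model. This is not kernel code. All `model_*` names are invented. The flag values are assumed to mirror PAGE_MAPPING_ANON = 1 and PAGE_MAPPING_KSM = 2 from include/linux/mm.h of this era, and a counter stands in for spin_lock(). The model keeps the order of checks in the function above, so an unmapped page is rejected before its anon_vma pointer is ever dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror include/linux/mm.h of this era. */
#define PAGE_MAPPING_ANON  1UL
#define PAGE_MAPPING_KSM   2UL
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Hypothetical stand-ins, not kernel structures. */
struct model_anon_vma {
	int lock_count;		/* times the modeled spin_lock() ran */
};

struct model_page {
	uintptr_t mapping;	/* anon_vma pointer | PAGE_MAPPING_ANON */
	int _mapcount;		/* -1 means "not mapped" */
};

static void model_page_init(struct model_page *page, struct model_anon_vma *av)
{
	assert(((uintptr_t)av & PAGE_MAPPING_FLAGS) == 0);	/* tag bits free */
	page->mapping = (uintptr_t)av | PAGE_MAPPING_ANON;
	page->_mapcount = 0;					/* mapped once */
}

static int model_page_mapped(const struct model_page *page)
{
	return page->_mapcount >= 0;
}

/* Same order of checks as page_lock_anon_vma() quoted above. */
static struct model_anon_vma *model_page_lock_anon_vma(struct model_page *page)
{
	uintptr_t anon_mapping = page->mapping;
	struct model_anon_vma *anon_vma;

	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
		return NULL;
	if (!model_page_mapped(page))	/* check (1): before any dereference */
		return NULL;
	anon_vma = (struct model_anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
	anon_vma->lock_count++;		/* first dereference: "spin_lock()" */
	return anon_vma;
}
```

Take a page set up with `model_page_init()` and then marked unmapped (`_mapcount = -1`) while `mapping` is left stale. `model_page_lock_anon_vma()` returns NULL, and `lock_count` is unchanged.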
Also, I think the following patch of yours is incorrect.
The added page_mapped() check runs after spin_lock(&anon_vma->lock),
which means it checks after the dereference. Such a check doesn't prevent
an invalid pointer dereference, I think.
Perhaps I'm missing something. I have to reread this whole thread from
the beginning.
---
diff --git a/mm/rmap.c b/mm/rmap.c
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -302,7 +302,11 @@ struct anon_vma *page_lock_anon_vma(struct page *page)
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
 	spin_lock(&anon_vma->lock);
-	return anon_vma;
+
+	if (page_mapped(page))
+		return anon_vma;
+
+	spin_unlock(&anon_vma->lock);
 out:
 	rcu_read_unlock();
 	return NULL;
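The objection to the patch can be made concrete with the same kind of hypothetical userspace model. The `model_*` names are invented. `race` is an invented hook that runs between check (1) and the lock, standing in for another CPU unmapping the page. `recheck` toggles the page_mapped() test that the patch adds after spin_lock(). In this single-threaded sketch, the recheck changes what is returned, but the anon_vma has already been dereferenced by then. That is the point made above: the recheck guards the result, not the dereference. Whether that dereference is safe in the real kernel depends on the SLAB_DESTROY_BY_RCU point quoted earlier, which this model does not capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON  1UL
#define PAGE_MAPPING_KSM   2UL
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

/* Hypothetical stand-ins, not kernel structures. */
struct model_anon_vma {
	int derefs;	/* accesses to the anon_vma (lock and unlock) */
	int locked;	/* 1 while the modeled anon_vma->lock is held */
};

struct model_page {
	uintptr_t mapping;	/* anon_vma pointer | PAGE_MAPPING_ANON */
	int _mapcount;		/* -1 means "not mapped" */
};

/* Hypothetical hook: whatever happens between check (1) and spin_lock(). */
typedef void (*race_fn)(struct model_page *page);

static void model_page_init(struct model_page *page, struct model_anon_vma *av)
{
	assert(((uintptr_t)av & PAGE_MAPPING_FLAGS) == 0);	/* tag bits free */
	page->mapping = (uintptr_t)av | PAGE_MAPPING_ANON;
	page->_mapcount = 0;					/* mapped once */
}

/* Stands in for a concurrent unmap on another CPU. */
static void model_unmap(struct model_page *page)
{
	page->_mapcount = -1;
}

static int model_page_mapped(const struct model_page *page)
{
	return page->_mapcount >= 0;
}

/* recheck == 0: function as quoted; recheck == 1: with the proposed patch. */
static struct model_anon_vma *
model_lock_anon_vma(struct model_page *page, race_fn race, int recheck)
{
	uintptr_t anon_mapping = page->mapping;
	struct model_anon_vma *anon_vma;

	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
		return NULL;
	if (!model_page_mapped(page))			/* check (1) */
		return NULL;
	if (race)
		race(page);				/* the race window */
	anon_vma = (struct model_anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
	anon_vma->derefs++;				/* "spin_lock()" */
	anon_vma->locked = 1;
	if (recheck && !model_page_mapped(page)) {	/* the added check */
		anon_vma->derefs++;			/* "spin_unlock()" */
		anon_vma->locked = 0;
		return NULL;
	}
	return anon_vma;
}
```

With `model_unmap` as the race, the unpatched variant (`recheck == 0`) returns the anon_vma locked for a page that is no longer mapped. The patched variant returns NULL, but only after it has already locked and unlocked the anon_vma.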