Message-ID: <100D68C7BA14664A8938383216E40DE04062DEA1@FMSMSX114.amr.corp.intel.com>
Date: Tue, 18 Feb 2014 14:15:59 +0000
From: "Wilcox, Matthew R" <matthew.r.wilcox@...el.com>
To: Rik van Riel <riel@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>, Andi Kleen <ak@...ux.intel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Dave Chinner <david@...morbit.com>,
linux-mm <linux-mm@...ck.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: RE: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
We don't really need to lock all the pages being returned to protect against truncate. We only need to lock the one at the highest index, and check i_size while that lock is held since truncate_inode_pages_range() will block on any page that is locked.
We're still vulnerable to holepunches, but there's no locking currently between holepunches and truncate, so we're no worse off now.
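The idea above (lock only the highest-index page, then check i_size under that lock) can be sketched as a small userspace model. This is not kernel code: the model_* types and helpers, the single-threaded lock flag, and the 4 KiB page size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-ins for struct page and struct inode. */
struct model_page {
    unsigned long long index;  /* page offset within the file */
    bool locked;               /* models the page lock bit */
};

struct model_inode {
    unsigned long long i_size; /* file size in bytes */
};

/* Models a trylock: take the lock only if nobody holds it. */
static bool model_trylock_page(struct model_page *page)
{
    if (page->locked)
        return false;
    page->locked = true;
    return true;
}

static void model_unlock_page(struct model_page *page)
{
    page->locked = false;
}

/*
 * pages[] is sorted by ascending index. On success the highest-index
 * page is left locked, so the caller can map the whole batch and then
 * call model_end_batch(). On failure nothing is left locked.
 */
static bool model_begin_batch(const struct model_inode *inode,
                              struct model_page **pages, size_t nr)
{
    struct model_page *last;
    unsigned long long nr_file_pages;

    assert(nr > 0);
    last = pages[nr - 1];

    if (!model_trylock_page(last))
        return false;

    /* Round i_size up to whole pages and check it under the lock. */
    nr_file_pages = (inode->i_size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
    if (last->index >= nr_file_pages) {
        model_unlock_page(last);
        return false;
    }
    return true;
}

/* Drop the single lock taken by model_begin_batch(). */
static void model_end_batch(struct model_page **pages, size_t nr)
{
    assert(nr > 0);
    model_unlock_page(pages[nr - 1]);
}
```

A caller would map the pages between model_begin_batch() and model_end_batch(). The model only shows the single-lock-plus-size-check shape; it does not model holepunches, which the message notes are still unprotected.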
________________________________________
From: Rik van Riel [riel@...hat.com]
Sent: February 18, 2014 5:28 AM
To: Linus Torvalds; Kirill A. Shutemov
Cc: Andrew Morton; Mel Gorman; Andi Kleen; Wilcox, Matthew R; Dave Hansen; Alexander Viro; Dave Chinner; linux-mm; linux-fsdevel; Linux Kernel Mailing List
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
On 02/17/2014 02:01 PM, Linus Torvalds wrote:
> - increment the page _mapcount (iow, do "page_add_file_rmap()"
> early). This guarantees that any *subsequent* unmap activity on this
> page will walk the file mapping lists, and become serialized by the
> page table lock we hold.
>
> - mb_after_atomic_inc() (this is generally free)
>
> - test that the page is still unlocked and uptodate, and the page
> mapping still points to our page.
>
> - if that is true, we're all good, we can use the page, otherwise we
> decrement the mapcount (page_remove_rmap()) and skip the page.
>
> Hmm? Doing something like this means that we would never lock the
> pages we prefault, and you can go back to your gang lookup rather than
> that "one page at a time". And the race case is basically never going
> to trigger.
>
> Comments?
What would the direct io code do when it runs into a page with
elevated mapcount, but for which a mapping cannot be found yet?
Looking at the code, it looks like the above scheme could cause
some trouble with invalidate_inode_pages2_range(), which has
the following sequence:
        if (page_mapped(page)) {
                ... unmap page
        }
        BUG_ON(page_mapped(page));
In other words, it looks like incrementing _mapcount first could
lead to an oops in the truncate and direct IO code.
The page lock is used to prevent such races.
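The race described above can be illustrated with a deterministic userspace model. This is only a sketch: the model_* names are hypothetical, the counters stand in for _mapcount and installed page table entries, and the interleaving is written out by hand rather than produced by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct model_page {
    int mapcount;  /* mappings claimed (models _mapcount + 1) */
    int ptes;      /* page table entries actually installed */
    bool locked;   /* models the page lock bit */
};

static bool model_page_mapped(const struct model_page *page)
{
    return page->mapcount > 0;
}

/* Models the unmap step: only installed PTEs can be found and removed. */
static void model_unmap(struct model_page *page)
{
    assert(page->mapcount >= page->ptes);
    page->mapcount -= page->ptes;
    page->ptes = 0;
}

/*
 * Models the quoted sequence. Returns true if the final
 * BUG_ON(page_mapped(page)) would fire.
 */
static bool model_invalidate_would_bug(struct model_page *page)
{
    if (model_page_mapped(page))
        model_unmap(page);
    return model_page_mapped(page);
}

/* Lockless scheme, first step: raise the mapcount before any PTE exists. */
static void model_fault_raise_mapcount(struct model_page *page)
{
    page->mapcount++;
}

/* Lockless scheme, later step: install the PTE. */
static void model_fault_install_pte(struct model_page *page)
{
    page->ptes++;
}

/*
 * Locked variant: mapcount and PTE change together under the page lock,
 * and the page is skipped if someone else (assumed here to be the
 * invalidating side) already holds the lock.
 */
static bool model_fault_locked(struct model_page *page)
{
    if (page->locked)
        return false;
    page->locked = true;
    page->mapcount++;
    page->ptes++;
    page->locked = false;
    return true;
}
```

In this model, calling model_invalidate_would_bug() between model_fault_raise_mapcount() and model_fault_install_pte() reports that the BUG_ON would fire, because the unmap step finds no PTE to remove while the mapcount is already elevated. With model_fault_locked(), the invalidating side never observes that intermediate state.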
*sigh*
--
All rights reversed