Message-ID: <20140218235714.GA16064@node.dhcp.inet.fi>
Date: Wed, 19 Feb 2014 01:57:14 +0200
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Andi Kleen <ak@...ux.intel.com>,
Matthew Wilcox <matthew.r.wilcox@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Dave Chinner <david@...morbit.com>,
linux-mm <linux-mm@...ck.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if
they are in page cache
On Tue, Feb 18, 2014 at 10:28:11AM -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 10:07 AM, Kirill A. Shutemov
> <kirill.shutemov@...ux.intel.com> wrote:
> >
> > Patch is wrong. Correct one is below.
>
> Hmm. I don't hate this. Looking through it, it's fairly simple
> conceptually, and the code isn't that complex either. I can live with
> this.
>
> I think it's a bit odd how you pass both "max_pgoff" and "nr_pages" to
> the fault-around function, though. In fact, I'd consider that a bug.
> Passing in "FAULT_AROUND_PAGES" is just wrong, since the code cannot -
> and in fact *must* not - actually fault in that many pages, since the
> starting/ending address can be limited by other things.
>
> So I think that part of the code is bogus. You need to remove
> nr_pages, because any use of it is just incorrect. I don't think it
> can actually matter, since the max_pgoff checks are more restrictive,
> but if you think it can matter please explain how and why it wouldn't
> be a major bug?
I don't like this either...
The current max_pgoff is the end of the page table (or the end of the vma, if
it ends first). If we drop nr_pages but keep the current max_pgoff, we will
potentially set up PTRS_PER_PTE pages at a time: i.e. a page fault on the first
page of a page table, and all pages are ready. nr_pages limits the number.
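To make the clamping concrete, here is a minimal userspace sketch of how the upper bound could be computed. It is not the code from the patch: struct vma_sketch, max_pgoff_sketch() and min_u64() are hypothetical names, PAGE_SHIFT = 12 and PTRS_PER_PTE = 512 are assumed x86-64-style values, and nr_pages == 0 is used here only as a convention for "no cap".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12u   /* assumption: 4 KiB pages */
#define PTRS_PER_PTE 512u  /* assumption: 512 entries per page table */

/* Hypothetical stand-in for the few vma fields this sketch needs. */
struct vma_sketch {
	uint64_t vm_start;  /* first byte address of the mapping */
	uint64_t vm_end;    /* one past the last byte address */
	uint64_t vm_pgoff;  /* file offset of vm_start, in pages */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Last file page offset that may be mapped when starting at start_addr.
 * The bound is the end of the page table or the end of the vma, whichever
 * comes first; a non-zero nr_pages additionally caps the window.
 */
static uint64_t max_pgoff_sketch(const struct vma_sketch *vma,
				 uint64_t start_addr, uint64_t nr_pages)
{
	/* start_addr must be page aligned and inside the vma. */
	assert((start_addr & ((1ull << PAGE_SHIFT) - 1)) == 0);
	assert(start_addr >= vma->vm_start && start_addr < vma->vm_end);

	/* File page offset backing start_addr. */
	uint64_t pgoff = vma->vm_pgoff +
			 ((start_addr - vma->vm_start) >> PAGE_SHIFT);
	/* Index of start_addr's entry within its page table. */
	uint64_t pte_index = (start_addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);

	uint64_t end_of_table = pgoff + (PTRS_PER_PTE - 1 - pte_index);
	uint64_t end_of_vma = vma->vm_pgoff +
			      ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) - 1;

	uint64_t max_pgoff = min_u64(end_of_table, end_of_vma);
	if (nr_pages != 0)
		max_pgoff = min_u64(max_pgoff, pgoff + nr_pages - 1);
	return max_pgoff;
}
```

With a vma that covers a whole page table and a fault on its first page, the uncapped bound spans all PTRS_PER_PTE entries, while nr_pages = 32 (FAULT_AROUND_ORDER of 5) cuts it to 32. When start_addr sits near the end of the page table, the page-table bound is already tighter than nr_pages.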
It's not necessarily a bad idea to populate the whole page table at once. I need
to measure how much latency we would add by doing that.
The only problem I see is that we hold the ptl for a bit too long. But with
split ptl it will affect only the page table we populate.
The other approach is to limit ourselves to FAULT_AROUND_PAGES from start_addr.
In this case we will sometimes do a useless radix-tree lookup even if we had
a chance to populate pages further in the page table.
> Apart from that, I'd really like to see numbers for different ranges
> of FAULT_AROUND_ORDER, because I think 5 is pretty high, but on the
> whole I don't find this horrible, and you still lock the page so it
> doesn't involve any new rules. I'm not hugely happy with another raw
> radix-tree user, but it's not horrible.
>
> Btw, is the "radix_tree_deref_retry(page) -> goto restart" really
> necessary? I'd be almost more inclined to just make it just do a
> "break;" to break out of the loop and stop doing anything clever at
> all.
The code is not ready yet. I'll rework it. It's just what I had by the end
of the day. I wanted to know whether setting up ptes directly from
->fault_nonblock() is an okayish approach or is considered a layering violation.
--
Kirill A. Shutemov