linux-kernel - Re: [RFC v1][PATCH]page_fault retry with NOPAGE


Message-ID: <20081201085844.GC4926@wotan.suse.de>
Date:	Mon, 1 Dec 2008 09:58:44 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Török Edwin <edwintorok@...il.com>
Cc:	Mike Waychison <mikew@...gle.com>, Ying Han <yinghan@...gle.com>,
	Ingo Molnar <mingo@...e.hu>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, akpm <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Rohit Seth <rohitseth@...gle.com>,
	Hugh Dickins <hugh@...itas.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY

On Sun, Nov 30, 2008 at 09:54:56PM +0200, Török Edwin wrote:
> On 2008-11-29 01:02, Mike Waychison wrote:
> > Nick Piggin wrote:
> >> On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
> >>> Nick Piggin wrote:
> >>>> On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
> >>>>> Török however identified mmap taking on the order of several
> >>>>> milliseconds due to this exact problem:
> >>>>>
> >>>>> http://lkml.org/lkml/2008/9/12/185
> >>>> Turns out to be a different problem.
> >>>>
> >>> What do you mean?
> >>
> >> His is just contending on the write side. The retry patch doesn't help.
> >>
> >
> > I disagree.  How do you get 'write contention' from the following
> > paragraph:
> >
> > "Just to confirm that the problem is with pagefaults and mmap, I dropped
> > the mmap_sem in filemap_fault, and then
> > I got same performance in my testprogram for mmap and read. Of course
> > this is totally unsafe, because the mapping could change at any time."
> >
> > It reads to me that the writers were held off by the readers sleeping
> > in IO.
> 
> It is true that I have write/write contention too, but do_page_fault
> also shows up in lock_stat.
> 
> This is my guess at what happens:
> * filemap_fault used to sleep with mmap_sem held while waiting for the
> page lock.
> * the Google patch avoids that, which is fine: if the page lock can't be
> taken, it drops mmap_sem, waits, then retries the fault once
> * however, after we acquire the page lock, mapping->a_ops->readpage is
> invoked, and mmap_sem is NOT dropped here:
> 
>     error = mapping->a_ops->readpage(file, page);
>     if (!error) {
>         wait_on_page_locked(page);
> 
> If my understanding is correct, ->readpage does the actual disk I/O and
> keeps the page locked; when the lock is released we know the I/O has finished.
> So mmap_sem stays held for read across wait_on_page_locked(page), i.e.
> during the disk I/O, preventing sys_mmap/sys_munmap from making progress.

Yes, that's exactly right. Ahh, so the Google patch doesn't solve this
case? Interesting...
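
To make the effect concrete, here is a toy userspace model of the situation
described in the quoted text. It is only a sketch, not kernel code: struct
toy_rwsem and every toy_* helper are made-up stand-ins for mmap_sem and the
rwsem API, and the disk I/O is reduced to a comment. It shows only that a
writer (an mmap/munmap caller) cannot get in while the faulting reader keeps
the lock across the wait, and can get in if the reader drops it first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for a reader/writer semaphore. */
struct toy_rwsem {
    int readers;   /* number of read holders */
    bool writer;   /* true while a writer holds the lock */
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *s)
{
    /* A writer needs exclusive access: no readers and no other writer. */
    if (s->writer || s->readers > 0)
        return false;
    s->writer = true;
    return true;
}

static void toy_up_write(struct toy_rwsem *s)
{
    assert(s->writer);
    s->writer = false;
}

/*
 * Returns true if an mmap()/munmap()-style writer could take the lock
 * while the fault is waiting for its page to be read in.
 */
static bool writer_admitted_during_io(bool drop_before_wait)
{
    struct toy_rwsem mmap_sem = { 0, false };
    bool admitted;

    (void)toy_down_read_trylock(&mmap_sem);  /* fault path takes it for read */

    /* ->readpage() has been submitted; the fault is about to wait on the page. */
    if (drop_before_wait)
        toy_up_read(&mmap_sem);

    /* Meanwhile a writer shows up. */
    admitted = toy_down_write_trylock(&mmap_sem);
    if (admitted)
        toy_up_write(&mmap_sem);

    if (!drop_before_wait)
        toy_up_read(&mmap_sem);
    return admitted;
}
```

With drop_before_wait false (the behaviour described above) the writer is
refused for the whole duration of the wait; with it true the writer gets in.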


> I don't know how to prove/disprove my guess above, suggestions welcome.
> 
> Could the patch be changed to also release the mmap_sem after readpage,
> and before wait_on_page_locked?

It should be possible somehow, but it is difficult: once mmap_sem has
been dropped, we basically have to retry the whole fault, because the
vma might have gone away in the meantime.
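
A rough sketch of why the retry has to start from the top, again as a toy
userspace model rather than kernel code. All names here (struct toy_mm,
toy_find_vma, toy_fault, toy_munmap, the while_unlocked callback) are
hypothetical and exist only for illustration; the callback stands in for
whatever another thread might do while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
    unsigned long start, end;   /* covers [start, end) */
    bool mapped;
};

struct toy_mm {
    int readers;                /* stand-in for mmap_sem read holders */
    struct toy_vma vma;         /* a single mapping is enough for the sketch */
};

enum fault_result { FAULT_OK, FAULT_SIGSEGV };

static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
    if (mm->vma.mapped && addr >= mm->vma.start && addr < mm->vma.end)
        return &mm->vma;
    return NULL;
}

/* Writer side: only legal when no reader holds the lock. */
static void toy_munmap(struct toy_mm *mm)
{
    assert(mm->readers == 0);
    mm->vma.mapped = false;
}

/*
 * Fault handler that drops the lock before "sleeping" on I/O and then
 * retries from the top. The vma pointer from the first pass is never
 * reused: it is looked up again once the lock has been retaken.
 */
static enum fault_result toy_fault(struct toy_mm *mm, unsigned long addr,
                                   void (*while_unlocked)(struct toy_mm *))
{
    bool io_done = false;

    for (;;) {
        struct toy_vma *vma;

        mm->readers++;                      /* take the lock for read */
        vma = toy_find_vma(mm, addr);
        if (!vma) {
            mm->readers--;
            return FAULT_SIGSEGV;           /* mapping is gone */
        }
        if (io_done) {
            /* Page is ready and the vma is still valid: map it. */
            mm->readers--;
            return FAULT_OK;
        }
        mm->readers--;                      /* drop the lock before waiting */
        if (while_unlocked)
            while_unlocked(mm);             /* e.g. a concurrent munmap */
        io_done = true;                     /* pretend the read completed */
    }
}
```

If nothing happens while the lock is dropped, the second pass finds the same
vma and completes. If toy_munmap runs in that window, the second lookup
fails, which is exactly the case a stale vma pointer would have missed.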
