linux-kernel - Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY

Message-ID: <49307893.4030708@google.com>
Date:	Fri, 28 Nov 2008 15:02:43 -0800
From:	Mike Waychison <mikew@...gle.com>
To:	Nick Piggin <npiggin@...e.de>
CC:	Ying Han <yinghan@...gle.com>, Ingo Molnar <mingo@...e.hu>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	akpm <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Rohit Seth <rohitseth@...gle.com>,
	Hugh Dickins <hugh@...itas.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"H. Peter Anvin" <hpa@...or.com>, edwintorok@...il.com
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY

Nick Piggin wrote:
> On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
>> Nick Piggin wrote:
>>> On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
>>>> Török however identified mmap taking on the order of several 
>>>> milliseconds due to this exact problem:
>>>>
>>>> http://lkml.org/lkml/2008/9/12/185
>>> Turns out to be a different problem.
>>>
>> What do you mean?
> 
> His is just contending on the write side. The retry patch doesn't help.
> 

I disagree.  How do you get 'write contention' from the following paragraph:

"Just to confirm that the problem is with pagefaults and mmap, I dropped
the mmap_sem in filemap_fault, and then
I got same performance in my testprogram for mmap and read. Of course
this is totally unsafe, because the mapping could change at any time."

As I read it, the writers (the mmap callers) were being held off by readers (page faults) sleeping on I/O with mmap_sem held.
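Török's test program is not part of this thread. As a hypothetical sketch of the two access paths he compares, the functions below read the same file once with read(2) and once through an mmap(2) mapping. With read(), page-cache misses are waited on inside the read() call. With mmap, each first touch of a non-resident page is a page fault, which is the path where, per the discussion, mmap_sem is held across the I/O. The function names are illustrative. Reproducing his result would also need timing and a concurrent mmap/munmap load, which are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: sum all bytes via read(2); page-cache misses block in read(). */
int sum_with_read(const char *path, uint64_t *out)
{
    unsigned char buf[65536];
    uint64_t sum = 0;
    ssize_t n;
    int fd;

    assert(out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    close(fd);
    if (n < 0)
        return -1;
    *out = sum;
    return 0;
}

/* Hypothetical: sum all bytes via mmap(2); each first touch of a
 * non-resident page is a page fault serviced by the kernel. */
int sum_with_mmap(const char *path, uint64_t *out)
{
    struct stat st;
    uint64_t sum = 0;
    int fd;

    assert(out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {           /* mmap of length 0 fails with EINVAL */
        size_t len = (size_t)st.st_size;
        const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap((void *)p, len);
    }
    close(fd);
    *out = sum;
    return 0;
}
```

Both functions must produce the same sum for the same file; only the path the data takes into the process differs.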

> 
>>>> We generally try to avoid such things, but sometimes it a) can't be 
>>>> easily avoided (third party libraries for instance) and b) when it hits 
>>>> us, it affects the overall health of the machine/cluster (the monitoring 
>>>> daemons get blocked, which isn't very healthy).
>>> Are you doing appropriate posix_fadvise to prefetch in the files before
>>> faulting, and madvise hints if appropriate?
>>>
>> Yes, we've been slowly rolling fadvise hints out, though not to 
>> prefetch, and definitely not for faulting.  I don't see how issuing a 
>> prefetch right before we try to fault in a page is going to help 
>> matters.  The pages may appear in pagecache, but they won't be uptodate 
>> by the time we look at them anyway, so we're back to square one.
> 
> The whole point of a prefetch is to issue it sufficiently early so
> it makes a difference. Actually if you can tell quite well where the
> major faults will be, but don't know it sufficiently in advance to
> do very good prefetching, then perhaps we could add a new madvise hint
> to synchronously bring the page in (dropping the mmap_sem over the IO).
> 
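For reference, the hints under discussion look roughly like this in user space. This is a minimal sketch: `map_with_prefetch` is a hypothetical helper, and it only uses the existing `POSIX_FADV_WILLNEED` and `MADV_WILLNEED` hints, not the new synchronous madvise hint floated above, which does not exist.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map a whole file read-only
 * and ask the kernel to start readahead before the caller touches it.
 * Both hints are advisory and asynchronous; a fault shortly afterwards
 * can still block if the I/O has not completed yet.
 */
void *map_with_prefetch(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* Start readahead of the whole file (len 0 means "to end of file").
     * posix_fadvise returns an error number; the hint is optional. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping keeps its own file reference */
    if (p == MAP_FAILED)
        return NULL;

    /* The same hint, expressed on the mapping itself. */
    (void)madvise(p, (size_t)st.st_size, MADV_WILLNEED);

    *len_out = (size_t)st.st_size;
    return p;
}
```

As both sides note, this only pays off when the hint is issued well ahead of the accesses; issued right before the first touch, the pages may still be under I/O and the fault sleeps anyway.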

Or we could just fix the faulting code to drop the mmap_sem for us?  I'm 
not sure a new madvise flag could help with the 'starvation hole' issue 
you brought up.
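As a rough user-space analogy only (this is not the patch under discussion, and all names are hypothetical), "drop the lock over the I/O and retry" can be modelled with a POSIX rwlock standing in for mmap_sem. A fault takes the lock for read. If the page is not uptodate, it releases the lock before sleeping, so a writer can get in, and then restarts from the top. The sketch says nothing about the 'starvation hole' concern mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel code:
 *   map_lock      ~ mmap_sem (readers: faults; writers: mmap/munmap)
 *   page_uptodate ~ the faulted page's I/O having completed */
static pthread_rwlock_t map_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_cond = PTHREAD_COND_INITIALIZER;
static bool page_uptodate = false;
static bool fault_sleeping = false;   /* set while a fault waits on I/O */

/* I/O completion: mark the page uptodate and wake any waiter. */
void complete_io(void)
{
    pthread_mutex_lock(&io_lock);
    page_uptodate = true;
    pthread_cond_broadcast(&io_cond);
    pthread_mutex_unlock(&io_lock);
}

/* Block until some fault is asleep on I/O (for demonstration). */
void wait_for_sleeping_fault(void)
{
    pthread_mutex_lock(&io_lock);
    while (!fault_sleeping)
        pthread_cond_wait(&io_cond, &io_lock);
    pthread_mutex_unlock(&io_lock);
}

static bool page_ready(void)
{
    pthread_mutex_lock(&io_lock);
    bool ready = page_uptodate;
    pthread_mutex_unlock(&io_lock);
    return ready;
}

/* Fault path with retry; returns the number of attempts taken. */
int fault_with_retry(void)
{
    for (int attempt = 1;; attempt++) {
        pthread_rwlock_rdlock(&map_lock);
        if (page_ready()) {
            /* ...install the mapping while map_lock is still held... */
            pthread_rwlock_unlock(&map_lock);
            return attempt;
        }
        /* Not uptodate: drop map_lock *before* sleeping so writers
         * are not held off for the duration of the I/O. */
        pthread_rwlock_unlock(&map_lock);

        pthread_mutex_lock(&io_lock);
        fault_sleeping = true;
        pthread_cond_broadcast(&io_cond);
        while (!page_uptodate)
            pthread_cond_wait(&io_cond, &io_lock);
        fault_sleeping = false;
        pthread_mutex_unlock(&io_lock);
        /* Retry from the top: the mapping may have changed meanwhile. */
    }
}

/* pthread entry point: stores the attempt count in *(int *)arg. */
void *fault_thread(void *arg)
{
    *(int *)arg = fault_with_retry();
    return NULL;
}
```

The cost of the design is visible in the return value: a fault that hits I/O takes at least two passes through the locked section, in exchange for never sleeping while holding the lock.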