linux-kernel - Re: [RFC v1][PATCH]page_fault retry with NOPAGE

Message-ID: <20081127101436.GI28285@wotan.suse.de>
Date:	Thu, 27 Nov 2008 11:14:36 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Mike Waychison <mikew@...gle.com>, Ying Han <yinghan@...gle.com>,
	Ingo Molnar <mingo@...e.hu>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, akpm <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Rohit Seth <rohitseth@...gle.com>,
	Hugh Dickins <hugh@...itas.com>,
	"H. Peter Anvin" <hpa@...or.com>, edwintorok@...il.com
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY

On Thu, Nov 27, 2008 at 11:00:07AM +0100, Peter Zijlstra wrote:
> On Thu, 2008-11-27 at 01:28 -0800, Mike Waychison wrote:
> 
> > Correct.  I don't recall the numbers from the pathological cases we were
> > seeing, but iirc, it was on the order of 10s of seconds, likely
> > exacerbated by slower than usual disks.  I've been digging through my
> > inbox to find numbers without much success -- we've been using a variant
> > of this patch since 2.6.11.
> 
> > We generally try to avoid such things, but sometimes it a) can't be 
> > easily avoided (third party libraries for instance) and b) when it hits 
> > us, it affects the overall health of the machine/cluster (the monitoring 
> > daemons get blocked, which isn't very healthy).
> 
> If it's only monitoring, there might be another solution: keep the
> required data in a separate (approximate) copy, so that you don't need
> mmap_sem at all to show it.
> 
> If your mmap_sem is so contended that your latencies are unacceptable,
> adding more users to it, even statistics gathering, just isn't going to
> cure the situation.
> 
> Furthermore, /proc code usually isn't written with performance in mind,
> so it's usually simple and robust code. Adding it to a 'hot'-path like
> you're doing doesn't seem advisable.
> 
> Also, releasing and re-acquiring mmap_sem can significantly add to the
> cacheline bouncing that thing already has.

Yes, it would be nice to reduce mmap_sem load regardless of any other
fixes or problems. I guess they're not very worried about cacheline
bouncing but more about hold time (how many sockets in these systems?
4 at most?)

I guess it is the pagemap stuff that they use most heavily?

pagemap_read looks like it can use get_user_pages_fast. The smaps and
clear_refs stuff might have been nicer if they could work on ranges
like pagemap. Then they could avoid mmap_sem as well (although maps
would need to be sampled and take mmap_sem I guess).
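
To make "work on ranges like pagemap" concrete, here is a small userspace
sketch of how a reader addresses /proc/<pid>/pagemap by virtual address
range. It assumes the entry layout described in current kernel documentation
(one 64-bit entry per virtual page, bit 63 = present, bit 62 = swapped,
bits 0-54 = PFN when present); the layout at the time of this thread may
have differed in detail. The helper names are hypothetical and not kernel
API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: one 8-byte entry per virtual page. */
#define PAGEMAP_ENTRY_BYTES 8u
#define PM_PRESENT  (1ULL << 63)          /* page is in RAM */
#define PM_SWAPPED  (1ULL << 62)          /* page is in swap */
#define PM_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: PFN if present */

/* File offset of the entry that describes vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * PAGEMAP_ENTRY_BYTES;
}

/* Number of entries covering the half-open range [start, end). */
static uint64_t pagemap_entries(uint64_t start, uint64_t end,
                                uint64_t page_size)
{
    uint64_t first = start / page_size;
    uint64_t last = (end + page_size - 1) / page_size;  /* round up */
    return last - first;
}

static bool pm_present(uint64_t entry)
{
    return (entry & PM_PRESENT) != 0;
}

static bool pm_swapped(uint64_t entry)
{
    return (entry & PM_SWAPPED) != 0;
}

/* PFN is only meaningful for present pages; return 0 otherwise. */
static uint64_t pm_pfn(uint64_t entry)
{
    return pm_present(entry) ? (entry & PM_PFN_MASK) : 0;
}
```

A reader would pread() pagemap_entries(...) * 8 bytes at
pagemap_offset(start, page_size) and decode each entry, so the cost scales
with the requested range rather than with the whole address space.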

One problem with dropping mmap_sem is that it hurts priority/fairness.
It also opens a forward progress hole AFAICS (maybe theoretical, but not
something to ignore completely): if mmap_sem is very heavily contended,
the refault is going to take a while to get through, and by then the
page might have been reclaimed, etc.
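
The forward progress concern can be modelled with a toy, single-threaded
simulation. This is not kernel code: struct toy_mm, the reader counter
standing in for mmap_sem, and the evictions_left knob are all hypothetical.
The cap on retries (falling back to doing the I/O with the lock held) is one
possible mitigation assumed here for illustration, not something proposed in
this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an mm; every name here is hypothetical. */
struct toy_mm {
    int  readers;         /* models the mmap_sem read-side hold count */
    bool page_resident;   /* is the faulting page in memory? */
    int  evictions_left;  /* times "reclaim" evicts the page before a refault */
};

enum fault_result { FAULT_DONE, FAULT_RETRY };

static void sem_down_read(struct toy_mm *mm) { mm->readers++; }
static void sem_up_read(struct toy_mm *mm)   { mm->readers--; }

/* Window in which the lock is not held: reclaim may take the page away. */
static void unlocked_window(struct toy_mm *mm)
{
    if (mm->evictions_left > 0) {
        mm->evictions_left--;
        mm->page_resident = false;
    }
}

/*
 * One fault attempt, entered with the lock held.
 * FAULT_DONE:  page is resident, lock is still held.
 * FAULT_RETRY: lock was dropped around the "I/O"; caller must retake it.
 */
static enum fault_result fault_once(struct toy_mm *mm, bool allow_retry)
{
    if (mm->page_resident)
        return FAULT_DONE;
    if (allow_retry) {
        sem_up_read(mm);           /* drop the lock before blocking */
        mm->page_resident = true;  /* "I/O" completes */
        unlocked_window(mm);       /* ...but the page may be reclaimed */
        return FAULT_RETRY;
    }
    mm->page_resident = true;      /* fallback: I/O with the lock held */
    return FAULT_DONE;
}

/* Returns the number of attempts needed to resolve the fault. */
static int handle_fault(struct toy_mm *mm, int max_retries)
{
    int attempts = 0;

    assert(mm->readers == 0);
    for (;;) {
        sem_down_read(mm);
        attempts++;
        if (fault_once(mm, attempts <= max_retries) == FAULT_DONE) {
            sem_up_read(mm);
            return attempts;
        }
        /* FAULT_RETRY: lock already dropped, loop retakes it. */
    }
}
```

With evictions_left = 3, a retry limit of 10 needs five attempts before the
fault completes, while a limit of 1 completes on the second attempt at the
cost of holding the lock across the I/O once.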