linux-kernel - Re: [RFC][PATCH 6/8] mm: handle_speculative

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.2.00.1001080911340.7821@localhost.localdomain>
Date:	Fri, 8 Jan 2010 09:22:14 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Minchan Kim <minchan.kim@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, cl@...ux-foundation.org,
	"hugh.dickins" <hugh.dickins@...cali.co.uk>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

On Fri, 8 Jan 2010, Peter Zijlstra wrote:

> On Tue, 2010-01-05 at 20:20 -0800, Linus Torvalds wrote:
> > 
> > Yeah, I should have looked more at your callchain. That's nasty. Much 
> > worse than the per-mm lock. I thought the page buffering would avoid the 
> > zone lock becoming a huge problem, but clearly not in this case.
> 
> Right, so I ran some numbers on a multi-socket (2) machine as well:
> 
>                                pf/min
> 
> -tip                          56398626
> -tip + xadd                  174753190
> -tip + speculative           189274319
> -tip + xadd + speculative    200174641
> 
> [ variance is around 0.5% for this workload, ran most of these numbers
> with --repeat 5 ]

That's a huge jump. It's clear that the spinlock-based rwsem's simply 
suck.  The speculation gets rid of some additional mmap_sem contention, 
but at least for two sockets it looks like the rwsem implementation was 
the biggest problem by far.

> At both the xadd/speculative point the workload is dominated by the
> zone->lock, the xadd+speculative removes some of the contention, and
> removing the various RSS counters could yield another few percent
> according to the profiles, but then we're pretty much there.

I don't know if worrying about a few percent is worth it. "Perfect is the 
enemy of good", and the workload is pretty dang artificial with the whole 
"remove pages and re-fault them as fast as you can".

So the benchmark is pointless and extreme, and I think it's not worth 
worrying too much about details. Especially when compared to just the 
*three-fold* jump from just the fairly trivial rwsem implementation change 
(with speculation on top of it then adding another 15% improvement - 
nothing to sneeze at, but it's still in a different class).

Of course, larger numbers of sockets will likely change the situation, but 
at the same time I do suspect that workloads designed for hundreds of 
cores will need to try to behave better than that benchmark anyway ;)

> One way around those RSS counters is to track it per task, a quick grep
> shows its only the oom-killer and proc that use them.
> 
> A quick hack removing them gets us: 203158058

Yeah, well.. After that 200% and 15% improvement, a 1.5% improvement on a 
totally artificial benchmark looks less interesting.

Because let's face it - if your workload does several million page faults 
per second, you're just doing something fundamentally _wrong_.

			Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/