Date: Fri, 8 Jan 2010 09:22:14 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
    Minchan Kim <minchan.kim@...il.com>,
    "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
    "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
    "linux-mm@...ck.org" <linux-mm@...ck.org>,
    cl@...ux-foundation.org,
    "hugh.dickins" <hugh.dickins@...cali.co.uk>,
    Nick Piggin <nickpiggin@...oo.com.au>,
    Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

On Fri, 8 Jan 2010, Peter Zijlstra wrote:

> On Tue, 2010-01-05 at 20:20 -0800, Linus Torvalds wrote:
> >
> > Yeah, I should have looked more at your callchain. That's nasty. Much
> > worse than the per-mm lock. I thought the page buffering would avoid the
> > zone lock becoming a huge problem, but clearly not in this case.
>
> Right, so I ran some numbers on a multi-socket (2) machine as well:
>
>                                pf/min
>
> -tip                          56398626
> -tip + xadd                  174753190
> -tip + speculative           189274319
> -tip + xadd + speculative    200174641
>
> [ variance is around 0.5% for this workload, ran most of these numbers
> with --repeat 5 ]

That's a huge jump. It's clear that the spinlock-based rwsem's simply
suck. The speculation gets rid of some additional mmap_sem contention, but
at least for two sockets it looks like the rwsem implementation was the
biggest problem by far.

> At both the xadd/speculative point the workload is dominated by the
> zone->lock, the xadd+speculative removes some of the contention, and
> removing the various RSS counters could yield another few percent
> according to the profiles, but then we're pretty much there.

I don't know if worrying about a few percent is worth it. "Perfect is the
enemy of good", and the workload is pretty dang artificial with the whole
"remove pages and re-fault them as fast as you can".

So the benchmark is pointless and extreme, and I think it's not worth
worrying too much about details. Especially when compared to just the
*three-fold* jump from just the fairly trivial rwsem implementation change
(with speculation on top of it then adding another 15% improvement -
nothing to sneeze at, but it's still in a different class).

Of course, larger numbers of sockets will likely change the situation, but
at the same time I do suspect that workloads designed for hundreds of cores
will need to try to behave better than that benchmark anyway ;)

> One way around those RSS counters is to track it per task, a quick grep
> shows its only the oom-killer and proc that use them.
>
> A quick hack removing them gets us: 203158058

Yeah, well.. After that 200% and 15% improvement, a 1.5% improvement on a
totally artificial benchmark looks less interesting.

Because let's face it - if your workload does several million page faults
per second, you're just doing something fundamentally _wrong_.

		Linus
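
[ Archive annotation, not part of the original message: the percentages
quoted in the thread can be re-derived from the pf/min figures. The sketch
below is only that arithmetic. The dictionary keys, the "no_rss" label for
the quick-hack run, and the helper name gain_percent are made up for
illustration and do not come from any kernel tool. ]

```python
# Page faults per minute, copied from the numbers quoted in the thread.
# The key names are hypothetical labels chosen for this sketch.
PF_PER_MIN = {
    "tip": 56398626,
    "xadd": 174753190,
    "speculative": 189274319,
    "xadd_speculative": 200174641,
    "no_rss": 203158058,  # xadd + speculative with RSS counters removed
}


def gain_percent(before: int, after: int) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Plain -tip versus the xadd-based rwsem: the "three-fold" jump.
xadd_speedup = PF_PER_MIN["xadd"] / PF_PER_MIN["tip"]

# Speculative faults layered on top of the xadd rwsem.
spec_gain = gain_percent(PF_PER_MIN["xadd"], PF_PER_MIN["xadd_speculative"])

# Removing the RSS counters on top of xadd + speculative.
rss_gain = gain_percent(PF_PER_MIN["xadd_speculative"], PF_PER_MIN["no_rss"])

# The same best-case figure expressed per second instead of per minute.
faults_per_sec = PF_PER_MIN["no_rss"] / 60

if __name__ == "__main__":
    print(f"xadd speedup over -tip:      {xadd_speedup:.2f}x")
    print(f"speculative on top of xadd:  {spec_gain:.1f}%")
    print(f"RSS counters removed:        {rss_gain:.1f}%")
    print(f"faults per second (best):    {faults_per_sec:,.0f}")
```

[ The ratios come out at roughly 3.1x, 14.5% and 1.5%, which matches the
"three-fold", "15%" and "1.5%" figures used in the reply, and the best run
works out to a bit over three million page faults per second. ]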