Date: Mon, 17 Mar 2008 11:44:35 +1100
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...e.hu>, "LKML," <linux-kernel@...r.kernel.org>
Subject: Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)

On Wednesday 12 March 2008 18:58, Peter Zijlstra wrote:
> On Wed, 2008-03-12 at 12:21 +1100, Nick Piggin wrote:
> > (Back onto lkml)
> >
> > On Tuesday 11 March 2008 23:02, Ingo Molnar wrote:
> > > another thing to try would be to increase:
> > >
> > >    /proc/sys/kernel/sched_migration_cost
> > >
> > > from its 500 usecs default to a few msecs ?
> >
> > This doesn't really help either (at 10ms).
> >
> > (For the record, I've tried turning SD_WAKE_IDLE, SD_WAKE_AFFINE
> > on and off for each domain and that hasn't helped either).
> >
> > I've also tried increasing sched_latency_ns as far as it can go.
> > BTW. this is a pretty nasty behaviour if you ask my opinion. It
> > starts *increasing* the number of involuntary context switches
> > as resources get oversubscribed. That's completely unintuitive as
> > far as I can see -- when we get overloaded, the obvious thing to
> > do is try to increase efficiency, or at least try as hard as
> > possible not to lose it. So context switches should be steady or
> > decreasing as I add more processes to a runqueue.
> >
> > It seems to max out at nearly 100 context switches per second,
> > and this has actually shown to be too frequent for modern CPUs
> > and big caches.
> >
> > Increasing the tunable didn't help for this workload, but it really
> > needs to be fixed so it doesn't decrease timeslices as the number
> > of processes increases.
>
> /proc/sys/kernel/sched_min_granularity_ns
> /proc/sys/kernel/sched_latency_ns
>
> period := max(latency, nr_running * min_granularity)
> slice  := period * w_{i} / W
> W      := \Sum_{i} w_{i}
>
> So if you want to increase the slice length for loaded systems, up
> min_granularity.

OK, but the very concept of reducing efficiency when load increases
is nasty, and leads to nasty feedback loops. It's just a very bad
behaviour to have out of the box, and as a general observation, 10ms
is too short a default timeslice IMO.

I don't see how it is really helpful for interactive processes
either. By definition, if they are not CPU bound, then they should
be run quite soon after waking up; if they are CPU bound, then
reducing efficiency by increasing context switches is effectively
going to increase their latency anyway.

Can this be changed by default, please?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
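To make the period/slice arithmetic quoted above concrete, here is a small
userspace sketch. It is not from the thread and is not kernel code; the 20 ms
latency and 4 ms min_granularity values are assumed purely for illustration,
and equal task weights are assumed so that w_i / W reduces to 1 / nr_running.

	/* Userspace illustration of the CFS period/slice formula quoted above.
	 * The latency and min_granularity values are assumed, not actual defaults.
	 */
	#include <stdio.h>

	#define NSEC_PER_MSEC 1000000ULL

	int main(void)
	{
		unsigned long long latency = 20 * NSEC_PER_MSEC;         /* stand-in for sched_latency_ns */
		unsigned long long min_granularity = 4 * NSEC_PER_MSEC;  /* stand-in for sched_min_granularity_ns */

		for (int nr_running = 1; nr_running <= 16; nr_running *= 2) {
			/* period := max(latency, nr_running * min_granularity) */
			unsigned long long period = nr_running * min_granularity;
			if (period < latency)
				period = latency;

			/* slice := period * w_i / W; equal weights make w_i / W = 1 / nr_running */
			unsigned long long slice = period / nr_running;

			printf("nr_running=%2d  period=%3llu ms  slice=%3llu ms\n",
			       nr_running, period / NSEC_PER_MSEC, slice / NSEC_PER_MSEC);
		}
		return 0;
	}

With these assumed values, each task's slice shrinks from 20 ms towards a 4 ms
floor as tasks are added to the runqueue; raising sched_min_granularity_ns lifts
that floor, which is the knob Peter points at, while the shrinking itself is the
behaviour Nick objects to.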