linux-kernel - Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20080317082638.GB18229@1wt.eu>
Date:	Mon, 17 Mar 2008 09:26:38 +0100
From:	Willy Tarreau <w@....eu>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Ray Lee <ray-lk@...rabbit.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	"LKML," <linux-kernel@...r.kernel.org>
Subject: Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)

On Mon, Mar 17, 2008 at 06:19:38PM +1100, Nick Piggin wrote:
> On Monday 17 March 2008 16:21, Willy Tarreau wrote:
> > On Sun, Mar 16, 2008 at 10:16:36PM -0700, Ray Lee wrote:
> > > On Sun, Mar 16, 2008 at 5:44 PM, Nick Piggin <nickpiggin@...oo.com.au> 
> > > > Can this be changed by default, please?
> > >
> > > Not without benchmarks of interactivity, please. There are far, far
> > > more linux desktops than there are servers. People expect to have to
> > > tune servers (I do, for the servers I maintain). People don't expect
> > > to have to tune a desktop to make it run well.
> >
> > Oh and even on servers, when your anti-virus proxy reaches a load of 800,
> > you're happy no to have too large a time-slice so that you regularly get
> > a chance of being allowed to type in commands over SSH.
> 
> Your ssh session should be allowed to run anyway. I don't see the difference.
> If the runqueue length is 100 and the time-slice is (say) 10ms, then if your
> ssh only needs average of 5ms of CPU time per second, then it should be run
> next when it becomes runnable. If it wants 20ms of CPU time per second, then
> it has to wait for 2 seconds anyway to be run next, regardless of whether
> the timeslice was 10ms or 20ms.

It's not about what *ssh* uses but about what *others* use. Except by
renicing SSH or marking it real-time, it has no way to say "give the
CPU to me right now, I have something very short to do". So it will
have to wait for the 100 other tasks to eat their 10ms, waiting 1 second
to consume 5ms of CPU (and I was speaking about 800 and not 100).

It is one of the situations where I prefer to shorten timeslices when
load increases because it will not slow down the service too much, but
will still provide better interactivity, which is also benefical to the
service itself since there is no reason for the cycles usage to be the
same for all processes. So by having a finer granularity, small CPU eaters
will finish sooner.

> > Large time-slices are needed only in HPC environments IMHO, where only
> > one task runs.
> 
> That's silly. By definition if there is only one task running, you don't
> care what the timeslice is.

I mean there's only one important task. There is always a bit of pollution
around it, and interrupting the tasks less often slightly reduces the
context-switch overhead.

> We actually did conduct some benchmarks, and a 10ms timeslice can start
> hurting even things like kbuild quite a bit.
> 
> But anyway, I don't care what the time slice is so much (although it should
> be higher -- if the scheduler can't get good interactive behaviour with a
> 20-30ms timeslice in most cases then it's no good IMO). I care mostly that
> the timeslice does not decrease when load increases.

On the opposite, I think it's a fundamental requirement if you need to
maintain a reasonable interactivity, and a fair progress between all
tasks. I think it's obvious to understand that the only way to maintain
a constant latency with a growing number of tasks is to reduce the time
each task may spend on the CPU. Contrary to other domains such as network,
you don't know how much time a task will spend on the CPU if you grant an
access to it, and there is no way to know because only the work that this
task will perform will determine if it should run shorter or longer. Fair
scheduling in other areas such as network is "easier" because you know the
size of your packets so you know how much time they will take on the wire.

Here with tasks, the best you can do is estimating based on history. But
it will be very rare when you'll be able to correctly guess and guarantee
that the latency is correct.

Maybe the timeslices should shrink only past a certain load though (I don't
know how it's done today).

Regards,
willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/