linux-kernel - Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)

Message-Id: <200803171954.01315.nickpiggin@yahoo.com.au>
Date:	Mon, 17 Mar 2008 19:54:00 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Willy Tarreau <w@....eu>
Cc:	Ray Lee <ray-lk@...rabbit.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	"LKML," <linux-kernel@...r.kernel.org>
Subject: Re: Poor PostgreSQL scaling on Linux 2.6.25-rc5 (vs 2.6.22)

On Monday 17 March 2008 19:26, Willy Tarreau wrote:
> On Mon, Mar 17, 2008 at 06:19:38PM +1100, Nick Piggin wrote:

> > Your ssh session should be allowed to run anyway. I don't see the
> > difference. If the runqueue length is 100 and the time-slice is (say)
> > 10ms, then if your ssh only needs an average of 5ms of CPU time per second,
> > then it should be run next when it becomes runnable. If it wants 20ms of
> > CPU time per second, then it has to wait for 2 seconds anyway to be run
> > next, regardless of whether the timeslice was 10ms or 20ms.
>
> It's not about what *ssh* uses but about what *others* use. Except by
> renicing SSH or marking it real-time, it has no way to say "give the
> CPU to me right now, I have something very short to do". So it will
> have to wait for the 100 other tasks to eat their 10ms, waiting 1 second
> to consume 5ms of CPU (and I was speaking about 800 and not 100).

Um, if ssh is not using as much CPU time as the other processes running
(if it has "something very short to do"), then yes, it should get the CPU
*right now*, regardless of what the timeslice size is. If it *is* using
as much CPU time as everyone else, then it will have to wait to get time,
just like everybody else. In that case, lowering the timeslice will not
help matters at all: if ssh has to compute for 20ms before returning
control to the user, then with a 10ms timeslice it just has to wait for
2 slices. So in that case you actually do want a longer and more
efficient timeslice so everybody (including ssh) can get their job done
faster.
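
The arithmetic behind this argument can be checked with a small sketch.
It assumes a strict round-robin model that is not any real scheduler's
behaviour: the task always joins the back of the queue, every other task
burns its whole slice, and each context switch has a fixed cost. The
function name and the 0.1ms switch cost are made up for illustration.

```python
import math

def completion_time_ms(work_ms, others, timeslice_ms, switch_cost_ms=0.0):
    """Wall time until a task needing work_ms of CPU finishes.

    Toy model (an assumption, not a real scheduler): strict round-robin,
    the task waits behind `others` CPU-bound tasks before each of its
    slices, and every switch costs switch_cost_ms.
    """
    # Number of slices the task needs to get all of its work done.
    slices = math.ceil(work_ms / timeslice_ms)
    # Time spent waiting for everybody else before each of those slices.
    wait_per_round = others * (timeslice_ms + switch_cost_ms)
    # Each round also pays one switch to get the task itself onto the CPU.
    return slices * (wait_per_round + switch_cost_ms) + work_ms

if __name__ == "__main__":
    for ts in (10, 20):
        free = completion_time_ms(20, 100, ts)
        costly = completion_time_ms(20, 100, ts, switch_cost_ms=0.1)
        print(f"timeslice {ts}ms: {free:.1f}ms ideal, {costly:.1f}ms with switch cost")
```

With free context switches, both timeslices give the same completion
time for a 20ms job behind 100 tasks. Once switches cost anything, the
10ms timeslice comes out slower under this model.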

> > > Large time-slices are needed only in HPC environments IMHO, where only
> > > one task runs.
> >
> > That's silly. By definition if there is only one task running, you don't
> > care what the timeslice is.
>
> I mean there's only one important task. There is always a bit of pollution
> around it, and interrupting the tasks less often slightly reduces the
> context-switch overhead.

I think it is important in many situations, not just HPC.
Just because tpc-c runs are set up so the number of server threads
exactly matches the number of cpus, doesn't mean that real world servers
don't run into lots of different overload conditions. And yes, cache
efficiency does matter for those too, not just HPC.

> > We actually did conduct some benchmarks, and a 10ms timeslice can start
> > hurting even things like kbuild quite a bit.
> >
> > But anyway, I don't care what the time slice is so much (although it
> > should be higher -- if the scheduler can't get good interactive behaviour
> > with a 20-30ms timeslice in most cases then it's no good IMO). I care
> > mostly that the timeslice does not decrease when load increases.
>
> On the contrary, I think it's a fundamental requirement if you need to
> maintain a reasonable interactivity, and a fair progress between all
> tasks. I think it's obvious to understand that the only way to maintain
> a constant latency with a growing number of tasks is to reduce the time
> each task may spend on the CPU. Contrary to other domains such as network,
> you don't know how much time a task will spend on the CPU if you grant it
> access, and there is no way to know because only the work that this
> task will perform will determine if it should run shorter or longer. Fair
> scheduling in other areas such as network is "easier" because you know the
> size of your packets so you know how much time they will take on the wire.
>
> Here with tasks, the best you can do is estimate based on history. But
> you will rarely be able to guess correctly and guarantee that the
> latency is correct.
>
> Maybe the timeslices should shrink only past a certain load though (I don't
> know how it's done today).
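
The latency claim in the quoted text reduces to simple arithmetic, shown
here as a rough sketch. It assumes plain round-robin in which every task
uses its full slice; the function names are hypothetical and nothing
here describes how any particular kernel computes its timeslices.

```python
def worst_case_wait_ms(tasks, timeslice_ms):
    """Longest wait for a newly runnable task if every other task
    runs for a full slice first (plain round-robin assumption)."""
    return tasks * timeslice_ms

def slice_for_target_ms(target_latency_ms, tasks):
    """Slice length that keeps the worst-case wait at the target
    as the number of runnable tasks grows."""
    return target_latency_ms / tasks

if __name__ == "__main__":
    for n in (100, 800):
        print(n, "tasks at 10ms:", worst_case_wait_ms(n, 10), "ms worst-case wait")
        print(n, "tasks, 1s target:", slice_for_target_ms(1000, n), "ms per slice")
```

Under this model, holding the wait constant means the slice shrinks in
proportion to the number of tasks, which is the trade-off being argued
over in this thread.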

You are just asserting that shorter timeslices are more interactive.
As far as I know (aside from implementation details of a given scheduler),
that assertion only holds in general for a small number of things, such
as video playback or 3D graphics, that adaptively scale back their
output as they get starved for CPU (it might be better to drop every 2nd
frame than to drop 10 frames every 20). I doubt there are many server side
apps like that. What you really need on your server is to give ssh more
priority than your 800 spam threads. You can do that *properly* with nice
or with this group fairness stuff. Lowering timeslices is basically
shooting in the dark.
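
As a rough illustration of what nice levels buy, the sketch below
estimates ssh's CPU share against 800 competing threads. It assumes a
weight-based fair scheduler in which each nice step scales a task's
weight by about 1.25, with nice 0 at 1024. That approximates CFS-style
nice weights and is not an exact kernel table; the function names are
made up for this example.

```python
def approx_weight(nice):
    """Approximate scheduler weight for a nice level.

    Assumption: nice 0 maps to 1024 and each step changes the weight
    by a factor of about 1.25. Real kernels use a precomputed table.
    """
    return 1024 / (1.25 ** nice)

def cpu_share(nice, other_nices):
    """Fraction of CPU a task gets if all tasks are always runnable
    and CPU is divided in proportion to weight."""
    mine = approx_weight(nice)
    return mine / (mine + sum(approx_weight(n) for n in other_nices))

if __name__ == "__main__":
    same = cpu_share(0, [0] * 800)    # ssh and spam threads all at nice 0
    niced = cpu_share(0, [19] * 800)  # spam threads reniced to 19
    print(f"all at nice 0:      {same:.4%}")
    print(f"spam at nice 19:    {niced:.4%}")
```

This only models the long-run share under full contention. A task that
uses less than its share would, as argued above, be run as soon as it
becomes runnable.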
