Date:	Thu, 27 Sep 2012 07:47:42 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Mike Galbraith <efault@....de>
Cc:	Borislav Petkov <bp@...en8.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mel Gorman <mgorman@...e.de>,
	Nikolay Ulyanitsky <lystor@...il.com>,
	linux-kernel@...r.kernel.org,
	Andreas Herrmann <andreas.herrmann3@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Suresh Siddha <suresh.b.siddha@...el.com>
Subject: Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to
 3.6-rc5 on AMD chipsets - bisected


* Mike Galbraith <efault@....de> wrote:

> I think the pgbench problem is more about latency for the 1 in 
> 1:N than spinlocks.

So my understanding of the psql workload is that basically we've 
got a central psql proxy process that is distributing work to 
worker psql processes. If a freshly woken worker process ever 
preempts the central proxy process then it is preventing a lot 
of new work from getting distributed.

Correct?

So the central proxy psql process is 'much more important' to 
run than any of the worker processes - an importance that is not 
(currently) visible from the behavioral statistics the scheduler 
keeps on tasks.

So the scheduler has the following problem here: a new wakee 
might be starved enough and the proxy might have run long enough 
to really justify the preemption here and now. The buddy 
statistics help avoid some of these cases - but not all and the 
difference is measurable.

Yet the 'best' way for psql to run is for this proxy process to 
never be preempted. Your SCHED_BATCH experiments confirmed that.

The way remote CPU selection affects it is that if we ever get 
more aggressive in selecting a remote CPU then we, as a side 
effect, also reduce the chance of harmful preemption of the 
central proxy psql process.

So in that sense sibling selection is somewhat of an indirect 
red herring: it really only helps psql indirectly by preventing 
the harmful preemption. It also, somewhat paradoxically, argues 
for suboptimal code: for example tearing apart buddies is 
beneficial in the psql workload, because it also allows the more 
important part of the buddy to run more (the proxy).

In that sense the *real* problem isn't even parallelism (although 
we obviously should improve the decisions there - and the logic 
has suffered in the past from the psql dilemma outlined above), 
but whether the scheduler can (and should) identify the central 
proxy and keep it running as much as possible, deprioritizing 
fairness, wakeup buddies, runtime overlap and cache affinity 
considerations.

There are two broad solutions that I can see:

 - Add a kernel solution to somehow identify 'central' processes
   and bias them. Xorg is a similar kind of process, so it would
   help other workloads as well. That way lie dragons, but might
   be worth an attempt or two. We already try to maintain a couple
   of robust metrics, like overlap statistics, to identify buddies. 

 - Let user-space occasionally identify its important (and less
   important) tasks - say psql could mark its worker processes as
   SCHED_BATCH and keep its central process(es) higher prio. A
   single line of obvious code in 100 KLOCs of user-space code.

Just to confirm, if you turn off all preemption via a hack 
(basically if you turn SCHED_OTHER into SCHED_BATCH), does psql 
perform and scale much better, with the quality of sibling 
selection and spreading of processes only being a secondary 
effect?
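For reference, the no-kernel-hack approximation of that experiment can
be done from the shell with chrt(1). Note this only covers the process
tree you launch - the postgres backends, if started separately, would
need the same treatment - and the pgbench arguments below are just an
example invocation:

```shell
# Start the benchmark with every process in SCHED_BATCH
# ('chrt -b 0' == SCHED_BATCH, static priority 0), approximating
# "SCHED_OTHER behaves like SCHED_BATCH" for this process tree:
chrt -b 0 pgbench -c 32 -j 32 -T 60 bench

# Verify the policy of a running task:
chrt -p "$(pidof -s pgbench)"
```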

Thanks,

	Ingo
