linux-kernel - Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets

Message-ID: <20120926171721.GC5339@x1.osrc.amd.com>
Date:	Wed, 26 Sep 2012 19:17:22 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Mike Galbraith <efault@....de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mel Gorman <mgorman@...e.de>,
	Nikolay Ulyanitsky <lystor@...il.com>,
	linux-kernel@...r.kernel.org,
	Andreas Herrmann <andreas.herrmann3@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	Suresh Siddha <suresh.b.siddha@...el.com>
Subject: Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to
 3.6-rc5 on AMD chipsets - bisected

On Wed, Sep 26, 2012 at 04:23:26AM +0200, Mike Galbraith wrote:
> On Tue, 2012-09-25 at 20:42 +0200, Borislav Petkov wrote:
> 
> > Right, so why did we need it all, in the first place? There has to be
> > some reason for it.
> 
> Easy.  Take two communicating tasks.  Is an affine wakeup a good idea?
> It depends on how much execution overlap there is.  Wake affine when
> there is overlap larger than cache miss cost, and you just tossed
> throughput into the bin.
> 
> select_idle_sibling() was originally about shared L2, where any overlap
> was salvageable.  On modern processors with no shared L2,

Oh, but we do have shared L2s in the Bulldozer uarch (a subset of the
modern AMD processors :)).

> you have to get past the cost, but the gain is still there. Intel
> wins with loads that AMD loses very badly on, so I can only guess that
> Intel must feed caches more efficiently. Dunno. It just doesn't matter
> though, point is that there is a win to be had in both cases, the
> breakeven just isn't at the same point.

Well, I guess selecting the proper core in the hierarchy depending on
the workload is one of those hard problems.

Teaching select_idle_sibling() to detect the breakeven point and act
accordingly would not be that easy, then...
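
Purely as an illustration of the tradeoff described above, and not what
the scheduler actually does: a minimal sketch of a breakeven check,
assuming we could somehow measure the execution overlap between waker
and wakee. The struct, the function name and the cache_miss_cost_ns
tunable are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-platform tunable: the cost, in nanoseconds, of the
 * cache misses the wakee takes when it is placed on another core
 * instead of next to the waker. The breakeven point differs between
 * machines, so it is data rather than a constant.
 */
struct wake_costs {
	uint64_t cache_miss_cost_ns;
};

/*
 * Return true when an affine wakeup looks worthwhile.
 *
 * overlap_ns is the (assumed measurable) time during which waker and
 * wakee would both want to run. If that overlap exceeds the cache miss
 * cost, keeping the two tasks together serializes them and costs more
 * throughput than the misses would have.
 */
static bool prefer_affine_wakeup(uint64_t overlap_ns,
				 const struct wake_costs *costs)
{
	return overlap_ns <= costs->cache_miss_cost_ns;
}
```

The comparison itself is trivial; the hard part is getting a
trustworthy overlap estimate and a per-machine cost cheaply enough to
use in the wakeup path.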

Thanks.

-- 
Regards/Gruss,
Boris.