Date:	Thu, 27 Sep 2012 00:17:22 -0700 (PDT)
From:	david@...g.hm
To:	Borislav Petkov <bp@...en8.de>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>, Mel Gorman <mgorman@...e.de>,
	Nikolay Ulyanitsky <lystor@...il.com>,
	linux-kernel@...r.kernel.org,
	Andreas Herrmann <andreas.herrmann3@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	Suresh Siddha <suresh.b.siddha@...el.com>
Subject: Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to
 3.6-rc5 on AMD chipsets - bisected

On Wed, 26 Sep 2012, Borislav Petkov wrote:

>> It always selected target_cpu, but the fact is, that doesn't really
>> sound very sane. The target cpu is either the previous cpu or the
>> current cpu, depending on whether they should be balanced or not. But
>> that still doesn't make any *sense*.
>>
>> In fact, the whole select_idle_sibling() logic makes no sense
>> whatsoever to me. It seems to be total garbage.
>>
>> For example, it starts with the maximum target scheduling domain, and
>> works its way in over the scheduling groups within that domain. What
>> the f*ck is the logic of that kind of crazy thing? It never makes
>> sense to look at the biggest domain first. If you want to be close to
>> something, you want to look at the *smallest* domain first. But
>> because it looks at things in the wrong order, it then needs to have
>> that inner loop saying "does this group actually cover the cpu I am
>> interested in?"
>>
>> Please tell me I am misreading this?
>
> First of all, I'm so *not* a scheduler guy, so take this with a great
> pinch of salt.
>
> The way I understand it is, you either want to share the L2 with a
> process because, for example, both working sets fit in the L2 and/or
> there's some sharing which saves you moving everything over the L3.
> This is where selecting a core on the same L2 is actually a good
> thing.
>
> Or, they're too big to fit into the L2 and they start kicking each
> other out. Then you want to spread them out to different L2s - i.e.,
> different HT groups in Intel-speak.
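
(To make the search-order argument above concrete, here is a rough
sketch of the two orders in C. The types and helpers are invented for
illustration - this is not the actual select_idle_sibling() code.)

/* Minimal stand-in for the domain hierarchy; not the kernel's type. */
struct sched_domain {
	struct sched_domain *parent;	/* next larger domain */
	struct sched_domain *child;	/* next smaller domain */
};

/* Hypothetical helpers, declared only for the sketch: */
struct sched_domain *largest_cache_domain(int cpu);
struct sched_domain *smallest_domain(int cpu);
int first_idle_cpu_in(struct sched_domain *sd, int near); /* -1 if none */

int pick_idle_outside_in(int target)
{
	struct sched_domain *sd;

	/* Roughly the order being criticized: start at the widest
	 * cache-sharing domain and work inward, so the first hit can
	 * be a long way from target, and each step has to check which
	 * groups actually cover it. */
	for (sd = largest_cache_domain(target); sd; sd = sd->child) {
		int cpu = first_idle_cpu_in(sd, target);
		if (cpu >= 0)
			return cpu;
	}
	return target;
}

int pick_idle_inside_out(int target)
{
	struct sched_domain *sd;

	/* The order Linus argues for: start with the SMT siblings /
	 * same-L2 cpus around target and widen the search only when
	 * nothing idle is found nearby. */
	for (sd = smallest_domain(target); sd; sd = sd->parent) {
		int cpu = first_idle_cpu_in(sd, target);
		if (cpu >= 0)
			return cpu;
	}
	return target;
}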

An observation from an outsider here.

If you do overload an L2 cache, the core will be busy all the time and
you will end up migrating a task away from that core.

It seems to me that trying to figure out whether you are going to
overload the L2 is an impossible task, so just assume that it will all
fit. The worst case is one balancing cycle where you can't do as much
work, after which the normal balancing will kick in and move something
anyway.
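
In rough code (again with an invented helper, not a real kernel
interface), the policy amounts to:

int nearest_idle_cpu(int cpu);	/* smallest domain first; -1 if none */

int pick_cpu_optimistic(int target)
{
	/* Assume the working sets fit: take the nearest idle cpu. */
	int cpu = nearest_idle_cpu(target);

	/* Nothing idle nearby: just stay on target. If that overloads
	 * the L2, the next periodic balancing pass will migrate
	 * something anyway. */
	return cpu >= 0 ? cpu : target;
}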

Over the long term, the work lost by not moving optimally right away
is probably much less than the work lost by trying to figure out the
perfect thing to do.

And since the perfect thing to do is going to be both workload- and
chip-specific, trying to model that in your decision making is a lost
cause.

David Lang
