Message-ID: <alpine.DEB.2.02.1209270013270.5162@asgard.lang.hm>
Date: Thu, 27 Sep 2012 00:17:22 -0700 (PDT)
From: david@...g.hm
To: Borislav Petkov <bp@...en8.de>
cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mike Galbraith <efault@....de>, Mel Gorman <mgorman@...e.de>,
Nikolay Ulyanitsky <lystor@...il.com>,
linux-kernel@...r.kernel.org,
Andreas Herrmann <andreas.herrmann3@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Suresh Siddha <suresh.b.siddha@...el.com>
Subject: Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to
3.6-rc5 on AMD chipsets - bisected
On Wed, 26 Sep 2012, Borislav Petkov wrote:
>> It always selected target_cpu, but the fact is, that doesn't really
>> sound very sane. The target cpu is either the previous cpu or the
>> current cpu, depending on whether they should be balanced or not. But
>> that still doesn't make any *sense*.
>>
>> In fact, the whole select_idle_sibling() logic makes no sense
>> what-so-ever to me. It seems to be total garbage.
>>
>> For example, it starts with the maximum target scheduling domain, and
>> works its way in over the scheduling groups within that domain. What
>> the f*ck is the logic of that kind of crazy thing? It never makes
>> sense to look at a biggest domain first. If you want to be close to
>> something, you want to look at the *smallest* domain first. But
>> because it looks at things in the wrong order, it then needs to have
>> that inner loop saying "does this group actually cover the cpu I am
>> interested in?"
>>
>> Please tell me I am mis-reading this?
>
> First of all, I'm so *not* a scheduler guy so take this with a great
> pinch of salt.
>
> The way I understand it is, you either want to share L2 with a process,
> because, for example, both working sets fit in the L2 and/or there's
> some sharing which saves you moving everything over the L3. This is
> where selecting a core on the same L2 is actually a good thing.
>
> Or, they're too big to fit into the L2 and they start kicking each other
> out. Then you want to spread them out to different L2s - i.e., different
> HT groups in Intel-speak.
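
To make the iteration order being discussed concrete, here is a hand-waved
userspace toy (not the real select_idle_sibling(); the topology, level
layout and helper names below are made up for illustration): scanning
inside-out naturally tries the CPUs sharing an L2 first and then widens,
while scanning outside-in has to keep re-checking whether each group
actually covers the CPU it cares about. Both end up in the same place
here; the point is the shape of the loops.

/*
 * Toy model only -- NOT the kernel's select_idle_sibling().
 * 8 CPUs: {0,1} share an L2, {0..3} share a package, {0..7} is the machine.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS   8
#define NR_LEVELS 3                      /* L2 pair, package, machine */

static const int span[NR_LEVELS] = { 2, 4, 8 };
static bool idle[NR_CPUS];

/* first CPU of the domain that contains 'cpu' at a given level */
static int domain_first(int cpu, int level)
{
	return cpu - (cpu % span[level]);
}

/* inside-out: try the CPUs sharing an L2 first, then widen the search */
static int scan_inside_out(int target)
{
	for (int level = 0; level < NR_LEVELS; level++) {
		int first = domain_first(target, level);

		for (int cpu = first; cpu < first + span[level]; cpu++)
			if (idle[cpu])
				return cpu;
	}
	return target;			/* nothing idle, stay where we are */
}

/* outside-in: start from the biggest domain and walk all of its groups,
 * which forces the "does this group cover the cpu I care about?" check */
static int scan_outside_in(int target)
{
	for (int level = NR_LEVELS - 1; level >= 0; level--) {
		for (int first = 0; first < NR_CPUS; first += span[level]) {
			if (target < first || target >= first + span[level])
				continue;	/* group doesn't cover target */

			for (int cpu = first; cpu < first + span[level]; cpu++)
				if (idle[cpu])
					return cpu;
		}
	}
	return target;
}

int main(void)
{
	idle[1] = idle[6] = true;

	printf("inside-out from cpu 0: %d\n", scan_inside_out(0));
	printf("outside-in from cpu 0: %d\n", scan_outside_in(0));
	return 0;
}
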
An observation from an outsider here.

If you do overload an L2 cache, the core will be busy all the time and you
will end up migrating a task away from that core anyway.

It seems to me that trying to figure out in advance whether you are going
to overload the L2 is an impossible task, so just assume that it will all
fit. The worst case is one balancing cycle where you get less work done,
and then the normal balancing kicks in and moves something anyway.

Over the long term, the work lost by not moving optimally right away is
probably much less than the work lost by trying to figure out the perfect
thing to do.

And since the perfect thing to do is both workload- and chip-specific,
trying to model it in your decision making is a lost cause.
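
As a toy illustration of that "don't predict, just correct" idea (made-up
userspace code, not the kernel balancer): place tasks on the preferred CPU
unconditionally, and let a periodic pass move one task from the busiest
CPU to the idlest, so a bad initial placement only survives for a
balancing interval or two.

#include <stdio.h>

#define NR_CPUS 4

static int nr_tasks[NR_CPUS];

/* optimistic placement: no attempt to guess whether the L2 will overflow */
static void place_task(int preferred_cpu)
{
	nr_tasks[preferred_cpu]++;
}

/* periodic correction: move one task from the busiest CPU to the idlest */
static void balance_once(void)
{
	int busiest = 0, idlest = 0;

	for (int cpu = 1; cpu < NR_CPUS; cpu++) {
		if (nr_tasks[cpu] > nr_tasks[busiest])
			busiest = cpu;
		if (nr_tasks[cpu] < nr_tasks[idlest])
			idlest = cpu;
	}
	if (nr_tasks[busiest] > nr_tasks[idlest] + 1) {
		nr_tasks[busiest]--;
		nr_tasks[idlest]++;
	}
}

int main(void)
{
	/* pile three tasks onto cpu 0, as a naive placement might */
	place_task(0);
	place_task(0);
	place_task(0);

	for (int tick = 0; tick < 3; tick++) {
		balance_once();
		printf("tick %d: %d %d %d %d\n", tick,
		       nr_tasks[0], nr_tasks[1], nr_tasks[2], nr_tasks[3]);
	}
	return 0;
}
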
David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/