[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1348500647.11847.69.camel@twins>
Date: Mon, 24 Sep 2012 17:30:47 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Mel Gorman <mgorman@...e.de>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Borislav Petkov <bp@...en8.de>,
Nikolay Ulyanitsky <lystor@...il.com>,
Mike Galbraith <efault@....de>, linux-kernel@...r.kernel.org,
Andreas Herrmann <andreas.herrmann3@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to
3.6-rc5 on AMD chipsets - bisected
On Mon, 2012-09-24 at 16:00 +0100, Mel Gorman wrote:
> On Fri, Sep 14, 2012 at 02:42:44PM -0700, Linus Torvalds wrote:
> > On Fri, Sep 14, 2012 at 2:27 PM, Borislav Petkov <bp@...en8.de> wrote:
> > >
> > > as Nikolay says below, we have a regression in 3.6 with pgbench's
> > > benchmark in postgresql.
> > >
> > > I was able to reproduce it on another box here and did a bisection run.
> > > It pointed to the commit below.
> >
> > Ok. I guess we should just revert it. However, before we do that,
> > maybe Mike can make it just use the exact old semantics of
> > select_idle_sibling() in the update_top_cache_domain() logic.
> >
>
> The patch that you being reverted was meant to fix problems with
> commit 4dcfe102 (sched: Avoid SMT siblings in select_idle_sibling() if
> possible). That patch made select_idle_sibling() quite fat and I know it
> is responsible for a 2% regression in a kernel compile benchmark between
> kernel 3.1 and 3.2 on an old AMD Phenom II X4 940. Reverting Mike's patch
> might fix this Postgres regression but it reintroduces the overhead caused
> by commit 4dcfe102 for other cases. I do not have a suggestion on how to
> make this better, I'm just pointing out that the revert has some downsides.
Something like the below removes a number of cpumask operations, which
on big machines can be quite expensive.
No idea if its sufficient, but its a start.
Anyway, does anybody have any clue as to why AMD and Intel machine
behave significantly different here? Does an Intel box with HT disabled
behave similar to AMD? or is it something about the micro-architecture?
---
kernel/sched/fair.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b800a1..8757097 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2661,17 +2661,29 @@ static int select_idle_sibling(struct task_struct *p, int target)
for_each_lower_domain(sd) {
sg = sd->groups;
do {
- if (!cpumask_intersects(sched_group_cpus(sg),
- tsk_cpus_allowed(p)))
- goto next;
+ int candidate = nr_cpu_ids;
+ /*
+ * In the SMT case the groups are the SMT-siblings,
+ * otherwise they're singleton groups.
+ */
for_each_cpu(i, sched_group_cpus(sg)) {
+ if (!cpumask_test_cpu(i, tsk_cpus_allowed(p)))
+ continue;
+
+ /*
+ * If any of the SMT-siblings are !idle, the
+ * core isn't idle.
+ */
if (!idle_cpu(i))
goto next;
+
+ if (candidate == nr_cpu_ids)
+ candidate = i;
}
- target = cpumask_first_and(sched_group_cpus(sg),
- tsk_cpus_allowed(p));
+ target = candidate;
+
goto done;
next:
sg = sg->next;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists