Message-ID: <Pine.GSO.4.64.0805230232190.28654@westnet.com>
Date: Fri, 23 May 2008 03:13:32 -0400 (EDT)
From: Greg Smith <gsmith@...gsmith.com>
To: Peter Zijlstra <peterz@...radead.org>
cc: Mike Galbraith <efault@....de>,
Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
lkml <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>
Subject: Re: PostgreSQL pgbench performance regression in 2.6.23+
On Thu, 22 May 2008, Peter Zijlstra wrote:
> I picked the wake_affine() condition, because I think that is the
> biggest factor in this behaviour.
I tested out Peter's patch (updated version against -rc3 with a typo fix
from Mike below) and it's a big step in the right direction. Here are
updated results from my benchmark script, adding 2.6.26-rc3 and that rev
with this patch applied:
Clients   2.6.22   2.6.24   2.6.25     -rc3    patch
      1    11052    10526    10700    10193    10439
      2    16352    14447    10370     9817    13289
      3    15414    17784     9403     9428    13678
      4    14290    16832     8882     9533    13033
      5    14211    16356     8527     9558    12790
      6    13291    16763     9473     9367    12660
      8    12374    15343     9093     9159    12357
     10    11218    10732     9057     8711    11839
     15    11116     7460     7113     7620    11267
     20    11412     7171     7017     7707    10531
     30    11191     7049     6896     7195     9766
     40    11062     7001     6820     7079     9668
     50    11255     6915     6797     7202     9588
The exact versions I tested, since I think that may start to matter now:
2.6.22.19, 2.6.24.3, 2.6.25. I didn't save the 2.6.23 results but recall
them being similar to 2.6.24.
On this dual-core system, without this patch there's an average 33%
regression in -rc3 compared to 2.6.22. With it that drops to 8%; some
cases (around 10 clients) even improve a touch, though that's close enough
to the margin of error here that I wouldn't conclude too much from it. The
big jump in the high client count cases is the first improvement like that
I've seen since CFS was introduced. It seems a bit odd to me that there's
still such a large regression in the 2-8 client cases compared with not
only 2.6.22 but also 2.6.24, which owned this benchmark in that area.
With this feedback, any ideas on where to go next? There seems to be
some room for improvement still left here.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5395a61..e160f71 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -965,6 +965,8 @@ struct sched_entity {
u64 last_wakeup;
u64 avg_overlap;
+ struct sched_entity *waker;
+
#ifdef CONFIG_SCHEDSTATS
u64 wait_start;
u64 wait_max;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e24ecd3..9db3cb4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1066,7 +1066,8 @@ wake_affine(struct rq *rq, struct sched_domain *this_sd, struct rq *this_rq,
* a reasonable amount of time then attract this newly
* woken task:
*/
- if (sync && curr->sched_class == &fair_sched_class) {
+ if (sync && curr->sched_class == &fair_sched_class &&
+ p->se.waker == curr->se.waker) {
if (curr->se.avg_overlap < sysctl_sched_migration_cost &&
p->se.avg_overlap < sysctl_sched_migration_cost)
return 1;
@@ -1238,6 +1239,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
if (unlikely(se == pse))
return;
+ se->waker = pse;
cfs_rq_of(pse)->next = pse;
/*
--
* Greg Smith gsmith@...gsmith.com http://www.gregsmith.com Baltimore, MD