Message-ID: <1321677680.6307.15.camel@marge.simson.net>
Date: Sat, 19 Nov 2011 05:41:20 +0100
From: Mike Galbraith <efault@....de>
To: Suresh Siddha <suresh.b.siddha@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
Venki Pallipadi <venki@...gle.com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...ux.jf.intel.com>, alex.shi@...el.com
Subject: Re: [patch 5/6] sched: disable sched feature TTWU_QUEUE by default
On Sat, 2011-11-19 at 05:30 +0100, Mike Galbraith wrote:
> On Fri, 2011-11-18 at 15:03 -0800, Suresh Siddha wrote:
> > plain text document attachment (disable_sched_ttwu_queue.patch)
> > A context-switch intensive microbenchmark on an 8-socket system had
> > ~600K times more resched IPIs on each logical CPU with this feature
> > enabled by default.  Disabling this feature makes that microbenchmark
> > perform 5 times better.
> >
> > Also, disabling this feature showed a 2% performance improvement on an
> > 8-socket OLTP workload.
> >
> > More heuristics are needed on when and how to use this feature by
> > default.  For now, disable it by default.
>
> Yeah, the overhead for very hefty switchers is high enough to increase
> TCP_RR latency by up to 13% in my testing.  I used a trylock() so we
> generally don't eat that cost, but keep the contended-case improvement
> intact.
>
> Peter suggested doing the IPI only when crossing cache boundaries,
> which worked for me as well.
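
For reference, a minimal sketch of what such a cache-boundary gate could
look like against 3.0's ttwu_queue().  The cpus_share_cache() helper is a
stand-in for whatever LLC test gets used (it does not exist in 3.0), and
the whole thing is illustrative rather than the change that was actually
benchmarked; the trylock variant would instead test
raw_spin_trylock(&rq->lock) in the same spot and queue only on contention.

static void ttwu_queue(struct task_struct *p, int cpu)
{
	struct rq *rq = cpu_rq(cpu);

#if defined(CONFIG_SMP)
	/*
	 * Punt the wakeup to an IPI only when waker and wakee do not
	 * share a cache, so cross-LLC wakeups keep the TTWU_QUEUE
	 * benefit while same-cache wakeups stay cheap and direct.
	 */
	if (sched_feat(TTWU_QUEUE) &&
	    !cpus_share_cache(smp_processor_id(), cpu)) {
		sched_clock_cpu(cpu); /* sync clocks x-cpu */
		ttwu_queue_remote(p, cpu);
		return;
	}
#endif

	raw_spin_lock(&rq->lock);
	ttwu_do_activate(rq, p, 0);
	raw_spin_unlock(&rq->lock);
}
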
On a related TTWU_QUEUE note, I was pondering idle_balance().
---
kernel/sched_fair.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
Index: linux-3.0/kernel/sched_fair.c
===================================================================
--- linux-3.0.orig/kernel/sched_fair.c
+++ linux-3.0/kernel/sched_fair.c
@@ -3500,8 +3500,7 @@ out:
static void idle_balance(int this_cpu, struct rq *this_rq)
{
struct sched_domain *sd;
- int pulled_task = 0;
- unsigned long next_balance = jiffies + HZ;
+ unsigned long next_balance;
if (this_rq->avg_idle < sysctl_sched_migration_cost)
return;
@@ -3512,33 +3511,41 @@ static void idle_balance(int this_cpu, s
raw_spin_unlock(&this_rq->lock);
update_shares(this_cpu);
+ next_balance = jiffies + HZ;
rcu_read_lock();
for_each_domain(this_cpu, sd) {
unsigned long interval;
int balance = 1;
+ if (this_rq->nr_running || this_rq->wake_list)
+ break;
+
if (!(sd->flags & SD_LOAD_BALANCE))
continue;
- if (sd->flags & SD_BALANCE_NEWIDLE) {
- /* If we've pulled tasks over stop searching: */
- pulled_task = load_balance(this_cpu, this_rq,
- sd, CPU_NEWLY_IDLE, &balance);
- }
+ if (!(sd->flags & SD_BALANCE_NEWIDLE))
+ continue;
+
+ load_balance(this_cpu, this_rq, sd, CPU_NEWLY_IDLE, &balance);
interval = msecs_to_jiffies(sd->balance_interval);
if (time_after(next_balance, sd->last_balance + interval))
next_balance = sd->last_balance + interval;
- if (pulled_task) {
+ if (this_rq->nr_running || this_rq->wake_list) {
this_rq->idle_stamp = 0;
break;
}
}
rcu_read_unlock();
+ /* IPI in flight? Let it happen */
+ if (unlikely(this_rq->wake_list)) {
+ local_irq_enable();
+ local_irq_disable();
+ }
raw_spin_lock(&this_rq->lock);
- if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
+ if (this_rq->nr_running || time_after(jiffies, this_rq->next_balance)) {
/*
* We are going idle. next_balance may be set based on
* a busy processor. So reset next_balance.
--