linux-kernel - [PATCH 2/3] sched: enable interrupts and drop rq-lock during newidle balancing


Message-ID: <20080624141607.28487.75030.stgit@lsg.lsg.lab.novell.com>
Date:	Tue, 24 Jun 2008 08:16:07 -0600
From:	Gregory Haskins <ghaskins@...ell.com>
To:	mingo@...e.hu, rostedt@...dmis.org, tglx@...utronix.de
Cc:	linux-kernel@...r.kernel.org, peterz@...radead.org,
	linux-rt-users@...r.kernel.org, ghaskins@...ell.com,
	dbahi@...ell.com, npiggin@...e.de
Subject: [PATCH 2/3] sched: enable interrupts and drop rq-lock during newidle
	balancing

Oprofile data shows that the system may spend a significant amount of
time (60%+) in find_busiest_group() as a result of newidle balancing.  This
entire operation is a critical section (rq-lock held, interrupts disabled)
since it occurs inline with schedule().  Since we already call
find_busiest_group() et al. without locks held for normal balancing, let's
do the same for newidle.  It will at least allow other CPUs and interrupts
to make forward progress (against our RQ) while we try to balance.

Additionally, we abort the newidle processing if we are preempted.

This patch should improve latency response as well as increase
throughput.  It has been shown to contribute significantly to a 6-12%
increase in network performance.

Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
---

 kernel/sched.c |   40 +++++++++++++++++++++++++++++++++-------
 1 files changed, 33 insertions(+), 7 deletions(-)
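
For readers who want to experiment with the locking pattern outside the
kernel, here is a minimal user-space sketch of the idea described above:
drop our own lock for the expensive scan, treat the scan result as a hint,
then retake the locks and recheck before moving anything.  Everything named
toy_* is hypothetical and exists only for this illustration; pthread mutexes
stand in for the rq spinlocks, an atomic flag stands in for need_resched(),
and interrupt handling is not modelled at all.  It is not the patch itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a runqueue; not the kernel's struct rq. */
struct toy_rq {
	pthread_mutex_t lock;
	atomic_int nr_running;		/* atomic so the unlocked scan is a defined read */
	atomic_bool need_resched;	/* stand-in for need_resched() */
};

/* Take both locks in address order so two balancers cannot deadlock. */
static void toy_double_lock(struct toy_rq *a, struct toy_rq *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_rq *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
}

/*
 * Enter with this_rq->lock held; return with it held.
 * Returns the number of tasks pulled (0 or 1 in this toy).
 */
static int toy_newidle_balance(struct toy_rq *this_rq, struct toy_rq *rqs,
			       size_t n)
{
	struct toy_rq *busiest = NULL;
	int busiest_load = 1;	/* only queues with more than one task qualify */
	int moved = 0;

	/* Drop our lock so other threads can use this_rq during the scan. */
	pthread_mutex_unlock(&this_rq->lock);

	/* Unlocked scan: the answer is only a hint and may be stale. */
	for (size_t i = 0; i < n; i++) {
		int load = atomic_load(&rqs[i].nr_running);

		if (&rqs[i] != this_rq && load > busiest_load) {
			busiest = &rqs[i];
			busiest_load = load;
		}
	}

	/* Nothing to pull, or we were asked to reschedule: retake our lock. */
	if (!busiest || atomic_load(&this_rq->need_resched)) {
		pthread_mutex_lock(&this_rq->lock);
		return 0;
	}

	assert(busiest != this_rq);
	toy_double_lock(this_rq, busiest);

	/* Recheck under both locks: state may have changed while unlocked. */
	if (atomic_load(&this_rq->nr_running) == 0 &&
	    atomic_load(&busiest->nr_running) > 1) {
		atomic_fetch_sub(&busiest->nr_running, 1);
		atomic_fetch_add(&this_rq->nr_running, 1);
		moved = 1;
	}

	pthread_mutex_unlock(&busiest->lock);
	return moved;
}
```

The caller takes this_rq->lock, calls toy_newidle_balance(), and still owns
the lock on return, which mirrors the contract the patch keeps for
load_balance_newidle().  The recheck of this_rq->nr_running after relocking
plays the same role as the nr_running test in the patch: if something
arrived while the lock was dropped, nothing is pulled.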

diff --git a/kernel/sched.c b/kernel/sched.c
index 54b27b4..e40c575 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3335,6 +3335,15 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd)
 	cpumask_t cpus = sd->span;
 	int remain = cpus_weight(sd->span) - 1;
 
+	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
+
+	/*
+	 * We are in a preempt-disabled section, so dropping the lock/irq
+	 * here simply means that other cores may acquire the lock,
+	 * and interrupts may occur.
+	 */
+	spin_unlock_irq(&this_rq->lock);
+
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
 	 * sibling can pick up load irrespective of busy siblings. In this case,
@@ -3345,7 +3354,6 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd)
 	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
 		sd_idle = 1;
 
-	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
 redo:
 	group = find_busiest_group(sd, this_cpu, &imbalance, CPU_NEWLY_IDLE,
 				   &sd_idle, &cpus, NULL);
@@ -3366,22 +3374,37 @@ redo:
 	schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance);
 
 	ld_moved = 0;
-	if (busiest->nr_running > 1) {
+	if (!need_resched() && busiest->nr_running > 1) {
 		/* Attempt to move tasks */
-		double_lock_balance(this_rq, busiest);
-		/* this_rq->clock is already updated */
-		update_rq_clock(busiest);
+		local_irq_disable();
+		double_rq_lock(this_rq, busiest);
+
+		BUG_ON(this_cpu != smp_processor_id());
+
+		/*
+		 * Checking rq->nr_running covers both the case where
+		 * newidle-balancing pulls a task, as well as if something
+		 * else issued a NEEDS_RESCHED (since we would only need
+		 * a reschedule if something was moved to us)
+		 */
+		if (this_rq->nr_running) {
+			spin_unlock(&busiest->lock);
+			goto out_balanced_locked;
+		}
+
 		ld_moved = move_tasks(this_rq, this_cpu, busiest,
 					imbalance, sd, CPU_NEWLY_IDLE,
 					&all_pinned);
 		spin_unlock(&busiest->lock);
 
-		if (unlikely(all_pinned && remain)) {
+		if (unlikely(all_pinned && remain && !this_rq->nr_running)) {
+			spin_unlock_irq(&this_rq->lock);
 			cpu_clear(cpu_of(busiest), cpus);
 			remain--;
 			goto redo;
 		}
-	}
+	} else
+		spin_lock_irq(&this_rq->lock);
 
 	if (!ld_moved) {
 		schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]);
@@ -3394,6 +3417,9 @@ redo:
 	return ld_moved;
 
 out_balanced:
+	spin_lock_irq(&this_rq->lock);
+
+out_balanced_locked:
 	schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]);
 	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
 	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
