lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080624141607.28487.75030.stgit@lsg.lsg.lab.novell.com>
Date:	Tue, 24 Jun 2008 08:16:07 -0600
From:	Gregory Haskins <ghaskins@...ell.com>
To:	mingo@...e.hu, rostedt@...dmis.org, tglx@...utronix.de
Cc:	linux-kernel@...r.kernel.org, peterz@...radead.org,
	linux-rt-users@...r.kernel.org, ghaskins@...ell.com,
	dbahi@...ell.com, npiggin@...e.de
Subject: [PATCH 2/3] sched: enable interrupts and drop rq-lock during newidle
	balancing

Oprofile data shows that the system may spend a significant amount of
time (60%+) in find_busiest_groups as a result of newidle balancing.  This
entire operation is a critical section since it occurs inline with
a schedule().  Since we do find_busiest_groups() et. al. without locks
held for normal balancing, lets do it for newidle as well.  It will
at least allow other cpus and interrupts to make forward progress
(against our RQ) while we try to balance.

Additionally, we abort the newidle processing if we are preempted.

This patch should both improve latency response, as well as increase
throughput.  It has shown to significantly contribute to a 6-12%
increase in network peformance.

Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
---

 kernel/sched.c |   40 +++++++++++++++++++++++++++++++++-------
 1 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 54b27b4..e40c575 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3335,6 +3335,15 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd)
 	cpumask_t cpus = sd->span;
 	int remain = cpus_weight(sd->span) - 1;
 
+	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
+
+	/*
+	 * We are in a preempt-disabled section, so dropping the lock/irq
+	 * here simply means that other cores may acquire the lock,
+	 * and interrupts may occur.
+	 */
+	spin_unlock_irq(&this_rq->lock);
+
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
 	 * sibling can pick up load irrespective of busy siblings. In this case,
@@ -3345,7 +3354,6 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd)
 	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
 		sd_idle = 1;
 
-	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
 redo:
 	group = find_busiest_group(sd, this_cpu, &imbalance, CPU_NEWLY_IDLE,
 				   &sd_idle, &cpus, NULL);
@@ -3366,22 +3374,37 @@ redo:
 	schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance);
 
 	ld_moved = 0;
-	if (busiest->nr_running > 1) {
+	if (!need_resched() && busiest->nr_running > 1) {
 		/* Attempt to move tasks */
-		double_lock_balance(this_rq, busiest);
-		/* this_rq->clock is already updated */
-		update_rq_clock(busiest);
+		local_irq_disable();
+		double_rq_lock(this_rq, busiest);
+
+		BUG_ON(this_cpu != smp_processor_id());
+
+		/*
+		 * Checking rq->nr_running covers both the case where
+		 * newidle-balancing pulls a task, as well as if something
+		 * else issued a NEEDS_RESCHED (since we would only need
+		 * a reschedule if something was moved to us)
+		 */
+		if (this_rq->nr_running) {
+			spin_unlock(&busiest->lock);
+			goto out_balanced_locked;
+		}
+
 		ld_moved = move_tasks(this_rq, this_cpu, busiest,
 					imbalance, sd, CPU_NEWLY_IDLE,
 					&all_pinned);
 		spin_unlock(&busiest->lock);
 
-		if (unlikely(all_pinned && remain)) {
+		if (unlikely(all_pinned && remain && !this_rq->nr_running)) {
+			spin_unlock_irq(&this_rq->lock);
 			cpu_clear(cpu_of(busiest), cpus);
 			remain--;
 			goto redo;
 		}
-	}
+	} else
+		spin_lock_irq(&this_rq->lock);
 
 	if (!ld_moved) {
 		schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]);
@@ -3394,6 +3417,9 @@ redo:
 	return ld_moved;
 
 out_balanced:
+	spin_lock_irq(&this_rq->lock);
+
+out_balanced_locked:
 	schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]);
 	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
 	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ