Message-ID: <1319060737.2604.38.camel@schen9-DESK>
Date:	Wed, 19 Oct 2011 14:45:37 -0700
From:	Tim Chen <tim.c.chen@...ux.intel.com>
To:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Venki Pallipadi <venki@...gle.com>
Subject: [Patch] Idle balancer: cache align nohz structure to improve idle
 load balancing scalability

Idle load balancing makes use of a global structure, nohz, to keep track
of the cpu doing the idle load balancing, the first and second busy cpus
and the cpus that are idle.  This leads to a scalability issue.

For workloads that have processes waking up and going to sleep often, the
load_balancer, first_pick_cpu, second_pick_cpu and idle_cpus_mask fields
in the nohz structure get updated very frequently.  This causes a lot of
cache line bouncing and slows down the idle and wakeup paths on large
systems with many cores/sockets.  It is evident from up to 41% of cpu
cycles being spent in select_nohz_load_balancer in a test workload I ran.
Putting these fields on their own cache lines mitigates the problem.
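
As a rough illustration (not the actual kernel definition), per-field
cache line alignment amounts to the sketch below: each hot field gets its
own aligned slot, so a writer updating one field no longer invalidates
the cache line holding the others.  The struct name and the 64-byte line
size are assumptions for this sketch only.

#define CACHE_LINE 64	/* assumed x86 cache line size, for illustration */

/* Userspace stand-in for the nohz fields: every frequently written
 * field starts on its own cache line, so updates to load_balancer do
 * not bounce the line holding idle_cpus_mask, and vice versa.
 */
struct nohz_sketch {
	int load_balancer		__attribute__((aligned(CACHE_LINE)));
	int first_pick_cpu		__attribute__((aligned(CACHE_LINE)));
	int second_pick_cpu		__attribute__((aligned(CACHE_LINE)));
	unsigned long idle_cpus_mask	__attribute__((aligned(CACHE_LINE)));
};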

The test workload has multiple pairs of processes.  Within a process
pair, each process receives a message from and then sends a message back
to the other process over a pipe connecting them, so at any one time half
the processes are active.
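
A rough sketch of one such pair is below (the names and details are mine,
not the actual test harness; two pipes stand in for the channel between
the two processes):

#include <unistd.h>

/* Parent and child bounce a one-byte message back and forth, so one
 * side is always sleeping in read() while the other side runs: at any
 * one time roughly half the processes are active.
 */
static void ping_pong_pair(long iterations)
{
	int to_child[2], to_parent[2];
	char buf = 0;
	long i;

	if (pipe(to_child) || pipe(to_parent))
		return;

	if (fork() == 0) {				/* child */
		for (i = 0; i < iterations; i++) {
			read(to_child[0], &buf, 1);	/* sleep until pinged */
			write(to_parent[1], &buf, 1);	/* wake the parent */
		}
		_exit(0);
	}
	for (i = 0; i < iterations; i++) {		/* parent */
		write(to_child[1], &buf, 1);
		read(to_parent[0], &buf, 1);
	}
}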

I found that for 32 pairs of processes, the rate of context switching
between the processes increased by 37%, and by 24% for 64 process pairs.
The test was run on an 8-socket, 64-core NHM-EX system with
hyper-threading turned on.

Tim

Workload cpu cycle profile on vanilla kernel:
41.19%          swapper  [kernel.kallsyms]          [k] select_nohz_load_balancer   
   - select_nohz_load_balancer                                                       
      + 54.91% tick_nohz_restart_sched_tick                                         
      + 45.04% tick_nohz_stop_sched_tick     
18.96%          swapper  [kernel.kallsyms]          [k] mwait_idle_with_hints        
 3.50%          swapper  [kernel.kallsyms]          [k] tick_nohz_restart_sched_tick 
 3.36%          swapper  [kernel.kallsyms]          [k] tick_check_idle              
 2.96%          swapper  [kernel.kallsyms]          [k] rcu_enter_nohz               
 2.40%          swapper  [kernel.kallsyms]          [k] _raw_spin_lock               
 2.11%          swapper  [kernel.kallsyms]          [k] tick_nohz_stop_sched_tick    


Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index bc8ee99..26ea877 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3639,10 +3639,10 @@ static inline void init_sched_softirq_csd(struct call_single_data *csd)
  *   load balancing for all the idle CPUs.
  */
 static struct {
-	atomic_t load_balancer;
-	atomic_t first_pick_cpu;
-	atomic_t second_pick_cpu;
-	cpumask_var_t idle_cpus_mask;
+	atomic_t load_balancer ____cacheline_aligned;
+	atomic_t first_pick_cpu ____cacheline_aligned;
+	atomic_t second_pick_cpu ____cacheline_aligned;
+	cpumask_var_t idle_cpus_mask ____cacheline_aligned;
 	cpumask_var_t grp_idle_mask;
 	unsigned long next_balance;     /* in jiffy units */
 } nohz ____cacheline_aligned;