Message-ID: <1319084319.8416.38.camel@edumazet-laptop>
Date: Thu, 20 Oct 2011 06:18:39 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Venki Pallipadi <venki@...gle.com>
Subject: Re: [Patch] Idle balancer: cache align nohz structure to improve
idle load balancing scalability
On Wednesday 19 October 2011 at 14:45 -0700, Tim Chen wrote:
> Idle load balancing uses a global structure, nohz, to keep track of the
> cpu doing the idle load balancing, the first and second busy cpus, and
> the cpus that are idle. This leads to a scalability issue.
>
> For workloads whose processes wake up and go to sleep often, the
> load_balancer, first_pick_cpu, second_pick_cpu and idle_cpus_mask fields
> in the nohz structure get updated very frequently. This causes lots of
> cache bouncing and slows down the idle and wakeup paths on large systems
> with many cores/sockets. This is evident in a test workload I ran, where
> up to 41% of cpu cycles were spent in the function
> select_nohz_load_balancer. By putting these fields on their own cache
> lines, the problem can be mitigated.
>
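
[ For context, and as a hedge against misreading the patch below:
  ____cacheline_aligned is defined in include/linux/cache.h as an
  alignment attribute (shown simplified here), so each annotated field
  starts on its own cache line and an update to one field no longer
  dirties the line holding its neighbours:

	/* include/linux/cache.h (simplified) */
	#define ____cacheline_aligned \
		__attribute__((__aligned__(SMP_CACHE_BYTES)))
]
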
> The test workload has multiple pairs of processes. Within a pair, the
> two processes send messages back and forth to each other via a pipe
> connecting them, so at any one time half of the processes are active.
>
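
[ A minimal sketch of one such process pair, assuming plain pipes and a
  fixed iteration count; this is illustrative, not the actual benchmark
  harness:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		int ping[2], pong[2];
		char byte = 0;
		int i;

		if (pipe(ping) < 0 || pipe(pong) < 0) {
			perror("pipe");
			exit(1);
		}

		if (fork() == 0) {
			/* child: block until a byte arrives, echo it back */
			for (i = 0; i < 100000; i++) {
				read(ping[0], &byte, 1);
				write(pong[1], &byte, 1);
			}
			exit(0);
		}

		/* parent: send a byte, block until the echo returns */
		for (i = 0; i < 100000; i++) {
			write(ping[1], &byte, 1);
			read(pong[0], &byte, 1);
		}
		return 0;
	}

  Each blocking read puts one side of the pair to sleep and each write
  wakes the other, so the workload continuously exercises the idle
  entry/exit and wakeup paths that touch the nohz structure. ]
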
> I found that for 32 pairs of processes, the rate of context switching
> between the processes increased by 37%, and by 24% for 64 process pairs.
> The test was run on an 8-socket, 64-core NHM-EX system with
> hyper-threading turned on.
>
> Tim
>
> Workload cpu cycle profile on vanilla kernel:
> 41.19% swapper [kernel.kallsyms] [k] select_nohz_load_balancer
> - select_nohz_load_balancer
> + 54.91% tick_nohz_restart_sched_tick
> + 45.04% tick_nohz_stop_sched_tick
> 18.96% swapper [kernel.kallsyms] [k] mwait_idle_with_hints
> 3.50% swapper [kernel.kallsyms] [k] tick_nohz_restart_sched_tick
> 3.36% swapper [kernel.kallsyms] [k] tick_check_idle
> 2.96% swapper [kernel.kallsyms] [k] rcu_enter_nohz
> 2.40% swapper [kernel.kallsyms] [k] _raw_spin_lock
> 2.11% swapper [kernel.kallsyms] [k] tick_nohz_stop_sched_tick
>
>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index bc8ee99..26ea877 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3639,10 +3639,10 @@ static inline void init_sched_softirq_csd(struct call_single_data *csd)
> * load balancing for all the idle CPUs.
> */
> static struct {
> - atomic_t load_balancer;
> - atomic_t first_pick_cpu;
> - atomic_t second_pick_cpu;
> - cpumask_var_t idle_cpus_mask;
> + atomic_t load_balancer ____cacheline_aligned;
> + atomic_t first_pick_cpu ____cacheline_aligned;
> + atomic_t second_pick_cpu ____cacheline_aligned;
> + cpumask_var_t idle_cpus_mask ____cacheline_aligned;
> cpumask_var_t grp_idle_mask;
> unsigned long next_balance; /* in jiffy units */
> } nohz ____cacheline_aligned;
>
Don't you increase the cache footprint, say on a uniprocessor machine
(CONFIG_SMP=n)?

____cacheline_aligned_in_smp seems more suitable in this case.
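
[ For reference, include/linux/cache.h makes the _in_smp variant compile
  away on uniprocessor builds, where there is no cross-cpu cache-line
  bouncing to avoid and the extra padding would only waste memory:

	#if defined(CONFIG_SMP)
	#define ____cacheline_aligned_in_smp ____cacheline_aligned
	#else
	#define ____cacheline_aligned_in_smp
	#endif
]
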