Date: Mon, 10 Oct 2016 14:02:21 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Wanpeng Li <kernellwp@...il.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Wanpeng Li <wanpeng.li@...mail.com>,
Ingo Molnar <mingo@...nel.org>, Mike Galbraith <efault@....de>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/core: Fix kick offline cpu to do nohz idle load balance
On Mon, Oct 10, 2016 at 04:34:48PM +0800, Wanpeng Li wrote:
> > If there is a need to kick the idle load balancer, an ILB will be selected
> > to perform nohz idle load balance. However, if the selected ILB is in the
> > process of going offline, smp_sched_reschedule(), which generates a sched
> > IPI, will splat as above.
> >
> > CPU0 CPU1
> >
> > find_new_ilb()
> > set_rq_offline()
> > smp_sched_reschedule() Oops
> > nohz_balance_exit_idle()
> >
> > This patch fixes it by exiting nohz idle balance before setting the CPU
> > offline.
>
> CPU 0 CPU1
>
> find_new_ilb()
> nohz_balance_exit_idle()
> set_rq_offline()
> smp_sched_reschedule()
>
> It seems that the patch still can't avoid this race, so any proposal
> is greatly appreciated. :)
Not sure how this can happen: scheduler_tick() -> trigger_load_balance()
-> nohz_balancer_kick() is called with IRQs disabled, which also implies an
RCU-sched read-side critical section.

And hotplug explicitly includes a rcu_sync_sched().

It would have to be that find_new_ilb() is 'broken' in that it considers
!active CPUs. That's not immediately obvious.
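
To make the last point concrete, here is a minimal userspace sketch (not
kernel code) of the difference between an ILB picker that considers !active
CPUs and one that filters them out. Everything in it is an assumption made
for illustration: cpumask_model_t, cpu_bit(), find_new_ilb_model() and
find_new_ilb_ignoring_active() are hypothetical names, and plain 64-bit
masks stand in for the kernel's cpumasks.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64	/* hypothetical limit for this model */

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t cpumask_model_t;

static cpumask_model_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (cpumask_model_t)1 << cpu;
}

/*
 * Pick the lowest-numbered CPU that is both nohz-idle and still active.
 * Returns NR_CPUS when no CPU qualifies.
 */
static int find_new_ilb_model(cpumask_model_t nohz_idle, cpumask_model_t active)
{
	cpumask_model_t candidates = nohz_idle & active;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (candidates & cpu_bit(cpu))
			return cpu;
	}
	return NR_CPUS;
}

/*
 * The 'broken' variant: it looks only at the nohz-idle mask, so it can
 * return a CPU that has already been marked !active by hotplug.
 */
static int find_new_ilb_ignoring_active(cpumask_model_t nohz_idle)
{
	return find_new_ilb_model(nohz_idle, ~(cpumask_model_t)0);
}
```

With CPUs 1 and 3 nohz-idle and only CPUs 0 and 3 active, the variant that
ignores the active mask picks CPU 1 (which is going away), while the
filtered variant picks CPU 3. Whether the real find_new_ilb() needs such a
filter is exactly the open question above; the sketch only shows what
"considers !active CPUs" would mean.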