[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110515185547.GL2258@linux.vnet.ibm.com>
Date: Sun, 15 May 2011 11:55:47 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Yong Zhang <yong.zhang0@...il.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Oleg Nesterov <oleg@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>, Li Zefan <lizf@...fujitsu.com>,
Miao Xie <miaox@...fujitsu.com>
Subject: Re: [PATCH 1/2] cpuset: fix cpuset_cpus_allowed_fallback() don't
update tsk->rt.nr_cpus_allowed
On Fri, May 13, 2011 at 02:42:48PM +0800, Yong Zhang wrote:
> On Fri, May 13, 2011 at 1:48 PM, KOSAKI Motohiro
> <kosaki.motohiro@...fujitsu.com> wrote:
> > Hi
> >
> >> On Mon, 2011-05-02 at 19:55 +0900, KOSAKI Motohiro wrote:
> >>>
> >>> The rule is, we have to update tsk->rt.nr_cpus_allowed too if we change
> >>> tsk->cpus_allowed. Otherwise RT scheduler may confuse.
> >>>
> >>> This patch fixes it.
> >>>
> >>> btw, system_state checking is very important. current boot sequence is
> >>> (1) smp_init
> >>> (ie secondary cpus up and created cpu bound kthreads). (2)
> >>> sched_init_smp().
> >>> Then following bad scenario can be happen,
> >>>
> >>> (1) cpuup call notifier(CPU_UP_PREPARE)
> >>> (2) A cpu notifier consumer create FIFO kthread
> >>> (3) It call kthread_bind()
> >>> ... but, now secondary cpu haven't ONLINE
> >>
> >> isn't
> >
> > thanks, correction.
> >
> >>
> >>> (3) schedule() makes fallback and cpuset_cpus_allowed_fallback
> >>> change task->cpus_allowed
> >>
> >> I'm failing to see how this is happening, surely that kthread isn't
> >> actually running that early?
> >
> > If my understand correctly, current call graph is below.
> >
> > kernel_init()
> > smp_init();
> > cpu_up()
> > ... cpu hotplug notification
> > kthread_create()
> > sched_init_smp();
> >
> >
> > So, cpu hotplug event is happen before sched_init_smp(). The old rule is,
> > all kthreads of using cpu-up notification have to use kthread_bind(). It
> > protected from sched load balance.
> >
> > but, now cpuset_cpus_allowed_fallback() forcedly change kthread's cpumask.
> > Why is this works? the point are two.
> >
> > - rcu_cpu_kthread_should_stop() call set_cpus_allowed_ptr() again
> > periodically.
> > then, it can reset cpumask if cpuset_cpus_allowed_fallback() change it.
> > my debug print obseve following cpumask change occur at boot time.
> > 1) kthread_bind: bind cpu1
> > 2) cpuset_cpus_allowed_fallback: bind possible cpu
> > 3) rcu_cpu_kthread_should_stop: rebind cpu1
> > - while tsk->rt.nr_cpus_allowed == 1, sched load balancer never be crash.
>
> Seems rcu_spawn_one_cpu_kthread() call wake_up_process() directly,
> which is under hotplug event CPU_UP_PREPARE. Maybe it should be
> under CPU_ONLINE.
Sorry, but this does not work. The kthread must be running by the time
the CPU appears, otherwise RCU grace periods in CPU_ONLINE notifiers
will never complete.
This did turn out to be a scheduler bug -- see Cheng Xu's recent patch.
(chengxu@...ux.vnet.ibm.com)
Thanx, Paul
> Thanks,
> Yong
>
> >
> >
> >>
> >>> (4) find_lowest_rq() touch local_cpu_mask if task->rt.nr_cpus_allowed !=
> >>> 1,
> >>> but it haven't been initialized.
> >>>
> >>> RCU folks plan to introduce such FIFO kthread and our testing hitted the
> >>> above issue. Then this patch also protect it.
> >>
> >> I'm fairly sure it doesn't, normal cpu-hotplug doesn't poke at
> >> system_state.
> >
> > If my understand correctly. it's pure scheduler issue. because
> >
> > - rcuc keep the old rule (ie an early spawned kthread have to call
> > kthread_bind)
> > - cpuset_cpus_allowed_fallback() is called from scheduler internal
> > - crash is happen in find_lowest_rq(). (following line)
> >
> >
> > static int find_lowest_rq(struct task_struct *task)
> > {
> > (snip)
> > if (cpumask_test_cpu(cpu, lowest_mask)) // HERE
> >
> >
> > IOW, I think cpuset_cpus_allowed_fallback() should NOT be changed
> > tsk->cpus_allowed until to finish sched_init_smp().
> >
> > Do you have an any alternative idea for this?
> >
> >
> >>> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@...fujitsu.com>
> >>> Cc: Oleg Nesterov<oleg@...hat.com>
> >>> Cc: Peter Zijlstra<a.p.zijlstra@...llo.nl>
> >>> Cc: Ingo Molnar<mingo@...e.hu>
> >>> ---
> >>> include/linux/cpuset.h | 1 +
> >>> kernel/cpuset.c | 1 +
> >>> kernel/sched.c | 4 ++++
> >>> 3 files changed, 6 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> >>> index f20eb8f..42dcbdc 100644
> >>> --- a/include/linux/cpuset.h
> >>> +++ b/include/linux/cpuset.h
> >>> @@ -147,6 +147,7 @@ static inline void cpuset_cpus_allowed(struct
> >>> task_struct *p,
> >>> static inline int cpuset_cpus_allowed_fallback(struct task_struct *p)
> >>> {
> >>> cpumask_copy(&p->cpus_allowed, cpu_possible_mask);
> >>> + p->rt.nr_cpus_allowed = cpumask_weight(&p->cpus_allowed);
> >>> return cpumask_any(cpu_active_mask);
> >>> }
> >>>
> >>> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> >>> index 1ceeb04..6e5bbe8 100644
> >>> --- a/kernel/cpuset.c
> >>> +++ b/kernel/cpuset.c
> >>> @@ -2220,6 +2220,7 @@ int cpuset_cpus_allowed_fallback(struct task_struct
> >>> *tsk)
> >>> cpumask_copy(&tsk->cpus_allowed, cpu_possible_mask);
> >>> cpu = cpumask_any(cpu_active_mask);
> >>> }
> >>> + tsk->rt.nr_cpus_allowed = cpumask_weight(&tsk->cpus_allowed);
> >>>
> >>> return cpu;
> >>> }
> >>
> >> I don't really see the point of doing this separately from your second
> >> patch, please fold them.
> >
> > Ok. Will do.
> >
> >>
> >>> diff --git a/kernel/sched.c b/kernel/sched.c
> >>> index fd4625f..bfcd219 100644
> >>> --- a/kernel/sched.c
> >>> +++ b/kernel/sched.c
> >>> @@ -2352,6 +2352,10 @@ static int select_fallback_rq(int cpu, struct
> >>> task_struct *p)
> >>> if (dest_cpu< nr_cpu_ids)
> >>> return dest_cpu;
> >>>
> >>> + /* Don't worry. It's temporary mismatch. */
> >>> + if (system_state< SYSTEM_RUNNING)
> >>> + return cpu;
> >>> +
> >>> /* No more Mr. Nice Guy. */
> >>> dest_cpu = cpuset_cpus_allowed_fallback(p);
> >>> /*
> >>
> >> Like explained, I don't believe this actually fixes your problem (its
> >> also disgusting).
> >
> > If anybody have an alternative idea, I have no reason to refuse it.
> >
> > Thanks.
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
>
> --
> Only stand for myself
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists