Message-Id: <20170617173105.GI3721@linux.vnet.ibm.com>
Date: Sat, 17 Jun 2017 10:31:05 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > And no test failures from yesterday evening. So it looks like we get
> > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > runtime with your printk() in the mix.
> >
> > Was the above output from your printk() output of any help?
>
> Yeah, if my suspicion is correct, it'd require new kworker creation
> racing against CPU offline, which would explain why it's so difficult
> to repro. Can you please see whether the following patch resolves the
> issue?

That could explain why only Steve Rostedt and I saw the issue. As far
as I know, we are the only ones who regularly run CPU-hotplug stress
tests. ;-)

I have a weekend-long run going, but will give this a shot overnight on
Monday, Pacific Time. Thank you for putting it together, looking forward
to seeing what it does!

Thanx, Paul

> Thanks.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..1500217ce4b4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,8 +980,13 @@ struct migration_arg {
>  static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>  				 struct task_struct *p, int dest_cpu)
>  {
> -	if (unlikely(!cpu_active(dest_cpu)))
> -		return rq;
> +	if (p->flags & PF_KTHREAD) {
> +		if (unlikely(!cpu_online(dest_cpu)))
> +			return rq;
> +	} else {
> +		if (unlikely(!cpu_active(dest_cpu)))
> +			return rq;
> +	}
>  
>  	/* Affinity changed (again). */
>  	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
>
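
For readers following the quoted patch, here is a minimal userspace sketch of
the decision it appears to make at the top of __migrate_task(): a task with
PF_KTHREAD set may be moved to any online CPU, while every other task still
needs an active CPU. Everything below is a hypothetical model and not kernel
code: struct cpu_state_sketch, the sketch_* helpers, and the plain bitmasks are
stand-ins for cpu_online(), cpu_active(), and the kernel's cpumasks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's CPU masks.
 * Bit n set means CPU n is in that state.
 */
struct cpu_state_sketch {
	unsigned long online;	/* models cpu_online_mask */
	unsigned long active;	/* models cpu_active_mask */
};

/* Models cpu_online(): is this CPU's bit set in the online mask? */
static bool sketch_cpu_online(const struct cpu_state_sketch *s, int cpu)
{
	return (s->online >> cpu) & 1UL;
}

/* Models cpu_active(): is this CPU's bit set in the active mask? */
static bool sketch_cpu_active(const struct cpu_state_sketch *s, int cpu)
{
	return (s->active >> cpu) & 1UL;
}

/*
 * Models the check in the quoted patch: kernel threads only need the
 * destination CPU to be online; all other tasks need it to be active.
 * Returns true when the migration would be allowed to proceed.
 */
static bool sketch_may_migrate(const struct cpu_state_sketch *s,
			       bool is_kthread, int dest_cpu)
{
	if (is_kthread)
		return sketch_cpu_online(s, dest_cpu);
	return sketch_cpu_active(s, dest_cpu);
}

/* Small self-check: CPU 1 is online but not active. */
static void sketch_self_check(void)
{
	struct cpu_state_sketch s = { .online = 0x3UL, .active = 0x1UL };

	assert(sketch_may_migrate(&s, true, 1));	/* kthread: allowed */
	assert(!sketch_may_migrate(&s, false, 1));	/* other task: refused */
}
```

The case that matters is a CPU that is online but not active, a state a CPU
passes through during hotplug. Under the old check both kinds of task were
refused there; under the patched check only non-kthread tasks are.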