Message-Id: <20180703040518.GV3593@linux.vnet.ibm.com>
Date: Mon, 2 Jul 2018 21:05:18 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
On Mon, Jul 02, 2018 at 02:05:40PM -0700, Tejun Heo wrote:
> Hello, Paul.
>
> Sorry about the late reply.
>
> On Wed, Jun 20, 2018 at 12:29:01PM -0700, Paul E. McKenney wrote:
> > I have hit this WARN_ON_ONCE() in process_one_work:
> >
> > WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > raw_smp_processor_id() != pool->cpu);
> >
> > This looks like it is my rcu_gp workqueue (see splat below), and it
> > appears to be intermittent. This happens on rcutorture scenario SRCU-N,
> > which does random CPU-hotplug operations (in case that helps).
> >
> > Is this related to the recent addition of WQ_MEM_RECLAIM? Either way,
> > what should I do to further debug this?
>
> Hmm... I checked the code paths but couldn't spot anything suspicious.
> Can you please apply the following patch and see whether it triggers
> before hitting the warn and if so report what it says?
I will apply this, but be advised that I have not seen that WARN_ON_ONCE()
trigger since. :-/
Thanx, Paul
> Thanks.
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 0db8938fbb23..81caab9643b2 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -79,6 +79,15 @@ static struct lockdep_map cpuhp_state_up_map =
> static struct lockdep_map cpuhp_state_down_map =
> STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
>
> +int cpuhp_current_state(int cpu)
> +{
> + return per_cpu_ptr(&cpuhp_state, cpu)->state;
> +}
> +
> +int cpuhp_target_state(int cpu)
> +{
> + return per_cpu_ptr(&cpuhp_state, cpu)->target;
> +}
>
> static inline void cpuhp_lock_acquire(bool bringup)
> {
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 78b192071ef7..365cf6342808 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1712,6 +1712,9 @@ static struct worker *alloc_worker(int node)
> return worker;
> }
>
> +int cpuhp_current_state(int cpu);
> +int cpuhp_target_state(int cpu);
> +
> /**
> * worker_attach_to_pool() - attach a worker to a pool
> * @worker: worker to be attached
> @@ -1724,13 +1727,20 @@ static struct worker *alloc_worker(int node)
> static void worker_attach_to_pool(struct worker *worker,
> struct worker_pool *pool)
> {
> + int ret;
> +
> mutex_lock(&wq_pool_attach_mutex);
>
> /*
> * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> * online CPUs. It'll be re-applied when any of the CPUs come up.
> */
> - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + if (ret && pool->cpu >= 0 && worker->rescue_wq)
> +		printk("XXX rescuer failed to attach: ret=%d pool=%d this_cpu=%d target_cpu=%d cpuhp_state=%d cpuhp_target=%d\n",
> + ret, pool->id, raw_smp_processor_id(), pool->cpu,
> + cpuhp_current_state(pool->cpu),
> + cpuhp_target_state(pool->cpu));
>
> /*
> * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>