Message-Id: <20180703040518.GV3593@linux.vnet.ibm.com>
Date: Mon, 2 Jul 2018 21:05:18 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
On Mon, Jul 02, 2018 at 02:05:40PM -0700, Tejun Heo wrote:
> Hello, Paul.
>
> Sorry about the late reply.
>
> On Wed, Jun 20, 2018 at 12:29:01PM -0700, Paul E. McKenney wrote:
> > I have hit this WARN_ON_ONCE() in process_one_work:
> >
> > WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > raw_smp_processor_id() != pool->cpu);
> >
> > This looks like it is my rcu_gp workqueue (see splat below), and it
> > appears to be intermittent. This happens on rcutorture scenario SRCU-N,
> > which does random CPU-hotplug operations (in case that helps).
> >
> > Is this related to the recent addition of WQ_MEM_RECLAIM? Either way,
> > what should I do to further debug this?
>
> Hmm... I checked the code paths but couldn't spot anything suspicious.
> Can you please apply the following patch and see whether it triggers
> before hitting the warn and if so report what it says?
I will apply this, but be advised that I have not seen that WARN_ON_ONCE()
trigger since. :-/
Thanx, Paul
> Thanks.
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 0db8938fbb23..81caab9643b2 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -79,6 +79,15 @@ static struct lockdep_map cpuhp_state_up_map =
> static struct lockdep_map cpuhp_state_down_map =
> STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
>
> +int cpuhp_current_state(int cpu)
> +{
> + return per_cpu_ptr(&cpuhp_state, cpu)->state;
> +}
> +
> +int cpuhp_target_state(int cpu)
> +{
> + return per_cpu_ptr(&cpuhp_state, cpu)->target;
> +}
>
> static inline void cpuhp_lock_acquire(bool bringup)
> {
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 78b192071ef7..365cf6342808 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1712,6 +1712,9 @@ static struct worker *alloc_worker(int node)
> return worker;
> }
>
> +int cpuhp_current_state(int cpu);
> +int cpuhp_target_state(int cpu);
> +
> /**
> * worker_attach_to_pool() - attach a worker to a pool
> * @worker: worker to be attached
> @@ -1724,13 +1727,20 @@ static struct worker *alloc_worker(int node)
> static void worker_attach_to_pool(struct worker *worker,
> struct worker_pool *pool)
> {
> + int ret;
> +
> mutex_lock(&wq_pool_attach_mutex);
>
> /*
> * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> * online CPUs. It'll be re-applied when any of the CPUs come up.
> */
> - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> + if (ret && pool->cpu >= 0 && worker->rescue_wq)
> +		printk("XXX rescuer failed to attach: ret=%d pool=%d this_cpu=%d target_cpu=%d cpuhp_state=%d cpuhp_target=%d\n",
> + ret, pool->id, raw_smp_processor_id(), pool->cpu,
> + cpuhp_current_state(pool->cpu),
> + cpuhp_target_state(pool->cpu));
>
> /*
> * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>