linux-kernel - Re: WARN_ON_ONCE() in process_one

Message-Id: <20170505171159.GA10296@linux.vnet.ibm.com>
Date:   Fri, 5 May 2017 10:11:59 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?

On Mon, May 01, 2017 at 11:58:19AM -0700, Paul E. McKenney wrote:
> On Mon, May 01, 2017 at 02:44:02PM -0400, Tejun Heo wrote:
> > Hello, Paul.
> > 
> > On Mon, May 01, 2017 at 11:38:07AM -0700, Paul E. McKenney wrote:
> > > On Mon, May 01, 2017 at 09:57:47AM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > > 
> > > > I am hitting this WARN_ON_ONCE() in process_one_work() and am wondering
> > > > what I did wrong to make this happen:
> > > 
> > > Oh, wait...  Rescuer, it says.  Might this be due to the fact that RCU's
> > > expedited grace periods block within a workqueue handler?  Might this
> > > in turn run the system out of workqueue kthreads?  If this is the likely
> > > cause, my approach would be to rework the expedited-grace-period workqueue
> > > handler to return when waiting for the grace period to complete, and to
> > > replace the current wakeup with a schedule_work() or something similar.
> > 
> > That should be completely fine.  It could just be that the rescuer
> > path has a bug around CPU hotplug handling.  Can you please confirm
> > either way on the cpuset usage?
> 
> I have no explicit cpuset usage or affinity of the workqueue handlers
> themselves.
> 
> However, this is thus far only happening in CONFIG_NO_HZ_FULL=y runs, in
> this case, with the kernel boot parameter nohz_full=2-9 out of 16 CPUs.
> IIRC, this sets up a "housekeeping" cpuset that pushes normal tasks away
> from the nohz_full CPUs.
> 
> I do build with CONFIG_HOTPLUG_CPU=y, and the test does a lot of
> hotplugging.  Also, other kthreads (but again, not the workqueue handlers)
> do a lot of explicit CPU-affinity manipulation.

Just following up...  I have hit this bug a couple of times over the
past few days.  Anything I can do to help?

							Thanx, Paul
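
For reference, the rework floated earlier in the thread (have the expedited-grace-period handler return instead of blocking, and have the grace-period completion path queue follow-up work rather than issue a wakeup) can be sketched roughly as below. This is a userspace model only, not kernel code: model_schedule_work(), gp_start_work(), gp_end_event(), gp_cleanup_work(), run_worker() and demo_split_handler() are all hypothetical names made up for illustration, and the single-worker FIFO is an assumption chosen to make the point visible. With only one worker, a handler that blocked until the grace period ended would prevent the completion item from ever running; the split version never holds the worker.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */

#define QUEUE_CAP 8

typedef void (*work_fn)(void);

static work_fn queue[QUEUE_CAP];
static size_t head, tail;   /* FIFO: run from head, append at tail */

static int gp_done;         /* set when the modeled grace period ends */
static int cleanup_ran;     /* set by the follow-up work item */

/* Hypothetical stand-in for schedule_work(): append an item to the FIFO. */
static void model_schedule_work(work_fn fn)
{
    assert(tail < QUEUE_CAP);   /* the sketch never wraps the queue */
    queue[tail++] = fn;
}

/* Second half of the handler: runs only after the grace period ended. */
static void gp_cleanup_work(void)
{
    assert(gp_done);
    cleanup_ran = 1;
}

/*
 * Models the end of the grace period. Instead of waking a blocked
 * worker, it queues the follow-up work item.
 */
static void gp_end_event(void)
{
    gp_done = 1;
    model_schedule_work(gp_cleanup_work);
}

/*
 * First half of the handler: start the grace period and return at
 * once, so the worker is free to run other items. In this model the
 * end-of-grace-period event itself needs the worker, which is why a
 * blocking handler would deadlock a single-worker pool.
 */
static void gp_start_work(void)
{
    model_schedule_work(gp_end_event);
}

/* Single worker draining the queue; returns how many items it ran. */
static int run_worker(void)
{
    int ran = 0;

    while (head < tail) {
        work_fn fn = queue[head++];

        fn();
        ran++;
    }
    return ran;
}

/* Runs the split handler; returns items run, or -1 if cleanup never ran. */
int demo_split_handler(void)
{
    int ran;

    head = tail = 0;
    gp_done = cleanup_ran = 0;
    model_schedule_work(gp_start_work);
    ran = run_worker();
    return cleanup_ran ? ran : -1;
}
```

In this model demo_split_handler() returns 3: the start item, the end-of-grace-period event, and the cleanup item each run to completion in turn, and no item ever waits on another. Whether the real warning is caused by blocked handlers or by rescuer handling around CPU hotplug is still open in this thread; the sketch only illustrates the shape of the proposed rework.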