Message-ID: <20170618104000.GC28042@htj.duckdns.org>
Date:   Sun, 18 Jun 2017 06:40:00 -0400
From:   Tejun Heo <tj@...nel.org>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?

Hello,

On Sat, Jun 17, 2017 at 10:31:05AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> > Hello,
> > 
> > On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > > And no test failures from yesterday evening.  So it looks like we get
> > > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > > runtime with your printk() in the mix.
> > >
> > > Was the above output from your printk() output of any help?
> > 
> > Yeah, if my suspicion is correct, it'd require new kworker creation
> > racing against CPU offline, which would explain why it's so difficult
> > to repro.  Can you please see whether the following patch resolves the
> > issue?
> 
> That could explain why only Steve Rostedt and I saw the issue.  As far
> as I know, we are the only ones who regularly run CPU-hotplug stress
> tests.  ;-)

I was a bit confused.  It has to be racing against either a new kworker
being created on the wrong CPU or the rescuer trying to migrate to that
CPU, and it looks like we're mostly seeing the rescuer condition; but,
yeah, this would only get triggered rarely.  Another contributing factor
could be the vmstat work having recently been put on a workqueue with a
rescuer.  It runs quite often, so it has probably increased the chance
of hitting the right condition.
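
For reference, the warning in question is the sanity check near the top
of process_one_work() asserting that a worker of a bound pool is in fact
running on that pool's CPU, roughly (paraphrasing kernel/workqueue.c of
that era, the exact form may differ):

	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
		     raw_smp_processor_id() != pool->cpu);

A new kworker that comes up on the wrong CPU, or a rescuer that attaches
to the pool before its migration has taken effect, would trip this check.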

> I have a weekend-long run going, but will give this a shot overnight on
> Monday, Pacific Time.  Thank you for putting it together, looking forward
> to seeing what it does!

Thanks a lot for the testing and patience.  Sorry that it took so
long.  I'm not completely sure the patch is correct.  It might have to
be more specific about which type of migration or require further
synchronization around migration, but hopefully it'll at least be able
to show that this was the cause of the problem.
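
As a rough sketch of the two migration paths mentioned above (purely
illustrative, not the patch under test): a newly created kworker is
normally bound to its pool's CPU before it first runs, while the rescuer
has to migrate itself over while already running, along the lines of:

	#include <linux/kthread.h>
	#include <linux/cpumask.h>
	#include <linux/sched.h>

	/* illustrative helpers, not actual workqueue.c functions */
	static void bind_new_worker(struct task_struct *worker, int pool_cpu)
	{
		/* fresh kworker: bind it before it ever gets scheduled */
		kthread_bind(worker, pool_cpu);
	}

	static void migrate_rescuer(struct task_struct *rescuer, int pool_cpu)
	{
		/* already-running rescuer: update its allowed cpumask */
		set_cpus_allowed_ptr(rescuer, cpumask_of(pool_cpu));
	}

Either path can race with the CPU going down, which is why the patch may
end up needing to distinguish between them or add synchronization around
the migration.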

Thanks!

-- 
tejun
