Date: Sun, 18 Jun 2017 06:40:00 -0400
From: Tejun Heo <tj@...nel.org>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?

Hello,

On Sat, Jun 17, 2017 at 10:31:05AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > > And no test failures from yesterday evening.  So it looks like we get
> > > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > > runtime with your printk() in the mix.
> > >
> > > Was the above output from your printk() output of any help?
> >
> > Yeah, if my suspicion is correct, it'd require new kworker creation
> > racing against CPU offline, which would explain why it's so difficult
> > to repro.  Can you please see whether the following patch resolves the
> > issue?
>
> That could explain why only Steve Rostedt and I saw the issue.  As far
> as I know, we are the only ones who regularly run CPU-hotplug stress
> tests.  ;-)

I was a bit confused.  It has to be racing against either a new kworker
being created on the wrong CPU or the rescuer trying to migrate to the
CPU, and it looks like we're mostly seeing the rescuer condition, but,
yeah, this would only get triggered rarely.

Another contributing factor could be the vmstat work recently being put
on a workqueue w/ a rescuer.  It runs quite often, so it probably has
increased the chance of hitting the right condition.

> I have a weekend-long run going, but will give this a shot overnight on
> Monday, Pacific Time.  Thank you for putting it together, looking forward
> to seeing what it does!

Thanks a lot for the testing and patience.  Sorry that it took so long.

I'm not completely sure the patch is correct.  It might have to be more
specific about which type of migration or require further synchronization
around migration, but hopefully it'll at least be able to show that this
was the cause of the problem.

Thanks!

--
tejun
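[For context on the warning being discussed: the sketch below is a
minimal userspace model, not the kernel code itself, of the kind of
CPU-identity assertion that process_one_work() makes, i.e. a worker
bound to a per-CPU pool verifying it is actually executing on that
pool's CPU before running work items.  If the CPU goes offline (or
affinity is otherwise broken) between binding and execution, the check
fires.  All identifiers here (pool_cpu, worker_fn) are hypothetical,
and it assumes a Linux/glibc toolchain for pthread_setaffinity_np()
and sched_getcpu().]

/*
 * Userspace model of a per-CPU worker's sanity check; illustrative
 * only, not taken from kernel/workqueue.c.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pool_cpu = 0;	/* CPU this "pool" is supposed to run on */

static void *worker_fn(void *arg)
{
	cpu_set_t set;

	/* Bind the worker to the pool's CPU, as worker creation would. */
	CPU_ZERO(&set);
	CPU_SET(pool_cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	/*
	 * The sanity check: if the binding was broken (e.g. the CPU
	 * went away underneath us), the worker finds itself on a
	 * different CPU and the warning fires.
	 */
	if (sched_getcpu() != pool_cpu)
		fprintf(stderr, "WARN: worker on CPU %d, expected %d\n",
			sched_getcpu(), pool_cpu);

	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker_fn, NULL);
	pthread_join(&t, NULL);
	return 0;
}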