Date: Tue, 20 Jun 2017 09:45:23 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?

On Sun, Jun 18, 2017 at 06:40:00AM -0400, Tejun Heo wrote:
> Hello,
>
> On Sat, Jun 17, 2017 at 10:31:05AM -0700, Paul E. McKenney wrote:
> > On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > > > And no test failures from yesterday evening. So it looks like we get
> > > > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > > > runtime with your printk() in the mix.
> > > >
> > > > Was the above output from your printk() output of any help?
> > >
> > > Yeah, if my suspicion is correct, it'd require new kworker creation
> > > racing against CPU offline, which would explain why it's so difficult
> > > to repro. Can you please see whether the following patch resolves the
> > > issue?
> >
> > That could explain why only Steve Rostedt and I saw the issue. As far
> > as I know, we are the only ones who regularly run CPU-hotplug stress
> > tests. ;-)
>
> I was a bit confused. It has to be racing against either new kworker
> being created on the wrong CPU or rescuer trying to migrate to the
> CPU, and it looks like we're mostly seeing the rescuer condition, but,
> yeah, this would only get triggered rarely. Another contributing
> factor could be the vmstat work putting on a workqueue w/ rescuer
> recently. It runs quite often, so probably has increased the chance
> of hitting the right condition.

Sounds like too much fun! ;-)

But more constructively... If I understand correctly, it is now possible
to take a CPU partially offline and put it back online again. This should
allow much more intense testing of this sort of interaction.

And no, I haven't yet tried this with RCU because I would probably need
to do some mix of just-RCU online/offline and full-up online-offline.
Plus RCU requires pretty much a full online/offline cycle to fully
exercise it. :-/

> > I have a weekend-long run going, but will give this a shot overnight on
> > Monday, Pacific Time. Thank you for putting it together, looking forward
> > to seeing what it does!
>
> Thanks a lot for the testing and patience. Sorry that it took so
> long. I'm not completely sure the patch is correct. It might have to
> be more specific about which type of migration or require further
> synchronization around migration, but hopefully it'll at least be able
> to show that this was the cause of the problem.

And last night's tests had no failures. Which might actually mean
something, will get more info when I run without your patch this
evening. ;-)

							Thanx, Paul