Date:	Thu, 25 Aug 2011 09:20:51 -0400
From:	"Will Simoneau" <simoneau@....uri.edu>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, dipankar@...ibm.com
Subject: Re: 2.6.39.4: Oops in rcu_read_unlock_special()/_raw_spin_lock()

On 14:27 Wed 24 Aug, Paul E. McKenney wrote:
> On Wed, Aug 24, 2011 at 05:19:07PM -0400, Will Simoneau wrote:
> > The below Oops/BUGs were captured on a serial console during a large
> > rsync job. I do not know of a way to reproduce the Oops, I've only seen
> > it once. Some recent changes have been made suspiciously close to the
> > exploding code, which makes me think that maybe 2.6.39-stable is lacking
> > some fixes? The following commits from Linus' git seem vaguely related,
> > although I have no idea how relevant they are to 2.6.39.4:
> > 
> >    ec433f0c (softirq,rcu: Inform RCU of irq_exit() activity)
> >    10f39bb1 (rcu: protect __rcu_read_unlock() against scheduler-using
> >              irq handlers)
> 
> If this failure mechanism really is the culprit, you should be able
> to make failure happen much more frequently by inserting a delay in
> __rcu_read_unlock() just prior to the call to rcu_read_unlock_special().
> I would suggest starting with a few tens to hundreds of microseconds
> worth of delay.
> 
> If this does make the failure reproducible, then it would make sense
> to try applying the two patches you identified.

Hmm. I tried adding progressively larger delays at the spot you
indicated. I went from 100 µs all the way up to a full 1 s (!) and got
no crash or deadlock. The target runs at 40 MHz, so the delays do need
to be relatively long compared with those on modern machines.
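
To put numbers on the clock-rate point: a delay of d microseconds on a
CPU clocked at f MHz spans d * f cycles. A small sketch of that
arithmetic follows. The 2000 MHz comparison clock is an assumption
standing in for "a modern machine" and is not from the report.

```python
TARGET_MHZ = 40           # clock rate of the target in the report
ASSUMED_FAST_MHZ = 2000   # assumption: a ~2 GHz desktop CPU, for comparison only


def delay_cycles(delay_us: int, clock_mhz: int) -> int:
    """CPU cycles elapsed during delay_us microseconds (1 MHz = 1 cycle per us)."""
    return delay_us * clock_mhz


def equivalent_us(cycles: int, clock_mhz: int) -> float:
    """Wall-clock microseconds the same number of cycles takes at clock_mhz."""
    return cycles / clock_mhz


# The range of delays tried: 100 us, 1 ms, and a full 1 s.
for delay_us in (100, 1_000, 1_000_000):
    cycles = delay_cycles(delay_us, TARGET_MHZ)
    print(f"{delay_us} us at {TARGET_MHZ} MHz = {cycles} cycles "
          f"= {equivalent_us(cycles, ASSUMED_FAST_MHZ):g} us at {ASSUMED_FAST_MHZ} MHz")
```

For example, 100 µs on the 40 MHz target is 4,000 cycles, which an
assumed 2 GHz CPU gets through in 2 µs; the 1 s delay is 40,000,000
cycles on the target.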

Both my hardware-breakpoint and printk tests confirm that
rcu_read_unlock_special() really is called several times per second,
and the 1 s delay makes that painfully obvious as well. But no dice.
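
For readers who want to see where such a delay goes, here is a minimal
userspace model of the experiment. It is a simplified sketch, not
kernel code: FakeTask, UnlockInstrumentation and their fields are
hypothetical names. It assumes the read-side unlock takes the slow path
only when the nesting count drops to zero while a "special" flag is
set, and it puts the injected delay immediately before that slow-path
call, which is the spot suggested in the quoted reply. The counter
plays the role of the printk/breakpoint hit count.

```python
import time


class FakeTask:
    """Hypothetical stand-in for per-task RCU read-side state (not the kernel's struct)."""

    def __init__(self) -> None:
        self.read_lock_nesting = 0
        self.read_unlock_special = False   # set when the unlock must take the slow path


class UnlockInstrumentation:
    """Counts slow-path entries and optionally sleeps just before each one."""

    def __init__(self, delay_s: float = 0.0) -> None:
        self.delay_s = delay_s
        self.special_calls = 0

    def read_lock(self, t: FakeTask) -> None:
        t.read_lock_nesting += 1

    def read_unlock(self, t: FakeTask) -> None:
        t.read_lock_nesting -= 1
        # Slow path only for the outermost unlock with the flag set.
        if t.read_lock_nesting == 0 and t.read_unlock_special:
            if self.delay_s > 0:
                time.sleep(self.delay_s)   # injected delay widens the window before the call
            self.unlock_special_slow_path(t)

    def unlock_special_slow_path(self, t: FakeTask) -> None:
        self.special_calls += 1            # analogous to a printk or breakpoint hit
        t.read_unlock_special = False


# Usage: ten read-side sections, every other one flagged for the slow path.
inst = UnlockInstrumentation(delay_s=100e-6)   # 100 us, the low end of the range tried
task = FakeTask()
for i in range(10):
    inst.read_lock(task)
    if i % 2 == 0:
        task.read_unlock_special = True        # pretend the reader was preempted
    inst.read_unlock(task)
print(inst.special_calls)
```

Because the delay sits inside the slow-path branch, it costs nothing on
the common fast path and only stretches the window on the code path
under suspicion.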