linux-kernel - Re: 2.6.39.4: Oops in rcu_read_unlock_special()/_raw_spin


Date:	Thu, 25 Aug 2011 09:20:51 -0400
From:	"Will Simoneau" <simoneau@....uri.edu>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	linux-kernel@...r.kernel.org, dipankar@...ibm.com
Subject: Re: 2.6.39.4: Oops in rcu_read_unlock_special()/_raw_spin_lock()

On 14:27 Wed 24 Aug, Paul E. McKenney wrote:
> On Wed, Aug 24, 2011 at 05:19:07PM -0400, Will Simoneau wrote:
> > The below Oops/BUGs were captured on a serial console during a large
> > rsync job. I do not know of a way to reproduce the Oops, I've only seen
> > it once. Some recent changes have been made suspiciously close to the
> > exploding code, which makes me think that maybe 2.6.39-stable is lacking
> > some fixes? The following commits from Linus' git seem vaguely related,
> > although I have no idea how relevant they are to 2.6.39.4:
> > 
> >    ec433f0c (softirq,rcu: Inform RCU of irq_exit() activity)
> >    10f39bb1 (rcu: protect __rcu_read_unlock() against scheduler-using
> >              irq handlers)
> 
> If this failure mechanism really is the culprit, you should be able
> to make failure happen much more frequently by inserting a delay in
> __rcu_read_unlock() just prior to the call to rcu_read_unlock_special().
> I would suggest starting with a few tens to hundreds of microseconds
> worth of delay.
> 
> If this does make the failure reproducible, then it would make sense
> to try applying the two patches you identified.

Hmm. I tried adding progressively larger delays in the spot you
indicated. I went from 100 us to an entire 1 s (!) and got no crash or
deadlock. The target runs at 40 MHz, so the delays do need to be
relatively long compared to what a modern machine would need.
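
For a sense of scale, here is a minimal sketch of the arithmetic behind
that remark. It assumes the core runs at exactly 40 MHz; the helper name
cycles_for_delay() is hypothetical and exists only for this illustration.
At that clock a 100 us delay spans 4,000 cycles and a 1 s delay spans
40,000,000 cycles, whereas on a 3 GHz machine (a figure picked only for
comparison) 100 us already covers 300,000 cycles.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of clock cycles that elapse during a delay of `delay_us`
 * microseconds on a core clocked at `clock_hz`.
 * Hypothetical helper, for illustration only.
 */
static uint64_t cycles_for_delay(uint64_t clock_hz, uint64_t delay_us)
{
    /* Guard against overflow of the 64-bit product below. */
    assert(delay_us == 0 || clock_hz <= UINT64_MAX / delay_us);
    return (clock_hz * delay_us) / 1000000u;
}
```

Calling cycles_for_delay(40000000, 100) gives the 4,000-cycle figure
quoted above.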

My hardware-breakpoint and printk tests both confirm that
rcu_read_unlock_special() really does get called multiple times per
second, and the 1 s delay makes that painfully obvious as well. But no
dice.
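
The experiment can be sketched as a small userspace model. This is not
the kernel source and not the exact patch that was tested: struct
task_model, the model_* functions, and the slow_calls counter are all
hypothetical stand-ins, and the condition guarding the slow path is an
assumption about how __rcu_read_unlock() decides to call
rcu_read_unlock_special(). The sketch only shows where the delay sits
(just before the slow-path call) and how a counter can confirm that the
slow path is actually reached.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical userspace stand-in for the per-task RCU state. */
struct task_model {
    int nesting;              /* models rcu_read_lock_nesting */
    int special;              /* models the rcu_read_unlock_special flags */
    unsigned long slow_calls; /* how many times the slow path ran */
};

/*
 * Sleep for `us` microseconds. Inside the kernel a busy-wait helper
 * such as udelay() or mdelay() would be the usual choice instead.
 */
static void model_delay_us(long us)
{
    struct timespec ts;

    ts.tv_sec = us / 1000000L;
    ts.tv_nsec = (us % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/* Models rcu_read_unlock_special(): clear the flags, count the call. */
static void model_read_unlock_special(struct task_model *t)
{
    t->special = 0;
    t->slow_calls++;
}

/* Models __rcu_read_unlock() with the debugging delay inserted. */
static void model_rcu_read_unlock(struct task_model *t, long debug_delay_us)
{
    assert(t->nesting > 0);
    --t->nesting;
    if (t->nesting == 0 && t->special) {
        /* Widen the window just prior to the slow-path call. */
        model_delay_us(debug_delay_us);
        model_read_unlock_special(t);
    }
}
```

In this model the delay only widens the window on the outermost unlock
with special work pending, which is also the only case in which
slow_calls is incremented.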