Date:	Tue, 17 May 2011 01:52:41 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Yinghai Lu <yinghai@...nel.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL rcu/next] rcu commits for 2.6.40

On Mon, May 16, 2011 at 02:24:49PM -0700, Paul E. McKenney wrote:
> On Mon, May 16, 2011 at 02:23:29PM +0200, Ingo Molnar wrote:
> > 
> > * Ingo Molnar <mingo@...e.hu> wrote:
> > 
> > > > In the meantime, would you be willing to try out the patch at 
> > > > https://lkml.org/lkml/2011/5/14/89?  This patch helped out Yinghai in 
> > > > several configurations.
> > > 
> > > Wasn't this the one i tested - or is it a new iteration?
> > > 
> > > I'll try it in any case.
> > 
> > oh, this was a new iteration, mea culpa!
> > 
> > And yes, it solves all problems for me as well. Mind pushing it as a fix? :-)
> 
> ;-)
> 
> Unfortunately, the only reason I can see that it works is (1) there
> is some obscure bug in my code or (2) someone somewhere is failing to
> call irq_exit() on some interrupt-exit path.  Much as I might be tempted
> to paper this one over, I believe that we do need to find whatever the
> underlying bug is.
> 
> Oh, yes, there is option (3) as well: maybe if an interrupt deschedules
> a process, the final irq_exit() is omitted in favor of rcu_enter_nohz()?
> But I couldn't see any evidence of this in my admittedly cursory scan
> of the x86 interrupt-handling code.
> 
> So until I learn differently, I am assuming that each and every
> irq_enter() has a matching call to irq_exit(), and that rcu_enter_nohz()
> is called after the final irq_exit() of a given burst of interrupts.
> 
> If my assumptions are mistaken, please do let me know!

About 2), I believe such a mismatch would already have been detected before
your patchset was merged.
For example, if an interrupt failed to call rcu_irq_exit(), we would have
hit cases like this:

rcu_enter_nohz()
<irq>
	rcu_irq_enter()
</irq>
rcu_exit_nohz()

And then that last call would trigger "WARN_ON_ONCE(!(rdtp->dynticks & 0x1))".
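
To make the parity argument concrete, here is a minimal user-space sketch of
that sequence. It is not the kernel code: the struct, the model_*() helpers
and the warnings counter are hypothetical names made up for illustration, and
it assumes the older accounting where the nohz entry/exit functions always
increment the counter while the irq functions increment it only at the
outermost nesting level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU dynticks state. */
struct dyntick_model {
	unsigned long dynticks; /* odd: CPU active, even: extended quiescent state */
	int nesting;            /* nesting count shared by the nohz and irq paths */
	int warnings;           /* number of failed parity checks */
};

/* Stand-in for WARN_ON_ONCE(): count a warning if the parity is unexpected. */
void model_check(struct dyntick_model *m, int want_odd)
{
	int is_odd = (int)(m->dynticks & 0x1);

	if (is_odd != want_odd)
		m->warnings++;
}

void model_init(struct dyntick_model *m)
{
	assert(m != NULL);
	m->dynticks = 1; /* start active, so odd */
	m->nesting = 1;
	m->warnings = 0;
}

void model_enter_nohz(struct dyntick_model *m)
{
	m->dynticks++;
	m->nesting--;
	model_check(m, 0); /* must now be even */
}

void model_exit_nohz(struct dyntick_model *m)
{
	m->dynticks++;
	m->nesting++;
	model_check(m, 1); /* must now be odd */
}

void model_irq_enter(struct dyntick_model *m)
{
	if (m->nesting++)
		return; /* interrupted an active context: nothing to do */
	m->dynticks++;
	model_check(m, 1);
}

void model_irq_exit(struct dyntick_model *m)
{
	if (--m->nesting)
		return; /* returning to an active context: nothing to do */
	m->dynticks++;
	model_check(m, 0);
}

/* Properly paired interrupt inside the nohz window. */
int run_paired(void)
{
	struct dyntick_model m;

	model_init(&m);
	model_enter_nohz(&m);
	model_irq_enter(&m);
	model_irq_exit(&m);
	model_exit_nohz(&m);
	return m.warnings;
}

/* Same sequence, but the interrupt never calls model_irq_exit(). */
int run_unpaired(void)
{
	struct dyntick_model m;

	model_init(&m);
	model_enter_nohz(&m);
	model_irq_enter(&m);
	model_exit_nohz(&m); /* counter ends up even here: check fails */
	return m.warnings;
}
```

In this model run_paired() returns 0 and run_unpaired() returns 1: the missing
exit leaves the counter odd, so the increment in model_exit_nohz() makes it
even and the "must be odd" check fires.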

But maybe a patch in your set touched one of these rcu_irq_...
call sites.

About 3), it shouldn't happen because preempt_schedule_irq() is called in the
exit path of the low-level interrupt handler. rcu_irq_exit() is called from
the higher level, before returning to the low-level handler.

That said, there might be something nasty that the old checks in the
quiescent-state (QS) APIs were missing.

I think it would be nice to add some checks in rcu-lockdep, inside
rcu_read_lock()/rcu_dereference(), to ensure that rdtp->dynticks is not even,
i.e. that we are not in an extended quiescent state. That's something I
planned to add in the next version of my nohz task patchset, because it
juggles the extended quiescent state even more, but given the problems we are
facing today, it may be better to add it sooner.
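
Roughly, the kind of check I have in mind could look like the user-space
sketch below. Everything in it is an assumption for illustration only: the
eqs_model struct and the eqs_model_*() helpers are hypothetical, not existing
kernel or lockdep APIs, and a real version would warn rather than bump a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU dynticks state. */
struct eqs_model {
	unsigned long dynticks;  /* even: CPU is in an extended quiescent state */
	unsigned int violations; /* read-side entries seen while in that state */
};

/* An even counter means RCU is not watching this CPU. */
bool eqs_model_in_extended_qs(const struct eqs_model *c)
{
	return (c->dynticks & 0x1) == 0;
}

/* Model of a debug hook run on read-side critical section entry. */
void eqs_model_read_lock(struct eqs_model *c)
{
	assert(c != NULL);
	if (eqs_model_in_extended_qs(c))
		c->violations++; /* a real check would emit a warning here */
}

/* One legal entry, one illegal entry while idle, one legal entry again. */
unsigned int eqs_model_demo(void)
{
	struct eqs_model cpu = { .dynticks = 1, .violations = 0 };

	eqs_model_read_lock(&cpu); /* counter odd: fine */
	cpu.dynticks++;            /* enter the extended quiescent state */
	eqs_model_read_lock(&cpu); /* counter even: flagged */
	cpu.dynticks++;            /* leave the extended quiescent state */
	eqs_model_read_lock(&cpu); /* counter odd again: fine */
	return cpu.violations;
}
```

Here eqs_model_demo() returns 1, since only the read-side entry made while the
counter is even gets flagged.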