lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190315134620.GE3378@google.com>
Date:   Fri, 15 Mar 2019 09:46:20 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Byungchul Park <byungchul.park@....com>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
        jiangshanlai@...il.com, dipankar@...ibm.com,
        mathieu.desnoyers@...icios.com, josh@...htriplett.org,
        rostedt@...dmis.org, luto@...nel.org, kernel-team@....com
Subject: Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect
 half-interrupts

On Fri, Mar 15, 2019 at 04:44:52PM +0900, Byungchul Park wrote:
> On 03/15/2019 04:31 PM, Byungchul Park wrote:
> > On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > > > RCU's dyntick-idle code is written to tolerate half-interrupts, that it,
> > > > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > > > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > > > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > > > on the other.  These things really did happen at one time, as evidenced
> > > > by this ca-2011 LKML post:
> > > > 
> > > > http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
> > > > 
> > > > The reason why RCU tolerates half-interrupts is that usermode helpers
> > > > used exceptions to invoke a system call from within the kernel such that
> > > > the system call did a normal return (not a return from exception) to
> > > > the calling context.  This caused rcu_irq_enter() to be invoked without
> > > > a matching rcu_irq_exit().  However, usermode helpers have since been
> > > > rewritten to make much more housebroken use of workqueues, kernel threads,
> > > > and do_execve(), and therefore should no longer produce half-interrupts.
> > > > No one knows of any other source of half-interrupts, but then again,
> > > > no one seems insane enough to go audit the entire kernel to verify that
> > > > half-interrupts really are a relic of the past.
> > > > 
> > > > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > > > trigger in the presence of half interrupts, which the code will continue
> > > > to handle correctly.  If neither of these WARN_ON_ONCE() trigger by
> > > > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > > > would be a considerable simplification.
> > > Hi Paul and everyone,
> > > I was thinking some more about this patch and whether we can simplify this code
> > > much in 2021. Since 2021 is a bit far away, I thought working on it in again to
> > > keep it fresh in memory is a good idea ;-)
> > > 
> > > To me it seems we cannot easily combine the counters (dynticks_nesting and
> > > dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> > > half-interrupt scenario (assuming simplication means counter combining like
> > > Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> > > 2 counters need to be tracked separately as they are used differently in the
> > > following function:
> > Hi Joel and Paul,
> > 
> > I always love the way to logically approach problems so I'm a fan of
> > all your works :) But I'm JUST curious about something here. Why can't
> > we combine them the way I tried even if we confirm no possibility of
> > half-interrupt? IMHO, the only thing we want to know through calling
> > rcu_is_cpu_rrupt_from_idle() is whether the interrupt comes from
> > RCU-idle or not - of course assuming the caller context always be an
> > well-defined interrupt context like e.g. the tick handler.
> > 
> > So the function can return true if the caller is within a RCU-idle
> > region except a well-known single interrupt nested.
> > 
> > Of course, now that we cannot confirm it yet, the crowbar is necessary.
> > But does it still have a problem even after confirming it? Why? What am
> > I missing? Could you explain why for me? :(
> 
> 
> Did you also want to consider the case the function is called from others
> than
> well-known interrupt contexts? If yes, then I agree with you, there doesn't
> seem to be the kind of code and it's not a good idea to let the function be
> called generally though.

We were discussing exactly this on the thread. I am going to be adding a
lockdep check to make sure the function isn't called from anywhere but an
interrupt context. Then once we can confirm that there are no more
half-interrupts in the future, we can apply your counter combining approach.

Based on the comments on the rrupt_from_idle() function, it wasn't clear to
me if the function was intended to be called from any context. That's why I
thought the counter combing approach would not work, but you are right - it
should work.

I will CC you on the lockdep check patch. Also hope you are subscribed to
the new shiny rcu@...r.kernel.org list as well :-)

thanks,

 - Joel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ