Message-Id: <20180622181422.GT3593@linux.vnet.ibm.com>
Date:   Fri, 22 Jun 2018 11:14:22 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        Byungchul Park <max.byungchul.park@...il.com>,
        Byungchul Park <byungchul.park@....com>,
        jiangshanlai@...il.com, josh@...htriplett.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        linux-kernel@...r.kernel.org, kernel-team@....com, luto@...nel.org
Subject: Re: [RFC 2/2] rcu: Remove ->dynticks_nmi_nesting from struct
 rcu_dynticks

On Fri, Jun 22, 2018 at 12:01:49PM -0400, Steven Rostedt wrote:
> On Fri, 22 Jun 2018 06:28:43 -0700
> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> 
> > It has been some years since I traced the code flow, but what happened
> > back then is that it switches itself from an interrupt handler to not
> > without actually returning from the interrupt.  This can only happen when
> > interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> > relies on this restriction.  If I remember correctly, the code ends up
> > executing in the context of the interrupted process, but it has been some
> > years, so please apply appropriate skepticism.
> 
> If irq_enter() is not paired with irq_exit() then major things will
> break, especially since in_interrupt() and friends rely on that pairing
> to work.
> 
> Now, perhaps rcu_irq_enter() is called elsewhere (as a git grep appears
> it may be), and that rcu_irq_enter() may not be paired with
> rcu_irq_exit(). But that's not anything to do with the irq_enter() and
> irq_exit() routines being paired or not.

The non-irq_enter() calls to rcu_irq_enter() and the non-irq_exit()
calls to rcu_irq_exit() do appear to be balanced as of v4.17.

If I recall correctly, the offending piece of functionality was the
usermode helpers, which on some architectures did a syscall exception
from within the kernel to make a system call happen.  That functionality
now seems to be common code using workqueues, kernel threads, and
do_execve().
Here is the best reference I could find to the specific problem
I encountered back in the day:

https://groups.google.com/forum/#!msg/linux.kernel/B5hZX1tJRs8/sOVVfhrirL8J

I do recall that there were real failures.  There is no way I would have
written code tolerating half-interrupts without cause, any more than I
would have written code handling what looks to RCU like interrupts from
NMI handlers without cause.  ;-)

One approach would be for me to add a WARN_ON_ONCE() to check for
misnesting.  If this did not trigger over a period long enough for the
check to propagate to the various distros' users, then this code could
be simplified.  The simplification would not be that big a deal, though,
just the removal of a store or two.
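
To make the idea concrete, here is a rough userspace sketch of such a
check.  It is not the kernel code: SKETCH_WARN_ON_ONCE(), irq_nesting,
warnings_emitted, and the sketch_*() functions are all hypothetical
stand-ins, and the counter is a plain global rather than per-CPU state.
It only illustrates warning once per call site when the nesting count is
not what a fully paired enter/exit sequence would leave behind, while
still tolerating the mismatch with a reset store.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins, not kernel symbols. */
static long irq_nesting;      /* interrupt nesting depth as the sketch sees it */
static int warnings_emitted;  /* number of distinct warnings printed so far */

/* Userspace imitation of a warn-once check: fires at most once per call site. */
#define SKETCH_WARN_ON_ONCE(cond)                                     \
	do {                                                          \
		static bool warned;                                   \
		if ((cond) && !warned) {                              \
			warned = true;                                \
			warnings_emitted++;                           \
			fprintf(stderr, "WARNING: %s at %s:%d\n",     \
				#cond, __FILE__, __LINE__);           \
		}                                                     \
	} while (0)

/* Interrupt entry: one more level of nesting. */
static void sketch_irq_enter(void)
{
	irq_nesting++;
}

/* Interrupt exit: an exit with no matching entry is misnesting. */
static void sketch_irq_exit(void)
{
	SKETCH_WARN_ON_ONCE(irq_nesting <= 0);
	if (irq_nesting > 0)
		irq_nesting--;
}

/*
 * Entry to idle from task level: with fully paired enter/exit calls the
 * count must already be zero.  A "half-interrupt" leaves it nonzero.
 */
static void sketch_idle_enter(void)
{
	SKETCH_WARN_ON_ONCE(irq_nesting != 0);
	irq_nesting = 0;  /* the tolerating store that could later be dropped */
}
```

If the warnings never fired in practice, the reset store in
sketch_idle_enter() is the kind of thing that could then be removed.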

							Thanx, Paul
