lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 24 Nov 2014 15:52:10 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc:	Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	Josh Triplett <josh@...htriplett.org>,
	Frédéric Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context

On Mon, Nov 24, 2014 at 3:50 PM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Mon, Nov 24, 2014 at 03:35:55PM -0800, Andy Lutomirski wrote:
>> On Mon, Nov 24, 2014 at 3:31 PM, Paul E. McKenney
>> <paulmck@...ux.vnet.ibm.com> wrote:
>> > On Mon, Nov 24, 2014 at 02:57:54PM -0800, Paul E. McKenney wrote:
>> >> On Mon, Nov 24, 2014 at 02:36:18PM -0800, Andy Lutomirski wrote:
>> >> > On Mon, Nov 24, 2014 at 2:34 PM, Paul E. McKenney
>> >> > <paulmck@...ux.vnet.ibm.com> wrote:
>> >> > > On Mon, Nov 24, 2014 at 01:35:01PM -0800, Paul E. McKenney wrote:
>> >>
>> >> [ . . . ]
>> >>
>> >> > > And the following Promela model claims that your approach works.
>> >> > > Should I trust it?  ;-)
>> >> > >
>> >> >
>> >> > I think so.
>> >> >
>> >> > Want to write a patch?  If so, whose tree should it go in?  I can add
>> >> > it to my IST series, but that seems a bit odd.
>> >>
>> >> Working on it.  ;-)
>> >
>> > And here is a sneak preview of the patch.  Thoughts?
>> >
>> >                                                         Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > rcu: Make rcu_nmi_enter() handle nesting
>> >
>> > Andy Lutomirski is introducing ISTs into x86, which from RCU's
>> > viewpoint are NMIs.  Because ISTs and NMIs can nest, rcu_nmi_enter() and
>> > rcu_nmi_exit() must now correctly handle nesting.
>>
>> You must not be a frequent reader of entry_64.S and the Intel SDM :)
>> IOW, IST is just a stack switching mechanism, and these interrupts
>> have been around forever -- they're just buggy right now.
>>
>> How about:
>>
>> x86 has multiple types of NMI-like interrupts: real NMIs, machine
>> checks, and, for some values of NMI-like, debugging and breakpoint
>> interrupts.  These interrupts can nest inside each other.  Andy
>> Lutomirski is adding RCU support to these interrupts, so
>> rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting.
>>
>> Other than that, I like it.
>
> And here is the updated version.  Left to my normal workflow, this would
> go into 3.20.  Please let me know if you need it earlier, but in that
> case, I will need you to test it fairly soon.  (I don't have any way
> to test nested NMI-like things.)

Dunno.  Tony and Borislav -- when do you want the IST stack switching stuff?

--Andy

>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Make rcu_nmi_enter() handle nesting
>
> The x86 architecture has multiple types of NMI-like interrupts: real
> NMIs, machine checks, and, for some values of NMI-like, debugging
> and breakpoint interrupts.  These interrupts can nest inside each
> other.  Andy Lutomirski is adding RCU support to these interrupts,
> so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting.
>
> This commit therefore introduces nesting, using a clever NMI-coordination
> algorithm suggested by Andy.  The trick is to atomically increment
> ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry
> (and, accordingly, after on exit).  In addition, ->dynticks_nmi_nesting
> is incremented by one if ->dynticks was incremented and by two otherwise.
> This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal
> to one, it knows that ->dynticks must be atomically incremented.
>
> This NMI-coordination algorithms has been validated by the following
> Promela model, for whatever that might be worth:
>
> /*
>  * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
>  * that allows nesting.
>  *
>  * This program is free software; you can redistribute it and/or modify
>  * it under the terms of the GNU General Public License as published by
>  * the Free Software Foundation; either version 2 of the License, or
>  * (at your option) any later version.
>  *
>  * This program is distributed in the hope that it will be useful,
>  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>  * GNU General Public License for more details.
>  *
>  * You should have received a copy of the GNU General Public License
>  * along with this program; if not, you can access it online at
>  * http://www.gnu.org/licenses/gpl-2.0.html.
>  *
>  * Copyright IBM Corporation, 2014
>  *
>  * Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>  */
>
> byte dynticks_nesting = 0;
> byte dynticks_nmi_nesting = 0;
> byte dynticks = 0;
>
> /*
>  * Promela verision of rcu_nmi_enter().
>  */
> inline rcu_nmi_enter()
> {
>         byte incby;
>
>         incby = 2;
>         assert(dynticks_nmi_nesting >= 0);
>         if
>         :: (dynticks & 1) == 0 ->
>                 atomic {
>                         dynticks = dynticks + 1;
>                 }
>                 assert((dynticks & 1) == 1);
>                 incby = 1;
>         :: else ->
>                 skip;
>         fi;
>         dynticks_nmi_nesting = dynticks_nmi_nesting + incby;
>         assert(dynticks_nmi_nesting >= 1);
> }
>
> /*
>  * Promela verision of rcu_nmi_exit().
>  */
> inline rcu_nmi_exit()
> {
>         assert(dynticks_nmi_nesting > 0);
>         assert((dynticks & 1) != 0);
>         if
>         :: dynticks_nmi_nesting != 1 ->
>                 dynticks_nmi_nesting = dynticks_nmi_nesting - 2;
>         :: else ->
>                 dynticks_nmi_nesting = 0;
>                 atomic {
>                         dynticks = dynticks + 1;
>                 }
>                 assert((dynticks & 1) == 0);
>         fi;
> }
>
> /*
>  * Base-level NMI runs non-atomically.  Crudely emulates process-level
>  * dynticks-idle entry/exit.
>  */
> proctype base_NMI()
> {
>         byte busy;
>
>         busy = 0;
>         do
>         ::      if
>                 :: 1 -> atomic {
>                                 dynticks = dynticks + 1;
>                         }
>                         busy = 0;
>                 :: 1 -> skip;
>                 fi;
>                 rcu_nmi_enter();
>                 assert((dynticks & 1) == 1);
>                 rcu_nmi_exit();
>                 if
>                 :: busy -> skip;
>                 :: !busy ->
>                         atomic {
>                                 dynticks = dynticks + 1;
>                         }
>                         busy = 1;
>                 fi;
>         od;
> }
>
> /*
>  * Nested NMI runs atomically to emulate interrupting base_level().
>  */
> proctype nested_NMI()
> {
>         do
>         ::      atomic {
>                         rcu_nmi_enter();
>                         assert((dynticks & 1) == 1);
>                         rcu_nmi_exit();
>                 }
>         od;
> }
>
> init {
>         run base_NMI();
>         run nested_NMI();
> }
>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8749f43f3f05..fc0236992655 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -759,39 +759,71 @@ void rcu_irq_enter(void)
>  /**
>   * rcu_nmi_enter - inform RCU of entry to NMI context
>   *
> - * If the CPU was idle with dynamic ticks active, and there is no
> - * irq handler running, this updates rdtp->dynticks_nmi to let the
> - * RCU grace-period handling know that the CPU is active.
> + * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
> + * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
> + * that the CPU is active.  This implementation permits nested NMIs, as
> + * long as the nesting level does not overflow an int.  (You will probably
> + * run out of stack space first.)
>   */
>  void rcu_nmi_enter(void)
>  {
>         struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> +       int incby = 2;
>
> -       if (rdtp->dynticks_nmi_nesting == 0 &&
> -           (atomic_read(&rdtp->dynticks) & 0x1))
> -               return;
> -       rdtp->dynticks_nmi_nesting++;
> -       smp_mb__before_atomic();  /* Force delay from prior write. */
> -       atomic_inc(&rdtp->dynticks);
> -       /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
> -       smp_mb__after_atomic();  /* See above. */
> -       WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> +       /* Complain about underflow. */
> +       WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
> +
> +       /*
> +        * If idle from RCU viewpoint, atomically increment ->dynticks
> +        * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> +        * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> +        * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> +        * to be in the outermost NMI handler that interrupted an RCU-idle
> +        * period (observation due to Andy Lutomirski).
> +        */
> +       if (!(atomic_read(&rdtp->dynticks) & 0x1)) {
> +               smp_mb__before_atomic();  /* Force delay from prior write. */
> +               atomic_inc(&rdtp->dynticks);
> +               /* atomic_inc() before later RCU read-side crit sects */
> +               smp_mb__after_atomic();  /* See above. */
> +               WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> +               incby = 1;
> +       }
> +       rdtp->dynticks_nmi_nesting += incby;
> +       barrier();
>  }
>
>  /**
>   * rcu_nmi_exit - inform RCU of exit from NMI context
>   *
> - * If the CPU was idle with dynamic ticks active, and there is no
> - * irq handler running, this updates rdtp->dynticks_nmi to let the
> - * RCU grace-period handling know that the CPU is no longer active.
> + * If we are returning from the outermost NMI handler that interrupted an
> + * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> + * to let the RCU grace-period handling know that the CPU is back to
> + * being RCU-idle.
>   */
>  void rcu_nmi_exit(void)
>  {
>         struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
>
> -       if (rdtp->dynticks_nmi_nesting == 0 ||
> -           --rdtp->dynticks_nmi_nesting != 0)
> +       /*
> +        * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
> +        * (We are exiting an NMI handler, so RCU better be paying attention
> +        * to us!)
> +        */
> +       WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
> +       WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> +
> +       /*
> +        * If the nesting level is not 1, the CPU wasn't RCU-idle, so
> +        * leave it in non-RCU-idle state.
> +        */
> +       if (rdtp->dynticks_nmi_nesting != 1) {
> +               rdtp->dynticks_nmi_nesting -= 2;
>                 return;
> +       }
> +
> +       /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
> +       rdtp->dynticks_nmi_nesting = 0;
>         /* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
>         smp_mb__before_atomic();  /* See above. */
>         atomic_inc(&rdtp->dynticks);
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ