linux-kernel - Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141124205441.GW5050@linux.vnet.ibm.com>
Date:	Mon, 24 Nov 2014 12:54:41 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	Josh Triplett <josh@...htriplett.org>,
	Frédéric Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST
 context

On Mon, Nov 24, 2014 at 12:22:13PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 22, 2014 at 3:41 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Fri, Nov 21, 2014 at 09:53:29PM -0800, Andy Lutomirski wrote:
> >> On Fri, Nov 21, 2014 at 8:20 PM, Paul E. McKenney
> >> <paulmck@...ux.vnet.ibm.com> wrote:
> >> > On Fri, Nov 21, 2014 at 06:00:14PM -0800, Andy Lutomirski wrote:
> >> >> On Fri, Nov 21, 2014 at 3:38 PM, Paul E. McKenney
> >> >> <paulmck@...ux.vnet.ibm.com> wrote:
> 
> > Returning state sounds like a bad idea, if we can reasonably avoid it.
> 
> I agree, except that we already do it for exception_enter(), etc.  But
> yes, changing fewer things is nice.
> 
> >
> > And I think I finally see what you are pointing out about my code: If
> > another NMI comes in between the time I increment ->dynticks_nmi_nesting
> > and the time I atomically increment ->dynticks, the nested NMI handler
> > will incorrectly believe that RCU is already paying attention to this CPU.
> > Which would indeed not be at all good, so good catch!!!
> >
> >> Otherwise, I think that there may need to be enough state somewhere so
> >> that the outermost nested rcu_nmi_enter knows whether to increment
> >> dynticks.  For example, dynticks_nmi_nesting could store the nesting
> >> count * 2 - (1 if the outermost nested user needs to increment
> >> dynticks).  Something like:
> >>
> >> void rcu_nmi_enter(void)
> >> {
> >>   /* Be very careful -- this function may be called reentrently on the
> >> same CPU. */
> >>   atomically: increment dynticks if it's even.
> >>
> >>   /* If an rcu_nmi_enter/rcu_nmi_exit pair happens here, then it will not change
> >>    * the state. */
> >>
> >>   local_inc(&dynticks_nmi_nesting, (we incremented dynticks ? 1 : 2));
> >>
> >>   WARN_ON(we incremented dynticks and dynticks_nmi_nesting was nonzero);
> >> }
> >>
> >> void rcu_nmi_exit(void)
> >> {
> >>   WARN_ON(!(dynticks & 1));
> >>   locally atomically: dynticks_nmi_nesting -= 2, unless
> >> dynticks_nmi_nesting == 1, in which case set it to zero
> >>
> >>   if (dynticks_nmi_nesting was 1)
> >>     atomic_inc(&dynticks);
> >> }
> >>
> >> The invariant here is that, for a single unnested enter/exit, if
> >> dynticks_nmi_nesting != 0, then dynticks is odd.  As a result, an
> >> rcu_nmi_enter/rcu_nmi_exit pair at any time when dynticks_nmi_nesting
> >> != 0 *or* dynticks is odd will have no net effect, so the invariant,
> >> in fact, holds for all invocations, nested or otherwise.
> >>
> >> At least one of those conditions is true at all times during the
> >> execution of outermost pair, starting with the first atomic operation
> >> and ending with the final atomic_inc.  So they nest properly no matter
> >> what else happens (unless, of course, someone else pokes dynticks in
> >> the middle).
> >>
> >> Thoughts?
> >
> > Let's see...  The evenness of ->dynticks should be preserved by nested NMI
> > handlers, so the check and increment need not be atomic.  We don't have
> > any way (other than atomic operations) to do local atomic modifications
> > on all architectures, because we cannot mask NMIs.  (Yes, it can work
> > on x86, but this is common code that needs to work everywhere.)  On the
> > other hand, presumably NMIs are rare, so atomic modification of the NMI
> > nesting counter should be OK, at least if it proves absolutely necessary.
> > And I am thinking that a mechanical proof will be needed here.  :-/
> >
> > But first, let me try generating the code and informally evaluating it:
> >
> >          1   struct rcu_dynticks {
> >          2     long long dynticks_nesting;
> >          3     int dynticks_nmi_nesting;
> >          4     atomic_t dynticks;
> >          5   };
> >          6
> >          7   void rcu_nmi_enter(void)
> >          8   {
> >          9     struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >         10     int incby = 2;
> >         11
> >         12     if (!(atomic_read(&rdtp->dynticks) & 0x1)) {
> >         13       smp_mb__before_atomic();
> >         14       atomic_inc(&rdtp->dynticks);
> >         15       smp_mb__after_atomic();
> >         16       WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> >         17       incby = 1;
> 
> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 1) here, perhaps?

That would make sense.

> >         18     }
> >         19     rdtp->dynticks_nmi_nesting += incby;
> 
> Oh, I see why you don't need local_add -- it's because an nmi in the
> middle of this increment won't have any effect on the interrupted
> code, so even a software RMW will be okay.

Yep!  ;-)

> >         20     barrier();
> >         21   }
> >         22
> >         23   void rcu_nmi_exit(void)
> >         24   {
> >         25     struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >         26
> >         27     WARN_ON_ONCE(!rdtp->dynticks_nmi_nesting);
> >         28     WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> >         29     if (rdtp->dynticks_nmi_nesting != 1) {
> 
> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 2), perhaps?

This is already implied by the WARN_ON_ONCE() on line 27 and the check
on line 29.

> >         30       rdtp->dynticks_nmi_nesting -= 2;
> >         31       return;
> >         32     }
> >         33     rdtp->dynticks_nmi_nesting = 0;
> >         34     smp_mb__before_atomic();
> 
> This implies barrier(), right?

Yep!

> >         35     atomic_inc(&rdtp->dynticks);
> >         36     smp_mb__after_atomic();
> >         37     WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> >         38   }
> >
> > Line 9 picks up a pointer to this CPU's rcu_dynticks structure and line 10
> > assumes that we don't need to increment ->dynticks.
> >
> > Line 12 checks to see if ->dynticks is even.  Note that this check is
> > stable: If there are nested NMIs, they will increment ->dynticks twice
> > or not at all, and either way preserves the evenness (to be proven, of
> > course, but that is the plan).  If ->dynticks is even, lines 13-15
> > atomically increment it, line 16 complains if still even, and line 17
> > says we will increment ->dynticks_nmi_nesting by only 1.
> >
> > Either way, line 19 increments ->dynticks_nmi_nesting as needed and
> > line 20 keeps the compiler from getting too cute.
> >
> > For rcu_nmi_exit(), line 25 again picks up this CPUs rcu_dynticks
> > structure.  Lines 27 and 28 complain bitterly if invariants are violated.
> > If line 29 finds that the value of ->dynticks_nmi_nesting is not 1,
> > then line 30 subtracts 2 from ->dynticks_nmi_nesting and line 31 returns.
> >
> > Otherwise, line 33 sets ->dynticks_nmi_nesting to zero, lines 34-36
> > atomically increment ->dynticks with full ordering, and line 37
> > complains bitterly if ->dynticks is not even.
> >
> > So, if an NMI occurs before rcu_nmi_enter's atomic increment, then the
> > nested NMI's rcu_nmi_enter() and rcu_nmi_exit() will think that they are
> > not nested, which is the correct thing for them to think in that case.
> > They will increment ->dynticks twice and restore ->dynticks_nmi_nesting
> > to zero (adding and then subtracting 1).  If the NMI happens after the
> > atomic increment, then the nested rcu_nmi_enter() and rcu_nmi_exit()
> > will leave ->dynticks alone, and will restore ->dynticks_nmi_nesting
> > to zero (adding and subtracting two again).  If the NMI happens after
> > the increment of ->dynticks_nmi_nesting, the nested NMI's rcu_nmi_enter()
> > and rcu_nmi_exit() will again restore ->dynticks_nmi_nesting, but this
> > time to one (again adding and subtracting two).
> >
> > In rcu_nmi_exit(), ->dynticks_nmi_nesting of zero had better not happen,
> > one means we need to atomically increment ->dynticks, and other values
> > mean that we are partially or fully nested.  Reasoning proceeds as for
> > rcu_nmi_enter(), but in the opposite direction.
> >
> > Whew!  That might even work.
> 
> I think I like this, with the warnings above.

OK with dropping the one that I called out as redundant?

> > But how about taking a different approach.  Assuming that there can
> > never be more than (say) 14 nesting NMI-like things, use the lower
> > four bits of ->dynticks to represent the NMI nesting and the upper
> > 28 bits as the counter.  This of course requires modifying lots of
> > places in RCU that check the counter, but it is probably time to
> > abstract the check anyway.
> >
> > This would allow my earlier attempted logic to work and (maybe) simplify
> > the reasoning a bit (and yes, the "magic" constants need macros):
> >
> >         void rcu_nmi_enter(void)
> >         {
> >                 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >                 int nesting = atomic_read(&rdtp->dynticks) & 0xf;
> >                 int incby = 0x01;
> >
> >                 WARN_ON_ONCE(nexting == 0xf);
> >                 if (nesting == 0) {
> >                         if (atomic_read(&rdtp->dynticks) & 0x10)
> >                                 return;
> >                         incby = 0x11;
> >                 }
> >                 smp_mb__before_atomic();
> >                 atomic_add(&rdtp->dynticks, incby);
> >                 smp_mb__after_atomic();
> >                 WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> >         }
> >
> >         void rcu_nmi_exit(void)
> >         {
> >                 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >                 int nesting = atomic_read(&rdtp->dynticks) & 0xf;
> >                 int incby = 0x0f;
> >
> >                 if (nesting == 0)
> >                         return;
> >                 if (nesting > 1)
> >                         incby = -1;
> >                 smp_mb__before_atomic();
> >                 atomic_add(&rdtp->dynticks, incby);
> >                 smp_mb__after_atomic();
> >                 WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> >         }
> >
> > Over to you!  ;-)
> 
> This latter one is all you :)

Well, let's see how I feel about it after trying a Promela model of
the first code sequence.  ;-)

							Thanx, Paul

> --Andy
> 
> >
> >                                                         Thanx, Paul
> >
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/