Message-ID: <20141124213501.GX5050@linux.vnet.ibm.com>
Date:	Mon, 24 Nov 2014 13:35:01 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	Josh Triplett <josh@...htriplett.org>,
	Frédéric Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST
 context

On Mon, Nov 24, 2014 at 01:02:51PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 24, 2014 at 12:54 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Mon, Nov 24, 2014 at 12:22:13PM -0800, Andy Lutomirski wrote:
> >> On Sat, Nov 22, 2014 at 3:41 PM, Paul E. McKenney
> >> <paulmck@...ux.vnet.ibm.com> wrote:
> >> > On Fri, Nov 21, 2014 at 09:53:29PM -0800, Andy Lutomirski wrote:
> >> >> On Fri, Nov 21, 2014 at 8:20 PM, Paul E. McKenney
> >> >> <paulmck@...ux.vnet.ibm.com> wrote:
> >> >> > On Fri, Nov 21, 2014 at 06:00:14PM -0800, Andy Lutomirski wrote:
> >> >> >> On Fri, Nov 21, 2014 at 3:38 PM, Paul E. McKenney
> >> >> >> <paulmck@...ux.vnet.ibm.com> wrote:
> >>
> >> > Returning state sounds like a bad idea, if we can reasonably avoid it.
> >>
> >> I agree, except that we already do it for exception_enter(), etc.  But
> >> yes, changing fewer things is nice.
> >>
> >> >
> >> > And I think I finally see what you are pointing out about my code: If
> >> > another NMI comes in between the time I increment ->dynticks_nmi_nesting
> >> > and the time I atomically increment ->dynticks, the nested NMI handler
> >> > will incorrectly believe that RCU is already paying attention to this CPU.
> >> > Which would indeed not be at all good, so good catch!!!
> >> >
> >> >> Otherwise, I think that there may need to be enough state somewhere so
> >> >> that the outermost nested rcu_nmi_enter knows whether to increment
> >> >> dynticks.  For example, dynticks_nmi_nesting could store the nesting
> >> >> count * 2 - (1 if the outermost nested user needs to increment
> >> >> dynticks).  Something like:
> >> >>
> >> >> void rcu_nmi_enter(void)
> >> >> {
> >> >>   /* Be very careful -- this function may be called reentrantly on the
> >> >> same CPU. */
> >> >>   atomically: increment dynticks if it's even.
> >> >>
> >> >>   /* If an rcu_nmi_enter/rcu_nmi_exit pair happens here, then it will not change
> >> >>    * the state. */
> >> >>
> >> >>   local_inc(&dynticks_nmi_nesting, (we incremented dynticks ? 1 : 2));
> >> >>
> >> >>   WARN_ON(we incremented dynticks and dynticks_nmi_nesting was nonzero);
> >> >> }
> >> >>
> >> >> void rcu_nmi_exit(void)
> >> >> {
> >> >>   WARN_ON(!(dynticks & 1));
> >> >>   locally atomically: dynticks_nmi_nesting -= 2, unless
> >> >> dynticks_nmi_nesting == 1, in which case set it to zero
> >> >>
> >> >>   if (dynticks_nmi_nesting was 1)
> >> >>     atomic_inc(&dynticks);
> >> >> }
> >> >>
> >> >> The invariant here is that, for a single unnested enter/exit, if
> >> >> dynticks_nmi_nesting != 0, then dynticks is odd.  As a result, an
> >> >> rcu_nmi_enter/rcu_nmi_exit pair at any time when dynticks_nmi_nesting
> >> >> != 0 *or* dynticks is odd will have no net effect, so the invariant,
> >> >> in fact, holds for all invocations, nested or otherwise.
> >> >>
> >> >> At least one of those conditions is true at all times during the
> >> >> execution of the outermost pair, starting with the first atomic operation
> >> >> and ending with the final atomic_inc.  So they nest properly no matter
> >> >> what else happens (unless, of course, someone else pokes dynticks in
> >> >> the middle).
> >> >>
> >> >> Thoughts?
> >> >
> >> > Let's see...  The evenness of ->dynticks should be preserved by nested NMI
> >> > handlers, so the check and increment need not be atomic.  We don't have
> >> > any way (other than atomic operations) to do local atomic modifications
> >> > on all architectures, because we cannot mask NMIs.  (Yes, it can work
> >> > on x86, but this is common code that needs to work everywhere.)  On the
> >> > other hand, presumably NMIs are rare, so atomic modification of the NMI
> >> > nesting counter should be OK, at least if it proves absolutely necessary.
> >> > And I am thinking that a mechanical proof will be needed here.  :-/
> >> >
> >> > But first, let me try generating the code and informally evaluating it:
> >> >
> >> >          1   struct rcu_dynticks {
> >> >          2     long long dynticks_nesting;
> >> >          3     int dynticks_nmi_nesting;
> >> >          4     atomic_t dynticks;
> >> >          5   };
> >> >          6
> >> >          7   void rcu_nmi_enter(void)
> >> >          8   {
> >> >          9     struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >> >         10     int incby = 2;
> >> >         11
> >> >         12     if (!(atomic_read(&rdtp->dynticks) & 0x1)) {
> >> >         13       smp_mb__before_atomic();
> >> >         14       atomic_inc(&rdtp->dynticks);
> >> >         15       smp_mb__after_atomic();
> >> >         16       WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> >> >         17       incby = 1;
> >>
> >> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 1) here, perhaps?
> >
> > That would make sense.
> >
> >> >         18     }
> >> >         19     rdtp->dynticks_nmi_nesting += incby;
> >>
> >> Oh, I see why you don't need local_add -- it's because an nmi in the
> >> middle of this increment won't have any effect on the interrupted
> >> code, so even a software RMW will be okay.
> >
> > Yep!  ;-)
> >
> >> >         20     barrier();
> >> >         21   }
> >> >         22
> >> >         23   void rcu_nmi_exit(void)
> >> >         24   {
> >> >         25     struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >> >         26
> >> >         27     WARN_ON_ONCE(!rdtp->dynticks_nmi_nesting);
> >> >         28     WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> >> >         29     if (rdtp->dynticks_nmi_nesting != 1) {
> >>
> >> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 2), perhaps?
> >
> > This is already implied by the WARN_ON_ONCE() on line 27 and the check
> > on line 29.
> 
> I was worried about negative numbers.  Maybe change line 27 to
> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0), then?  (Or is it
> unsigned?  If so, let's make it signed to catch this type of error.)

Good point, they are signed, so your WARN_ON_ONCE() would work.

> >> >         30       rdtp->dynticks_nmi_nesting -= 2;
> >> >         31       return;
> >> >         32     }
> >> >         33     rdtp->dynticks_nmi_nesting = 0;
> >> >         34     smp_mb__before_atomic();
> >>
> >> This implies barrier(), right?
> >
> > Yep!
> >
> >> >         35     atomic_inc(&rdtp->dynticks);
> >> >         36     smp_mb__after_atomic();
> >> >         37     WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> >> >         38   }
> >> >
> >> > Line 9 picks up a pointer to this CPU's rcu_dynticks structure and line 10
> >> > assumes that we don't need to increment ->dynticks.
> >> >
> >> > Line 12 checks to see if ->dynticks is even.  Note that this check is
> >> > stable: If there are nested NMIs, they will increment ->dynticks twice
> >> > or not at all, and either way preserves the evenness (to be proven, of
> >> > course, but that is the plan).  If ->dynticks is even, lines 13-15
> >> > atomically increment it, line 16 complains if still even, and line 17
> >> > says we will increment ->dynticks_nmi_nesting by only 1.
> >> >
> >> > Either way, line 19 increments ->dynticks_nmi_nesting as needed and
> >> > line 20 keeps the compiler from getting too cute.
> >> >
> >> > For rcu_nmi_exit(), line 25 again picks up this CPU's rcu_dynticks
> >> > structure.  Lines 27 and 28 complain bitterly if invariants are violated.
> >> > If line 29 finds that the value of ->dynticks_nmi_nesting is not 1,
> >> > then line 30 subtracts 2 from ->dynticks_nmi_nesting and line 31 returns.
> >> >
> >> > Otherwise, line 33 sets ->dynticks_nmi_nesting to zero, lines 34-36
> >> > atomically increment ->dynticks with full ordering, and line 37
> >> > complains bitterly if ->dynticks is not even.
> >> >
> >> > So, if an NMI occurs before rcu_nmi_enter's atomic increment, then the
> >> > nested NMI's rcu_nmi_enter() and rcu_nmi_exit() will think that they are
> >> > not nested, which is the correct thing for them to think in that case.
> >> > They will increment ->dynticks twice and restore ->dynticks_nmi_nesting
> >> > to zero (adding and then subtracting 1).  If the NMI happens after the
> >> > atomic increment, then the nested rcu_nmi_enter() and rcu_nmi_exit()
> >> > will leave ->dynticks alone, and will restore ->dynticks_nmi_nesting
> >> > to zero (adding and then subtracting two).  If the NMI happens after
> >> > the increment of ->dynticks_nmi_nesting, the nested NMI's rcu_nmi_enter()
> >> > and rcu_nmi_exit() will again restore ->dynticks_nmi_nesting, but this
> >> > time to one (again adding and subtracting two).
> >> >
> >> > In rcu_nmi_exit(), ->dynticks_nmi_nesting of zero had better not happen,
> >> > one means we need to atomically increment ->dynticks, and other values
> >> > mean that we are partially or fully nested.  Reasoning proceeds as for
> >> > rcu_nmi_enter(), but in the opposite direction.
> >> >
> >> > Whew!  That might even work.
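
For concreteness, here is a quick user-space sketch -- plain C, not kernel
code, and the scaffolding, step numbering, and main() are invented purely
for illustration -- that replays the interleavings above by running a
nested enter/exit pair as one indivisible step at each interesting point
in a hand-unrolled outer pair, which is roughly how a nested NMI behaves:

        #include <assert.h>
        #include <stdio.h>

        static int dynticks;    /* stands in for atomic_t ->dynticks */
        static int nesting;     /* stands in for ->dynticks_nmi_nesting */

        static void nmi_enter(void)
        {
                int incby = 2;

                if (!(dynticks & 0x1)) {        /* even: RCU not yet watching */
                        dynticks++;             /* atomic_inc() in the real code */
                        incby = 1;
                }
                nesting += incby;
        }

        static void nmi_exit(void)
        {
                assert(nesting > 0 && (dynticks & 0x1));
                if (nesting != 1) {
                        nesting -= 2;
                        return;
                }
                nesting = 0;
                dynticks++;                     /* atomic_inc() in the real code */
        }

        int main(void)
        {
                int point;

                for (point = 0; point <= 3; point++) {
                        int incby = 2;
                        int was_even;

                        dynticks = 0;           /* idle: even, not in an NMI */
                        nesting = 0;

                        /* Outer rcu_nmi_enter(), spelled out step by step. */
                        if (point == 0) { nmi_enter(); nmi_exit(); }    /* NMI before the check */
                        was_even = !(dynticks & 0x1);
                        if (point == 1) { nmi_enter(); nmi_exit(); }    /* NMI before atomic_inc() */
                        if (was_even) {
                                dynticks++;
                                incby = 1;
                        }
                        if (point == 2) { nmi_enter(); nmi_exit(); }    /* NMI after atomic_inc() */
                        nesting += incby;
                        if (point == 3) { nmi_enter(); nmi_exit(); }    /* NMI after the nesting bump */

                        /* Outer rcu_nmi_exit(). */
                        nmi_exit();

                        assert(nesting == 0 && !(dynticks & 0x1));
                        printf("nested NMI at point %d: OK\n", point);
                }
                return 0;
        }

All four interleavings end with ->dynticks_nmi_nesting back at zero and
->dynticks even, matching the reasoning above.
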
> >>
> >> I think I like this, with the warnings above.
> >
> > OK with dropping the one that I called out as redundant?
> 
> Sure, but see above.
> 
> >
> >> > But how about taking a different approach.  Assuming that there can
> >> > never be more than (say) 14 nesting NMI-like things, use the lower
> >> > four bits of ->dynticks to represent the NMI nesting and the upper
> >> > 28 bits as the counter.  This of course requires modifying lots of
> >> > places in RCU that check the counter, but it is probably time to
> >> > abstract the check anyway.
> >> >
> >> > This would allow my earlier attempted logic to work and (maybe) simplify
> >> > the reasoning a bit (and yes, the "magic" constants need macros):
> >> >
> >> >         void rcu_nmi_enter(void)
> >> >         {
> >> >                 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >> >                 int nesting = atomic_read(&rdtp->dynticks) & 0xf;
> >> >                 int incby = 0x01;
> >> >
> >> >                 WARN_ON_ONCE(nesting == 0xf);
> >> >                 if (nesting == 0) {
> >> >                         if (atomic_read(&rdtp->dynticks) & 0x10)
> >> >                                 return;
> >> >                         incby = 0x11;
> >> >                 }
> >> >                 smp_mb__before_atomic();
> >> >                 atomic_add(incby, &rdtp->dynticks);
> >> >                 smp_mb__after_atomic();
> >> >                 WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> >> >         }
> >> >
> >> >         void rcu_nmi_exit(void)
> >> >         {
> >> >                 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >> >                 int nesting = atomic_read(&rdtp->dynticks) & 0xf;
> >> >                 int incby = 0x0f;
> >> >
> >> >                 if (nesting == 0)
> >> >                         return;
> >> >                 if (nesting > 1)
> >> >                         incby = -1;
> >> >                 smp_mb__before_atomic();
> >> >                 atomic_add(incby, &rdtp->dynticks);
> >> >                 smp_mb__after_atomic();
> >> >                 WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> >> >         }
> >> >
> >> > Over to you!  ;-)
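
If we do end up going this way, the magic constants could hide behind
something like the following -- the names here are made up for
illustration and nothing like them exists today:

        /* Hypothetical helpers for the bit-packed ->dynticks sketch above. */
        #define DYNTICK_NMI_NESTING_MASK        0x0f    /* low 4 bits: NMI nesting */
        #define DYNTICK_NMI_NESTING_MAX         0x0f    /* 4-bit field about to overflow */
        #define DYNTICK_COUNTER_INCR            0x10    /* +1 in the upper 28 bits */

        static inline int dyntick_nmi_nesting(int snap)
        {
                return snap & DYNTICK_NMI_NESTING_MASK;
        }

        static inline int dyntick_counter_odd(int snap)
        {
                return snap & DYNTICK_COUNTER_INCR;     /* low bit of the counter */
        }

so that, for example, the early return above becomes
"if (dyntick_counter_odd(atomic_read(&rdtp->dynticks))) return;" rather
than a raw hex test.
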
> >>
> >> This latter one is all you :)
> >
> > Well, let's see how I feel about it after trying a Promela model of
> > the first code sequence.  ;-)
> 
> :)
> 
> Does Promela understand the differences between this type of
> reentrancy and real threading?

Not as far as I know.  But it can be tricked into making this distinction.
One thread just has the Promela code as is, and the other thread has
the same Promela code entirely contained in an atomic block.  This means
that the entire second thread must execute at one point in the first
thread, just like an NMI would.
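
Roughly along these lines -- an illustrative sketch only, not the model I
will actually run, with the names and the final assertion made up here:

        int dynticks = 0;
        int dynticks_nmi_nesting = 0;

        inline nmi_enter()
        {
                if
                :: (dynticks % 2) == 0 ->
                        dynticks++;
                        dynticks_nmi_nesting++
                :: else ->
                        dynticks_nmi_nesting = dynticks_nmi_nesting + 2
                fi
        }

        inline nmi_exit()
        {
                assert(dynticks_nmi_nesting > 0);
                assert((dynticks % 2) == 1);
                if
                :: dynticks_nmi_nesting != 1 ->
                        dynticks_nmi_nesting = dynticks_nmi_nesting - 2
                :: else ->
                        dynticks_nmi_nesting = 0;
                        dynticks++
                fi
        }

        /* Steps of this process interleave freely. */
        active proctype outermost()
        {
                nmi_enter();
                nmi_exit();
                /* Once the other process is done, we must be back to idle. */
                (_nr_pr == 1) ->
                        assert(dynticks_nmi_nesting == 0 && (dynticks % 2) == 0)
        }

        /* The whole body fires at a single point in outermost(), like an NMI. */
        active proctype nested_nmi()
        {
                atomic {
                        nmi_enter();
                        nmi_exit()
                }
        }

Spin then tries the atomic block at every point in outermost()'s
execution, which is exactly the set of places a nested NMI could land.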

							Thanx, Paul
