linux-kernel - Re: Reconciling rcu_irq_enter()/rcu_nmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20150718131252.GB1747@lerouge>
Date:	Sat, 18 Jul 2015 15:12:53 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>, X86 ML <x86@...nel.org>,
	Rik van Riel <riel@...hat.com>
Subject: Re: Reconciling rcu_irq_enter()/rcu_nmi_enter() with context tracking

On Fri, Jul 17, 2015 at 11:59:18AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 9:49 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Thu, Jul 16, 2015 at 09:29:07PM -0700, Paul E. McKenney wrote:
> >> On Thu, Jul 16, 2015 at 06:53:15PM -0700, Andy Lutomirski wrote:
> >> > For reasons that mystify me a bit, we currently track context tracking
> >> > state separately from rcu's watching state.  This results in strange
> >> > artifacts: nothing generic cause IRQs to enter CONTEXT_KERNEL, and we
> >> > can nest exceptions inside the IRQ handler (an example would be
> >> > wrmsr_safe failing), and, in -next, we splat a warning:
> >> >
> >> > https://gist.github.com/sashalevin/a006a44989312f6835e7
> >> >
> >> > I'm trying to make context tracking more exact, which will fix this
> >> > issue (the particular splat that Sasha hit shouldn't be possible when
> >> > I'm done), but I think it would be nice to unify all of this stuff.
> >> > Would it be plausible for us to guarantee that RCU state is always in
> >> > sync with context tracking state?  If so, we could maybe simplify
> >> > things and have fewer state variables.
> >>
> >> A noble goal.  Might even be possible, and maybe even advantageous.
> >>
> >> But it is usually easier to say than to do.  RCU really does need to make
> >> some adjustments when the state changes, as do the other subsystems.
> >> It might or might not be possible to do the transitions atomically.
> >> And if the transitions are not atomic, there will still be weird code
> >> paths where (say) the processor is considered non-idle, but RCU doesn't
> >> realize it yet.  Such a code path could not safely use rcu_read_lock(),
> >> so you still need RCU to be able to scream if someone tries it.
> >> Contrariwise, if there is a code path where the processor is considered
> >> idle, but RCU thinks it is non-idle, that code path can stall
> >> grace periods.  (Yes, not a problem if the code path is short enough.
> >> At least if the underlying VCPU is making progres...)
> >>
> >> Still, I cannot prove that it is impossible, and if it is possible,
> >> then as you say, there might well be benefits.
> >>
> >> > Doing this for NMIs might be weird.  Would it make sense to have a
> >> > CONTEXT_NMI that's somehow valid even if the NMI happened while
> >> > changing context tracking state.
> >>
> >> Face it, NMIs are weird.  ;-)
> >>
> >> > Thoughts?  As it stands, I think we might already be broken for real:
> >> >
> >> > Syscall -> user_exit.  Perf NMI hits *during* user_exit.  Perf does
> >> > copy_from_user_nmi, which can fault, causing do_page_fault to get
> >> > called, which calls exception_enter(), which can't be a good thing.
> >> >
> >> > RCU is okay (sort of) because of rcu_nmi_enter, but this seems very fragile.
> >>
> >> Actually, I see more cases where people forget irq_enter() than
> >> rcu_nmi_enter().  "We will just nip in quickly and do something without
> >> actually letting the irq system know.  Oh, and we want some event tracing
> >> in that code path."  Boom!
> >>
> >> > Thoughts?  As it stands, I need to do something because -tip and thus
> >> > -next spews occasional warnings.
> >>
> >> Tell me more?
> >
> > And for completeness, RCU also has the following requirements on the
> > state-transition mechanism:
> >
> > 1.      It must be possible to reliably sample some other CPU's state.
> >         This is an energy-efficiency requirement, as RCU is not normally
> >         permitted to wake up idle CPUs.  Nor nohz CPUs, for that matter.
> 
> NOHZ needs this for vtime accounting, too.  I think Rik might be
> thinking about this.  Maybe the underlying state could be shared?
> 
> >
> > 2.      RCU must be able to track passage through idle and nohz states.
> >         In other words, if RCU samples at t=0 and finds that the CPU
> >         is executing (say) in kernel mode, and RCU samples again at
> >         t=10 and again finds that the CPU is executing in kernel mode,
> >         RCU needs to be able to determine whether or not that CPU passed
> >         through idle or nohz betweentimes.
> 
> And RCU can do this for CONTEXT_KERNEL vs CONTEXT_USER because the
> context tracking stuff notifies RCU.  The think I'm less than happy
> with is that we can currently be CONTEXT_USER but still rcu-awake.
> This is manageable, but it seems messy.

When we interrupt userspace, right? I don't see that much as a problem,
until we use a unified context tracking for both RCU and context tracking.

> 
> >
> > 3.      In some configurations, RCU needs to be able to block entry into
> >         nohz state, both for idle and userspace.
> >
> 
> Hmm.  I suppose we could be CONTEXT_USER but still have RCU awake,
> although the tick would have to stay on.

Well 3) is handled by the tick nohz code so it's still external.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/