Message-ID: <20150718131252.GB1747@lerouge>
Date: Sat, 18 Jul 2015 15:12:53 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Sasha Levin <sasha.levin@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>, X86 ML <x86@...nel.org>,
Rik van Riel <riel@...hat.com>
Subject: Re: Reconciling rcu_irq_enter()/rcu_nmi_enter() with context tracking

On Fri, Jul 17, 2015 at 11:59:18AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 9:49 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Thu, Jul 16, 2015 at 09:29:07PM -0700, Paul E. McKenney wrote:
> >> On Thu, Jul 16, 2015 at 06:53:15PM -0700, Andy Lutomirski wrote:
> >> > For reasons that mystify me a bit, we currently track context tracking
> >> > state separately from rcu's watching state. This results in strange
> >> > artifacts: nothing generic causes IRQs to enter CONTEXT_KERNEL, and we
> >> > can nest exceptions inside the IRQ handler (an example would be
> >> > wrmsr_safe failing), and, in -next, we splat a warning:
> >> >
> >> > https://gist.github.com/sashalevin/a006a44989312f6835e7
> >> >
> >> > I'm trying to make context tracking more exact, which will fix this
> >> > issue (the particular splat that Sasha hit shouldn't be possible when
> >> > I'm done), but I think it would be nice to unify all of this stuff.
> >> > Would it be plausible for us to guarantee that RCU state is always in
> >> > sync with context tracking state? If so, we could maybe simplify
> >> > things and have fewer state variables.
> >>
> >> A noble goal. Might even be possible, and maybe even advantageous.
> >>
> >> But it is usually easier to say than to do. RCU really does need to make
> >> some adjustments when the state changes, as do the other subsystems.
> >> It might or might not be possible to do the transitions atomically.
> >> And if the transitions are not atomic, there will still be weird code
> >> paths where (say) the processor is considered non-idle, but RCU doesn't
> >> realize it yet. Such a code path could not safely use rcu_read_lock(),
> >> so you still need RCU to be able to scream if someone tries it.
> >> Contrariwise, if there is a code path where the processor is considered
> >> idle, but RCU thinks it is non-idle, that code path can stall
> >> grace periods. (Yes, not a problem if the code path is short enough.
> >> At least if the underlying VCPU is making progress...)
> >>
> >> Still, I cannot prove that it is impossible, and if it is possible,
> >> then as you say, there might well be benefits.
> >>
> >> > Doing this for NMIs might be weird. Would it make sense to have a
> >> > CONTEXT_NMI that's somehow valid even if the NMI happened while
> >> > changing context tracking state?
> >>
> >> Face it, NMIs are weird. ;-)
> >>
> >> > Thoughts? As it stands, I think we might already be broken for real:
> >> >
> >> > Syscall -> user_exit. Perf NMI hits *during* user_exit. Perf does
> >> > copy_from_user_nmi, which can fault, causing do_page_fault to get
> >> > called, which calls exception_enter(), which can't be a good thing.
> >> >
> >> > RCU is okay (sort of) because of rcu_nmi_enter, but this seems very fragile.
> >>
> >> Actually, I see more cases where people forget irq_enter() than
> >> rcu_nmi_enter(). "We will just nip in quickly and do something without
> >> actually letting the irq system know. Oh, and we want some event tracing
> >> in that code path." Boom!
> >>
> >> > Thoughts? As it stands, I need to do something because -tip and thus
> >> > -next spews occasional warnings.
> >>
> >> Tell me more?
> >
> > And for completeness, RCU also has the following requirements on the
> > state-transition mechanism:
> >
> > 1. It must be possible to reliably sample some other CPU's state.
> > This is an energy-efficiency requirement, as RCU is not normally
> > permitted to wake up idle CPUs. Nor nohz CPUs, for that matter.
>
> NOHZ needs this for vtime accounting, too. I think Rik might be
> thinking about this. Maybe the underlying state could be shared?
>
> >
> > 2. RCU must be able to track passage through idle and nohz states.
> > In other words, if RCU samples at t=0 and finds that the CPU
> > is executing (say) in kernel mode, and RCU samples again at
> > t=10 and again finds that the CPU is executing in kernel mode,
> > RCU needs to be able to determine whether or not that CPU passed
> > through idle or nohz betweentimes.
>
> And RCU can do this for CONTEXT_KERNEL vs CONTEXT_USER because the
> context tracking stuff notifies RCU. The thing I'm less than happy
> with is that we can currently be CONTEXT_USER but still rcu-awake.
> This is manageable, but it seems messy.
When we interrupt userspace, right? I don't see that as much of a problem,
at least not until we use a single unified state for both RCU and context
tracking.
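
To illustrate the case, here is a toy user-space model, not kernel code.
The struct and every model_* function are made up for illustration. It
assumes, as described earlier in this thread, that the generic IRQ path
makes RCU watch but leaves the context tracking state alone, so the two
variables can disagree while an IRQ that interrupted userspace is running.

```c
#include <assert.h>
#include <stdbool.h>

/* Same names as the kernel's states, but this enum is only a model. */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

/* Hypothetical per-CPU state: two independent variables. */
struct cpu_model {
	enum ctx_state ctx;	/* what context tracking believes */
	int rcu_nesting;	/* > 0 means RCU is watching this CPU */
};

/* Return to userspace: context tracking says USER, RCU stops watching. */
static void model_user_enter(struct cpu_model *c)
{
	c->ctx = CONTEXT_USER;
	c->rcu_nesting = 0;
}

/* Syscall or exception entry: both variables are updated together. */
static void model_user_exit(struct cpu_model *c)
{
	c->ctx = CONTEXT_KERNEL;
	c->rcu_nesting = 1;
}

/* IRQ entry: only the RCU side changes, ctx is left untouched. */
static void model_irq_enter(struct cpu_model *c)
{
	c->rcu_nesting++;
}

/* IRQ exit: undo the RCU side only. */
static void model_irq_exit(struct cpu_model *c)
{
	assert(c->rcu_nesting > 0);
	c->rcu_nesting--;
}

static bool model_rcu_watching(const struct cpu_model *c)
{
	return c->rcu_nesting > 0;
}
```

After model_user_enter() followed by model_irq_enter(), the model reports
ctx == CONTEXT_USER while model_rcu_watching() is true, which is the
"CONTEXT_USER but still rcu-awake" state discussed above.
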
>
> >
> > 3. In some configurations, RCU needs to be able to block entry into
> > nohz state, both for idle and userspace.
> >
>
> Hmm. I suppose we could be CONTEXT_USER but still have RCU awake,
> although the tick would have to stay on.
Well, 3) is handled by the tick nohz code, so it's still external.
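
For requirements 1) and 2) quoted above, a minimal user-space sketch of
one way to satisfy them is a per-CPU counter that is bumped on every idle
(or nohz) entry and exit, so that a remote CPU can sample it without
waking the target. This is only a model under stated assumptions: the
struct, the model_* names and the even/odd convention are chosen here for
illustration and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-CPU counter.
 * Even value: the CPU is idle/nohz. Odd value: the CPU is active.
 */
struct cpu_qs_counter {
	atomic_uint seq;
};

/* Called by the owning CPU when it goes idle: odd -> even. */
static void model_idle_enter(struct cpu_qs_counter *c)
{
	unsigned int old = atomic_fetch_add(&c->seq, 1);

	assert((old & 1) == 1);
}

/* Called by the owning CPU when it leaves idle: even -> odd. */
static void model_idle_exit(struct cpu_qs_counter *c)
{
	unsigned int old = atomic_fetch_add(&c->seq, 1);

	assert((old & 1) == 0);
}

/* Requirement 1: any CPU can sample the counter without an IPI. */
static unsigned int model_snapshot(struct cpu_qs_counter *c)
{
	return atomic_load(&c->seq);
}

static bool model_in_idle(unsigned int snap)
{
	return (snap & 1) == 0;
}

/*
 * Requirement 2: given two samples, tell whether the CPU was idle at
 * either sample or passed through idle in between. Two equal odd
 * samples mean it stayed active the whole time.
 */
static bool model_quiescent_since(unsigned int old_snap, unsigned int new_snap)
{
	return model_in_idle(old_snap) || model_in_idle(new_snap) ||
	       new_snap != old_snap;
}
```

With this model, a sampler that sees "active" at t=0 and "active" again at
t=10 can still tell the two cases apart: the counter value is unchanged if
the CPU never went idle, and different if it did.
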