Message-ID: <CALCETrXCUfdzU6M8Uix+WPA8vV9cfkdDgb=1pt62Up=ahZ-xaA@mail.gmail.com>
Date:	Fri, 17 Jul 2015 11:59:18 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc:	Sasha Levin <sasha.levin@...cle.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>, X86 ML <x86@...nel.org>,
	Rik van Riel <riel@...hat.com>
Subject: Re: Reconciling rcu_irq_enter()/rcu_nmi_enter() with context tracking

On Thu, Jul 16, 2015 at 9:49 PM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Thu, Jul 16, 2015 at 09:29:07PM -0700, Paul E. McKenney wrote:
>> On Thu, Jul 16, 2015 at 06:53:15PM -0700, Andy Lutomirski wrote:
>> > For reasons that mystify me a bit, we currently track context tracking
>> > state separately from rcu's watching state.  This results in strange
>> > artifacts: nothing generic causes IRQs to enter CONTEXT_KERNEL, and we
>> > can nest exceptions inside the IRQ handler (an example would be
>> > wrmsr_safe failing), and, in -next, we splat a warning:
>> >
>> > https://gist.github.com/sashalevin/a006a44989312f6835e7
>> >
>> > I'm trying to make context tracking more exact, which will fix this
>> > issue (the particular splat that Sasha hit shouldn't be possible when
>> > I'm done), but I think it would be nice to unify all of this stuff.
>> > Would it be plausible for us to guarantee that RCU state is always in
>> > sync with context tracking state?  If so, we could maybe simplify
>> > things and have fewer state variables.
>>
>> A noble goal.  Might even be possible, and maybe even advantageous.
>>
>> But it is usually easier to say than to do.  RCU really does need to make
>> some adjustments when the state changes, as do the other subsystems.
>> It might or might not be possible to do the transitions atomically.
>> And if the transitions are not atomic, there will still be weird code
>> paths where (say) the processor is considered non-idle, but RCU doesn't
>> realize it yet.  Such a code path could not safely use rcu_read_lock(),
>> so you still need RCU to be able to scream if someone tries it.
>> Contrariwise, if there is a code path where the processor is considered
>> idle, but RCU thinks it is non-idle, that code path can stall
>> grace periods.  (Yes, not a problem if the code path is short enough.
>> At least if the underlying VCPU is making progress...)
>>
>> Still, I cannot prove that it is impossible, and if it is possible,
>> then as you say, there might well be benefits.
>>
>> > Doing this for NMIs might be weird.  Would it make sense to have a
>> > CONTEXT_NMI that's somehow valid even if the NMI happened while
>> > changing context tracking state?
>>
>> Face it, NMIs are weird.  ;-)
>>
>> > Thoughts?  As it stands, I think we might already be broken for real:
>> >
>> > Syscall -> user_exit.  Perf NMI hits *during* user_exit.  Perf does
>> > copy_from_user_nmi, which can fault, causing do_page_fault to get
>> > called, which calls exception_enter(), which can't be a good thing.
>> >
>> > RCU is okay (sort of) because of rcu_nmi_enter, but this seems very fragile.
>>
>> Actually, I see more cases where people forget irq_enter() than
>> rcu_nmi_enter().  "We will just nip in quickly and do something without
>> actually letting the irq system know.  Oh, and we want some event tracing
>> in that code path."  Boom!
>>
>> > Thoughts?  As it stands, I need to do something because -tip and thus
>> > -next spews occasional warnings.
>>
>> Tell me more?
>
> And for completeness, RCU also has the following requirements on the
> state-transition mechanism:
>
> 1.      It must be possible to reliably sample some other CPU's state.
>         This is an energy-efficiency requirement, as RCU is not normally
>         permitted to wake up idle CPUs.  Nor nohz CPUs, for that matter.

NOHZ needs this for vtime accounting, too.  I think Rik might be
thinking about this.  Maybe the underlying state could be shared?

>
> 2.      RCU must be able to track passage through idle and nohz states.
>         In other words, if RCU samples at t=0 and finds that the CPU
>         is executing (say) in kernel mode, and RCU samples again at
>         t=10 and again finds that the CPU is executing in kernel mode,
>         RCU needs to be able to determine whether or not that CPU passed
>         through idle or nohz betweentimes.

And RCU can do this for CONTEXT_KERNEL vs CONTEXT_USER because the
context tracking stuff notifies RCU.  The thing I'm less than happy
with is that we can currently be CONTEXT_USER but still rcu-awake.
This is manageable, but it seems messy.
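
To make requirements 1 and 2 concrete, here is a minimal userspace
sketch of one way a shared per-CPU state word could satisfy both: a
counter that the owning CPU bumps on every transition, so that an even
value means idle/nohz and an odd value means RCU is watching.  This is
a model built on C11 atomics, not kernel code; the names cpu_state,
cpu_state_toggle(), cpu_state_snapshot() and passed_through_idle() are
made up for illustration, and counter wraparound is ignored.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One counter per CPU: even = idle/nohz, odd = RCU is watching. */
struct cpu_state {
    atomic_uint seq;
};

/* Start in the "watching" (odd) state, as if running in the kernel. */
static void cpu_state_init(struct cpu_state *cs)
{
    atomic_init(&cs->seq, 1u);
}

/* Called by the owning CPU on every transition, in either direction. */
static void cpu_state_toggle(struct cpu_state *cs)
{
    atomic_fetch_add_explicit(&cs->seq, 1u, memory_order_seq_cst);
}

/* Requirement 1: a remote CPU samples the state without waking the target. */
static unsigned int cpu_state_snapshot(struct cpu_state *cs)
{
    return atomic_load_explicit(&cs->seq, memory_order_seq_cst);
}

static bool snapshot_is_idle(unsigned int snap)
{
    return (snap & 1u) == 0;
}

/*
 * Requirement 2: report whether the CPU was idle at the earlier sample
 * or has been idle at any point since.  If the earlier sample was odd
 * and the counter has moved at all, at least one even (idle) value
 * must have occurred in between.
 */
static bool passed_through_idle(struct cpu_state *cs, unsigned int old_snap)
{
    unsigned int now = cpu_state_snapshot(cs);

    return snapshot_is_idle(old_snap) || now != old_snap;
}
```

With this shape, two samples that both say "kernel mode" are still
distinguishable if the CPU dipped into idle between them, because the
counter value differs even though its parity does not.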

>
> 3.      In some configurations, RCU needs to be able to block entry into
>         nohz state, both for idle and userspace.
>

Hmm.  I suppose we could be CONTEXT_USER but still have RCU awake,
although the tick would have to stay on.
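
A toy model of that combination, purely as a sketch: context tracking
state and RCU's watching state are kept as separate fields, and a
hypothetical rcu_needs_cpu flag stands in for "RCU is blocking nohz
entry right now".  CONTEXT_KERNEL and CONTEXT_USER are the existing
context tracking names; cpu_model, model_user_enter(),
model_user_exit() and model_is_consistent() are invented for this
example and do not correspond to kernel functions.

```c
/* Userspace model only; struct and function names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

struct cpu_model {
    enum ctx_state ctx;  /* what context tracking believes */
    bool rcu_watching;   /* whether RCU still treats the CPU as non-idle */
    bool tick_on;        /* whether the scheduler tick is still running */
};

/*
 * Enter user mode.  If RCU refuses nohz entry, the CPU ends up in
 * CONTEXT_USER with RCU still awake, and the tick has to stay on.
 */
static void model_user_enter(struct cpu_model *m, bool rcu_needs_cpu)
{
    m->ctx = CONTEXT_USER;
    m->rcu_watching = rcu_needs_cpu;
    m->tick_on = rcu_needs_cpu;
}

/* Return to the kernel: RCU must be watching again. */
static void model_user_exit(struct cpu_model *m)
{
    m->ctx = CONTEXT_KERNEL;
    m->rcu_watching = true;
}

/*
 * The invariant being discussed: kernel context implies RCU is
 * watching, and user context with RCU awake implies the tick is on.
 */
static bool model_is_consistent(const struct cpu_model *m)
{
    if (m->ctx == CONTEXT_KERNEL)
        return m->rcu_watching;
    return !m->rcu_watching || m->tick_on;
}
```

The point of writing the invariant down is that "CONTEXT_USER but RCU
awake" stops being an accident and becomes a state that can be checked.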

Grumble.

--Andy