linux-kernel - Re: instrument_atomic_read()/_write() in noinstr functions?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YV1W8FAV6h2t5gQo@hirez.programming.kicks-ass.net>
Date:   Wed, 6 Oct 2021 09:57:36 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Daniel Axtens <dja@...ens.net>
Cc:     kasan-dev@...glegroups.com, linux-kernel@...r.kernel.org,
        paulmck@...nel.org, rcu@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Mark Rutland <mark.rutland@....com>
Subject: Re: instrument_atomic_read()/_write() in noinstr functions?

On Wed, Oct 06, 2021 at 12:05:00PM +1100, Daniel Axtens wrote:
> Hi,
> 
> commit b58e733fd774 ("rcu: Fixup noinstr warnings") adds some
> instrument_atomic_read calls to rcu_nmi_enter - a function marked
> noinstr. Similar calls are added to some other functions as well.

It moves the instrumentation, it was already there. Specifically:

-       seq = atomic_add_return(RCU_DYNTICK_CTRL_CTR, &rdp->dynticks);
+       seq = arch_atomic_add_return(RCU_DYNTICK_CTRL_CTR, &rdp->dynticks);

removes the instrumentation from the critical place where RCU isn't yet
watching, which is then added back here:

+       // instrumentation for the noinstr rcu_dynticks_eqs_enter()
+       instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));

Once it's deemed safe to run instrumentation.

> This is causing me some grief on powerpc64 while trying to enable
> KASAN. powerpc64 book3s takes some NMIs in real mode, and in real mode
> we can't access where I'm proposing to put the KASAN shadow - we can
> only access it with translations on. So I end up taking a fault in the
> kasan_check_read path via rcu_nmi_enter.

Then your entry ordering is wrong :-( RCU should be the very last thing,
once RCU is watching it should be safe to run instrumentation.

> As far as I can tell `instrumentation_begin()` and
> `instrumentation_end()` don't make it safe to call instrumentation, they
> just tell the developer that instrumentation is safe. (And they are used
> to check the balance of _begin()/_end() blocks.)

That is correct. In that respect it is an unsafe (pun intended)
annotation. The annotation can be used to annotate away actual
violations (although the one at hand is not one such). There are some
explicitly unsafe annotations like that though, typically WARNs in early
init code where we really can't do much better than to ignore and hope
the error gets out.

> Is the underlying assumption that the KASAN shadow will always be safe
> to access, even in functions marked noinstr? It seems to undercut what
> an architecture can assume about a function marked noinstr...

The assumption is that RCU is the very last thing in the entry code to
be enabled, and the very first to be disabled. Therefore, the moment RCU
is active we can allow instrumentation, and hence the
instrumentation_begin() is correct there.

The NMI dance on x86 is particularly nasty, but the first part
(currently all in entry_64.S) ensures the kernel page-tables are active
and that we have a kernel stack.

Then we call into C, which is still gnarly and deals with
self-recursion, but eventually calls irqentry_nmi_enter(). This then
carefully frobs the preempt, lockdep and rcu states into the right place
after which we have a fully 'normal' C context.

> P.S. On a more generic note instrumentation_begin()/_end() is now
> scattered across the kernel and it makes me a bit nervous. It's making a
> statement about something that is in part a property of how the arch
> implements instrumentation. Are arches expected to implement things in
> such a way as to make these blocks accurate?

Yes, there's only a limited ways in which all this can slot toghether
due to all the nasty inter-dependencies. Thomas and me spend quite a bit
of time trying to untangle the web such that we have a coherent
entry/exit ordering that's actually workable.

Pretty much everybody had this wrong and was/is broken in various
non-fun ways.

It's just that we didn't seem to have gotten around to writing
much documentation for any of this :/

> For example in
> arch/powerpc/include/asm/interrupt.h::interrupt_nmi_enter_prepare we
> currently sometimes call nmi_enter in real mode; should we instead only
> call it when we have translations on?

nmi_enter() is the 'old' interface that has known issues. That said, you
seem to have a comment exactly there:

	/*
	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
	 * NMI because not everything it touches is within the RMA limit.
	 */
	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
			!firmware_has_feature(FW_FEATURE_LPAR) ||
			radix_enabled() || (mfmsr() & MSR_DR))
		nmi_enter();


To me it sounds like this real-mode is something that's not a normal C
context and really should not ever run any instrumented code. As such I
don't think it should be using RCU.


Let me illustrate with the IRQ entry code, as that's easier:

Your code currently seems to do things like:

DEFINE_INTERRUPT_HANDLER_ASYNC()
  interrupt_async_enter_prepare()
    interrupt_enter_prepare()
      trace_hardirqs_off()
        lockdep_hardirqs_off()
	tracer_hardirqs_off()  // relies on RCU
	trace_irq_disable()    // relies on RCU
    irq_enter()
      rcu_irq_enter() // relies on lockdep, enables RCU
      ...


And there's a 'funnier' one involving trace_hardirqs_on(), there
lockdep itself relies on RCU and RCU relies on lockdep. But I'm not
quite sure how power does that.