[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200309204710.GU2935@paulmck-ThinkPad-P72>
Date: Mon, 9 Mar 2020 13:47:10 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Joel Fernandes <joel@...lfernandes.org>,
Frederic Weisbecker <frederic@...nel.org>
Subject: Re: Instrumentation and RCU
On Mon, Mar 09, 2020 at 06:02:32PM +0100, Thomas Gleixner wrote:
> Folks,
>
> I'm starting a new conversation because there are about 20 different
> threads which look at that problem in various ways and the information
> is so scattered that creating a coherent picture is pretty much
> impossible.
>
> There are several problems to solve:
>
> 1) Fragile low level entry code
>
> 2) Breakpoint utilization
>
> 3) RCU idle
>
> 4) Callchain protection
>
> #1 Fragile low level entry code
>
> While I understand the desire of instrumentation to observe
> everything we really have to ask the question whether it is worth the
> trouble especially with entry trainwrecks like x86, PTI and other
> horrors in that area.
>
> I don't think so and we really should just bite the bullet and forbid
> any instrumentation in that code unless it is explicitly designed
> for that case, makes sense and has a real value from an observation
> perspective.
>
> This is very much related to #3..
>
> #2) Breakpoint utilization
>
> As recent findings have shown, breakpoint utilization needs to be
> extremly careful about not creating infinite breakpoint recursions.
>
> I think that's pretty much obvious, but falls into the overall
> question of how to protect callchains.
>
> #3) RCU idle
>
> Being able to trace code inside RCU idle sections is very similar to
> the question raised in #1.
>
> Assume all of the instrumentation would be doing conditional RCU
> schemes, i.e.:
>
> if (rcuidle)
> ....
> else
> rcu_read_lock_sched()
>
> before invoking the actual instrumentation functions and of course
> undoing that right after it, that really begs the question whether
> it's worth it.
>
> Especially constructs like:
>
> trace_hardirqs_off()
> idx = srcu_read_lock()
> rcu_irq_enter_irqson();
> ...
> rcu_irq_exit_irqson();
> srcu_read_unlock(idx);
>
> if (user_mode)
> user_exit_irqsoff();
> else
> rcu_irq_enter();
>
> are really more than questionable. For 99.9999% of instrumentation
> users it's absolutely irrelevant whether this traces the interrupt
> disabled time of user_exit_irqsoff() or rcu_irq_enter() or not.
>
> But what's relevant is the tracer overhead which is e.g. inflicted
> with todays trace_hardirqs_off/on() implementation because that
> unconditionally uses the rcuidle variant with the scru/rcu_irq dance
> around every tracepoint.
>
> Even if the tracepoint sits in the ASM code it just covers about ~20
> low level ASM instructions more. The tracer invocation, which is
> even done twice when coming from user space on x86 (the second call
> is optimized in the tracer C-code), costs definitely way more
> cycles. When you take the scru/rcu_irq dance into account it's a
> complete disaster performance wise.
Suppose that we had a variant of RCU that had about the same read-side
overhead as Preempt-RCU, but which could be used from idle as well as
from CPUs in the process of coming online or going offline? I have not
thought through the irq/NMI/exception entry/exit cases, but I don't see
why that would be problem.
This would have explicit critical-section entry/exit code, so it would
not be any help for trampolines.
Would such a variant of RCU help?
Yeah, I know. Just what the kernel doesn't need, yet another variant
of RCU...
Thanx, Paul
> #4 Protecting call chains
>
> Our current approach of annotating functions with notrace/noprobe is
> pretty much broken.
>
> Functions which are marked NOPROBE or notrace call out into functions
> which are not marked and while this might be ok, there are enough
> places where it is not. But we have no way to verify that.
>
> That's just a recipe for disaster. We really cannot request from
> sysadmins who want to use instrumentation to stare at the code first
> whether they can place/enable an instrumentation point somewhere.
> That'd be just a bad joke.
>
> I really think we need to have proper text sections which are off
> limit for any form of instrumentation and have tooling to analyze the
> calls into other sections. These calls need to be annotated as safe
> and intentional.
>
> Thoughts?
>
> Thanks,
>
> tglx
>
>
>
>
>
>
>
>
Powered by blists - more mailing lists