Date:   Tue, 1 Mar 2022 10:05:09 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Daniel Bristot de Oliveira <bristot@...nel.org>
Cc:     Nicolas Saenz Julienne <nsaenzju@...hat.com>, rostedt@...dmis.org,
        mingo@...hat.com, linux-kernel@...r.kernel.org, mtosatti@...hat.com
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, Mar 01, 2022 at 06:55:23PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/1/22 11:52, Nicolas Saenz Julienne wrote:
> > On Mon, 2022-02-28 at 21:00 +0100, Daniel Bristot de Oliveira wrote:
> >> On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> >>> At the moment, running osnoise on an isolated CPU with a PREEMPT_RCU
> >>> kernel might have the side effect of extending grace periods too much.
> >>> This will eventually entice RCU to schedule a task on the isolated CPU
> >>> to end the overly extended grace period, adding unwarranted noise to the
> >>> CPU being traced in the process.
> >>>
> >>> So, check if we're the only ones running on this isolated CPU and that
> >>> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> >>> between measurements.
> >>>
> >>> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> >>>  loop's cond_resched() will go through a quiescent state for them.
> >>>
> >>> Note that this exact problem is what extended quiescent states were
> >>> created for. But adapting them to this specific use case isn't trivial,
> >>> as it would imply reworking entry/exit and dynticks/context tracking code.
> >>>
> >>> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>
> >>> ---
> >>>  kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> >>>  1 file changed, 19 insertions(+)
> >>>
> >>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> >>> index 870a08da5b48..4928358f6e88 100644
> >>> --- a/kernel/trace/trace_osnoise.c
> >>> +++ b/kernel/trace/trace_osnoise.c
> >>> @@ -21,7 +21,9 @@
> >>>  #include <linux/uaccess.h>
> >>>  #include <linux/cpumask.h>
> >>>  #include <linux/delay.h>
> >>> +#include <linux/tick.h>
> >>>  #include <linux/sched/clock.h>
> >>> +#include <linux/sched/isolation.h>
> >>>  #include <uapi/linux/sched/types.h>
> >>>  #include <linux/sched.h>
> >>>  #include "trace.h"
> >>> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> >>>  	struct osnoise_sample s;
> >>>  	unsigned int threshold;
> >>>  	u64 runtime, stop_in;
> >>> +	unsigned long flags;
> >>>  	u64 sum_noise = 0;
> >>>  	int hw_count = 0;
> >>>  	int ret = -1;
> >>> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> >>>  					osnoise_stop_tracing();
> >>>  		}
> >>>  
> >>> +		/*
> >>> +		 * Check if we're the only ones running on this nohz_full CPU
> >>> +		 * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> >>> +		 * QS since there is no way for RCU to know we're not making
> >>> +		 * use of it.
> >>> +		 *
> >>> +		 * Otherwise it'll be done through cond_resched().
> >>> +		 */
> >>> +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> >>> +		    !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> >>
> >> Does this restrict it to isolcpus CPUs only?
> > 
> > nohz_full CPUs, actually. IIUC, HK_FLAG_MISC isn't set if isolcpus is
> > used, which is deprecated anyway.
> 
> Perfecto!
> 
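For reference, a minimal sketch of the distinction being discussed, assuming
the 5.17-era isolation API (the helper name below is hypothetical, not part
of the patch): nohz_full= removes its CPUs from the HK_FLAG_MISC housekeeping
mask, while the deprecated isolcpus= only touches HK_FLAG_DOMAIN, so a
negated HK_FLAG_MISC check matches nohz_full CPUs only.

#include <linux/smp.h>
#include <linux/sched/isolation.h>

/*
 * Hypothetical helper: true iff the current CPU was isolated via
 * nohz_full= (and was thus dropped from the HK_FLAG_MISC housekeeping
 * mask). CPUs isolated only via isolcpus= keep HK_FLAG_MISC set in the
 * housekeeping flags and are not matched.
 */
static bool osnoise_this_cpu_is_nohz_full(void)
{
	return !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC);
}
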
> > 
> >> what if this CPU was isolated via other methods?
> > 
> > osnoise with an uncontested FIFO priority for example?
> 
> No, I was mentioning something like tuna/taskset/systemd/cgroup, anything
> other than isolcpus... as it is doing (I misinterpreted the HK_FLAG_MISC).
> 
> I do not agree with using a busy loop with FIFO.
> 
> > I believe in that case
> > RCU will start throwing "rcu_preempt detected stalls" style warnings. As it
> > won't be able to preempt the osnoise CPU to force the grace period ending.
> > 
> > I see your point though, this would also help in that situation. We could maybe
> > relax the entry barrier to rcu_momentary_dyntick_idle(). I think it's safe to
> > call it regardless of nohz_full/tick state for most cases, I just wanted to
> > avoid the overhead. The only thing that worries me is PREEMPT_RT and its
> > rt_spinlocks, which can be preempted.
> 
> No, that was not my point.
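
Concretely, the relaxation Nicolas mentions above would presumably amount to
dropping the tick_nohz_tick_stopped() check, something along these lines (a
sketch only, not anyone's posted patch; whether it is safe under PREEMPT_RT's
preemptible rt_spinlocks is exactly the open question):

		/*
		 * Hypothetical relaxed variant: report a QS on any
		 * non-housekeeping CPU, whether or not the tick has
		 * been stopped.
		 */
		if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
		    !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC)) {
			local_irq_disable();
			rcu_momentary_dyntick_idle();
			local_irq_enable();
		}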
> 
> > 
> >>> +		    tick_nohz_tick_stopped()) {
> >>> +			local_irq_save(flags);
> >>
> >> This code always runs with interrupts enabled, so
> >> local_irq_disable()/enable() should be enough (and faster).
> > 
> > Noted.
> > 
> >>> +			rcu_momentary_dyntick_idle();
> >>> +			local_irq_restore(flags);
> >>> +		}
> >>
> >> Question: if we set this once, we could avoid setting it on every loop
> >> iteration unless we are preempted by another thread, right?
> > 
> > This tells RCU the CPU went through a quiescent state, which removes it from
> > the current grace period accounting. It's different from an extended quiescent
> > state, which fully disables the CPU from RCU's perspective.
> 
> Got it!
> 
> > We don't need to do it on every iteration, but as Paul explained in the mail
> > thread it has to happen at least every ~20-30ms.
> 
> I see, as long as it costs < 1 us, I am ok. If it gets > 1 us on a reasonably
> fast machine, we start to see HW noise where it does not exist, and that
> would reduce the resolution of osnoise. AFAICS, it is not causing that
> problem, but we need to make it as lightweight as possible.

In the common case, it is atomically incrementing a local per-CPU counter
and doing a store.  This should be quite cheap.

The uncommon case is when the osnoise process was preempted or otherwise
interfered with during a recent RCU read-side critical section and
preemption was disabled around that critical section's outermost
rcu_read_unlock().  This can be quite expensive.  But I would expect
you to just not do this.  ;-)

							Thanx, Paul
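
Pulling the thread's feedback together, the hunk could plausibly end up as
something like the sketch below: the cheaper local_irq_disable()/
local_irq_enable() pair, plus rate limiting so the quiescent state is
reported roughly every 20 ms instead of on every loop iteration. This is a
sketch against the suggestions above, not the patch as merged; the helper
name and the last_qs timestamp are hypothetical, and the includes the patch
already adds (tick.h, sched/isolation.h) are assumed.

/*
 * Sketch: called from run_osnoise()'s main loop with interrupts enabled.
 * *last_qs is a caller-owned jiffies timestamp, initialized to jiffies
 * before the loop starts.
 */
static void osnoise_maybe_force_qs(unsigned long *last_qs)
{
	if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
	    housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) ||
	    !tick_nohz_tick_stopped())
		return;

	/* RCU only needs a quiescent state every ~20-30 ms. */
	if (!time_after(jiffies, *last_qs + msecs_to_jiffies(20)))
		return;

	local_irq_disable();		/* IRQs are known to be on here. */
	rcu_momentary_dyntick_idle();	/* Cheap: per-CPU counter + store. */
	local_irq_enable();
	*last_qs = jiffies;
}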
