Message-ID: <b604526d3186f6cd3da189abb70bd1ad9a6105c5.camel@redhat.com>
Date:   Tue, 01 Mar 2022 11:52:38 +0100
From:   Nicolas Saenz Julienne <nsaenzju@...hat.com>
To:     Daniel Bristot de Oliveira <bristot@...nel.org>,
        rostedt@...dmis.org, paulmck@...nel.org
Cc:     mingo@...hat.com, linux-kernel@...r.kernel.org, mtosatti@...hat.com
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, 2022-02-28 at 21:00 +0100, Daniel Bristot de Oliveira wrote:
> On 2/28/22 15:14, Nicolas Saenz Julienne wrote:
> > At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> > kernel might have the side effect of extending grace periods too much.
> > This will eventually entice RCU to schedule a task on the isolated CPU
> > to end the overly extended grace period, adding unwarranted noise to the
> > CPU being traced in the process.
> > 
> > So, check if we're the only ones running on this isolated CPU and that
> > we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> > between measurements.
> > 
> > Non-PREEMPT_RCU setups don't need to worry about this, as the osnoise main
> > loop's cond_resched() will go through a quiescent state for them.
> > 
> > Note that this same exact problem is what extended quiescent states were
> > created for. But adapting them to this specific use-case isn't trivial
> > as it'll imply reworking entry/exit and dynticks/context tracking code.
> > 
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>
> > ---
> >  kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> > index 870a08da5b48..4928358f6e88 100644
> > --- a/kernel/trace/trace_osnoise.c
> > +++ b/kernel/trace/trace_osnoise.c
> > @@ -21,7 +21,9 @@
> >  #include <linux/uaccess.h>
> >  #include <linux/cpumask.h>
> >  #include <linux/delay.h>
> > +#include <linux/tick.h>
> >  #include <linux/sched/clock.h>
> > +#include <linux/sched/isolation.h>
> >  #include <uapi/linux/sched/types.h>
> >  #include <linux/sched.h>
> >  #include "trace.h"
> > @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> >  	struct osnoise_sample s;
> >  	unsigned int threshold;
> >  	u64 runtime, stop_in;
> > +	unsigned long flags;
> >  	u64 sum_noise = 0;
> >  	int hw_count = 0;
> >  	int ret = -1;
> > @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> >  					osnoise_stop_tracing();
> >  		}
> >  
> > +		/*
> > +		 * Check if we're the only ones running on this nohz_full CPU
> > +		 * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> > +		 * QS since there is no way for RCU to know we're not making
> > +		 * use of it.
> > +		 *
> > +		 * Otherwise it'll be done through cond_resched().
> > +		 */
> > +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> > +		    !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> 
> Does this restrict to only isolcpus cpus?

Actually, it restricts to nohz_full CPUs. IIUC, HK_FLAG_MISC isn't set when
isolcpus is used, which is deprecated anyway.

> what if this CPU was isolated via other methods?

An osnoise thread with an uncontested FIFO priority, for example? I believe in
that case RCU will start throwing "rcu_preempt detected stalls"-style warnings,
as it won't be able to preempt osnoise on that CPU to force the grace period
to end.

I see your point, though: this would also help in that situation. We could
maybe relax the entry condition for rcu_momentary_dyntick_idle(). I think it's
safe to call it regardless of nohz_full/tick state in most cases; I just wanted
to avoid the overhead. The only thing that worries me is PREEMPT_RT and its
rt_spinlocks, which can be preempted.

> > +		    tick_nohz_tick_stopped()) {
> > +			local_irq_save(flags);
> 
> This code is always with interrupts enabled, so local_irq_disable()/enable()
> should be enough (and faster).

Noted.

> > +			rcu_momentary_dyntick_idle();
> > +			local_irq_restore(flags);
> > +		}
> 
> Question, if we set this once, we could avoid setting it on every loop unless we
> have a preemption from another thread, right?

This tells RCU the CPU went through a quiescent state, which removes it from
the current grace period accounting. It's different from an extended quiescent
state, which fully disables the CPU from RCU's perspective.

We don't need to do it on every iteration, but, as Paul explained in the mail
thread, it has to happen at least every ~20-30ms.

Thanks!

-- 
Nicolás Sáenz
