linux-kernel - Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

Message-ID: <20220228221154.GN4285@paulmck-ThinkPad-P17-Gen-1>
Date:   Mon, 28 Feb 2022 14:11:54 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Nicolas Saenz Julienne <nsaenzju@...hat.com>
Cc:     rostedt@...dmis.org, bristot@...nel.org, mingo@...hat.com,
        linux-kernel@...r.kernel.org, mtosatti@...hat.com
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> kernel might have the side effect of extending grace periods too much.
> This will eventually entice RCU to schedule a task on the isolated CPU
> to end the overly extended grace period, adding unwarranted noise to the
> CPU being traced in the process.
> 
> So, check if we're the only ones running on this isolated CPU and that
> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> between measurements.
> 
> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> loop's cond_resched() will go through a quiescent state for them.
> 
> Note that this same exact problem is what extended quiescent states were
> created for. But adapting them to this specific use-case isn't trivial
> as it'll imply reworking entry/exit and dynticks/context tracking code.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>
> ---
>  kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 870a08da5b48..4928358f6e88 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -21,7 +21,9 @@
>  #include <linux/uaccess.h>
>  #include <linux/cpumask.h>
>  #include <linux/delay.h>
> +#include <linux/tick.h>
>  #include <linux/sched/clock.h>
> +#include <linux/sched/isolation.h>
>  #include <uapi/linux/sched/types.h>
>  #include <linux/sched.h>
>  #include "trace.h"
> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
>  	struct osnoise_sample s;
>  	unsigned int threshold;
>  	u64 runtime, stop_in;
> +	unsigned long flags;
>  	u64 sum_noise = 0;
>  	int hw_count = 0;
>  	int ret = -1;
> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
>  					osnoise_stop_tracing();
>  		}
>  
> +		/*
> +		 * Check if we're the only ones running on this nohz_full CPU
> +		 * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> +		 * QS since there is no way for RCU to know we're not making
> +		 * use of it.
> +		 *
> +		 * Otherwise it'll be done through cond_resched().
> +		 */
> +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> +		    !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> +		    tick_nohz_tick_stopped()) {
> +			local_irq_save(flags);
> +			rcu_momentary_dyntick_idle();
> +			local_irq_restore(flags);

What is supposed to happen in this case is that RCU figures out that
there is a nohz_full CPU running for an extended period of time in the
kernel and takes matters into its own hands.  This goes as follows on
a HZ=1000 kernel with default RCU settings:

o	At about 20 milliseconds into the grace period, RCU makes
	cond_resched() report quiescent states, among other things.
	As you say, this does not help for CONFIG_PREEMPT=y kernels.

o	At about 30 milliseconds into the grace period, RCU forces an
	explicit context switch on the wayward CPU.  This should get
	the CPU's attention even in CONFIG_PREEMPT=y kernels.
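
For illustration only, the escalation described above can be sketched as a
small user-space C model. The names gp_action, gp_escalation() and the two
millisecond constants are hypothetical and are not kernel APIs; the
thresholds are simply the approximate figures quoted above for a HZ=1000
kernel with default RCU settings, not values read from the RCU code.

```c
/*
 * Toy model of the two escalation steps listed above.
 * All identifiers here are hypothetical, for illustration only.
 */
enum gp_action {
	GP_WAIT,			/* grace period is still young */
	GP_COND_RESCHED_QS,		/* cond_resched() reports quiescent states */
	GP_FORCE_CONTEXT_SWITCH,	/* explicit context switch is forced */
};

/* Approximate thresholds taken from the text above (assumed, HZ=1000). */
#define COND_RESCHED_QS_MS	20
#define FORCE_CONTEXT_SWITCH_MS	30

/* Map the age of the current grace period to the step RCU would be at. */
static inline enum gp_action gp_escalation(unsigned int gp_age_ms)
{
	if (gp_age_ms >= FORCE_CONTEXT_SWITCH_MS)
		return GP_FORCE_CONTEXT_SWITCH;
	if (gp_age_ms >= COND_RESCHED_QS_MS)
		return GP_COND_RESCHED_QS;
	return GP_WAIT;
}
```

In this model, gp_escalation(25) yields GP_COND_RESCHED_QS and
gp_escalation(30) yields GP_FORCE_CONTEXT_SWITCH, matching the two steps
in the list.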

So what is happening for you instead?

							Thanx, Paul

> +		}
> +
>  		/*
>  		 * For the non-preemptive kernel config: let threads runs, if
>  		 * they so wish.
> -- 
> 2.35.1
>