Message-ID: <20220228221154.GN4285@paulmck-ThinkPad-P17-Gen-1>
Date: Mon, 28 Feb 2022 14:11:54 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Nicolas Saenz Julienne <nsaenzju@...hat.com>
Cc: rostedt@...dmis.org, bristot@...nel.org, mingo@...hat.com,
linux-kernel@...r.kernel.org, mtosatti@...hat.com
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing
On Mon, Feb 28, 2022 at 03:14:23PM +0100, Nicolas Saenz Julienne wrote:
> At the moment running osnoise on an isolated CPU and a PREEMPT_RCU
> kernel might have the side effect of extending grace periods too much.
> This will eventually entice RCU to schedule a task on the isolated CPU
> to end the overly extended grace period, adding unwarranted noise to the
> CPU being traced in the process.
>
> So, check if we're the only ones running on this isolated CPU and that
> we're on a PREEMPT_RCU setup. If so, let's force quiescent states in
> between measurements.
>
> Non-PREEMPT_RCU setups don't need to worry about this as osnoise main
> loop's cond_resched() will go through a quiescent state for them.
>
> Note that this same exact problem is what extended quiescent states were
> created for. But adapting them to this specific use-case isn't trivial
> as it'll imply reworking entry/exit and dynticks/context tracking code.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>
> ---
> kernel/trace/trace_osnoise.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 870a08da5b48..4928358f6e88 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -21,7 +21,9 @@
> #include <linux/uaccess.h>
> #include <linux/cpumask.h>
> #include <linux/delay.h>
> +#include <linux/tick.h>
> #include <linux/sched/clock.h>
> +#include <linux/sched/isolation.h>
> #include <uapi/linux/sched/types.h>
> #include <linux/sched.h>
> #include "trace.h"
> @@ -1295,6 +1297,7 @@ static int run_osnoise(void)
> struct osnoise_sample s;
> unsigned int threshold;
> u64 runtime, stop_in;
> + unsigned long flags;
> u64 sum_noise = 0;
> int hw_count = 0;
> int ret = -1;
> @@ -1386,6 +1389,22 @@ static int run_osnoise(void)
> osnoise_stop_tracing();
> }
>
> + /*
> + * Check if we're the only ones running on this nohz_full CPU
> + * and that we're on a PREEMPT_RCU setup. If so, let's fake a
> + * QS since there is no way for RCU to know we're not making
> + * use of it.
> + *
> + * Otherwise it'll be done through cond_resched().
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RCU) &&
> + !housekeeping_cpu(raw_smp_processor_id(), HK_FLAG_MISC) &&
> + tick_nohz_tick_stopped()) {
> + local_irq_save(flags);
> + rcu_momentary_dyntick_idle();
> + local_irq_restore(flags);

What is supposed to happen in this case is that RCU figures out that
there is a nohz_full CPU running for an extended period of time in the
kernel and takes matters into its own hands. This goes as follows on
a HZ=1000 kernel with default RCU settings:

o	At about 20 milliseconds into the grace period, RCU makes
	cond_resched() report quiescent states, among other things.
	As you say, this does not help for CONFIG_PREEMPT=y kernels.

o	At about 30 milliseconds into the grace period, RCU forces an
	explicit context switch on the wayward CPU. This should get
	the CPU's attention even in CONFIG_PREEMPT=y kernels.
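
The two thresholds above can be modelled in a few lines of userspace C.
This is only a sketch: the names gp_step_at(), COND_RESCHED_QS_MS and
FORCED_SWITCH_MS are hypothetical, and the 20 ms and 30 ms figures are
the approximate numbers quoted above for HZ=1000 with default RCU
settings, not values read out of the kernel.

```c
#include <assert.h>

/*
 * Hypothetical model of the escalation timeline described above.
 * Nothing here is kernel API; the thresholds are the approximate
 * figures for HZ=1000 with default RCU settings.
 */
#define COND_RESCHED_QS_MS	20	/* cond_resched() starts reporting QSes */
#define FORCED_SWITCH_MS	30	/* RCU forces a context switch */

static_assert(COND_RESCHED_QS_MS < FORCED_SWITCH_MS,
	      "cond_resched() help must kick in before the forced switch");

enum gp_step {
	GP_STEP_NONE,		/* RCU is still waiting passively */
	GP_STEP_COND_RESCHED,	/* cond_resched() reports quiescent states */
	GP_STEP_FORCED_SWITCH,	/* explicit context switch on the wayward CPU */
};

/* Map the age of the current grace period (in ms) to the step reached. */
static enum gp_step gp_step_at(unsigned int gp_age_ms)
{
	if (gp_age_ms >= FORCED_SWITCH_MS)
		return GP_STEP_FORCED_SWITCH;
	if (gp_age_ms >= COND_RESCHED_QS_MS)
		return GP_STEP_COND_RESCHED;
	return GP_STEP_NONE;
}
```

In this model, gp_step_at(25) yields GP_STEP_COND_RESCHED and
gp_step_at(30) yields GP_STEP_FORCED_SWITCH, the step that should end
the grace period even on a preemptible kernel.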

So what is happening for you instead?

							Thanx, Paul

> + }
> +
> /*
> * For the non-preemptive kernel config: let threads runs, if
> * they so wish.
> --
> 2.35.1
>