Message-ID: <20221020231353.GC5600@paulmck-ThinkPad-P17-Gen-1>
Date: Thu, 20 Oct 2022 16:13:53 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Zhen Lei <thunder.leizhen@...wei.com>
Cc: Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
Josh Triplett <josh@...htriplett.org>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] rcu: Add RCU stall diagnosis information
On Mon, Oct 17, 2022 at 06:01:05PM +0800, Zhen Lei wrote:
> In some extreme cases, such as an I/O pressure test, CPU usage may reach
> 100%, causing an RCU stall. In this case, the printed information about
> current is not useful. Displaying the number and usage of hard interrupts,
> soft interrupts, and context switches generated within half of
> the CPU stall timeout can help us make a general judgment. In other
> cases, we can preliminarily determine whether an infinite loop occurred
> while local_irq, local_bh, or preempt was disabled.
>
> Zhen Lei (3):
>   sched: Add helper kstat_cpu_softirqs_sum()
>   sched: Add helper nr_context_switches_cpu()
>   rcu: Add RCU stall diagnosis information
Interesting approach, thank you!
I have pulled this in for testing and review, having rescued it from my
spam folder.
Some questions that might come up include: (1) Can the addition of
things like cond_resched() make RCU happier with the I/O pressure test?
(2) Should there be a way to turn this off for environments with slow
consoles? (3) If this information shows heavy CPU usage, what debug
and fix approach should be used?
For an example of #1, if a CPU is flooded with softirq activity, one
might hope that the call to rcu_softirq_qs() would prevent the RCU CPU
stall warning, at least for kernels built with CONFIG_PREEMPT_RT=n.
Similarly, if there are huge numbers of context switches, one might hope
that the rcu_note_context_switch() would report a quiescent state sooner
rather than later.
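To make the discussion a bit more concrete, here is a rough userspace
sketch of the snapshot-and-delta idea from the cover letter, together with
an on/off switch of the kind question (2) asks about. It is not the code
from the series. The names struct cpu_counters, counters_delta(),
rough_hint(), print_stall_diag() and stall_diag_enabled are all made up
for illustration, and the hint logic is only the cover letter's
"preliminary" reading of the counters: a count that did not advance during
the window suggests that the corresponding mechanism may have been disabled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-CPU event counters (illustrative names, not kernel API). */
struct cpu_counters {
	unsigned long nr_hardirqs;
	unsigned long nr_softirqs;
	unsigned long nr_csw;
};

enum stall_hint {
	HINT_IRQ_DISABLED,	/* no hardirqs seen in the window */
	HINT_BH_DISABLED,	/* no softirqs seen in the window */
	HINT_PREEMPT_DISABLED,	/* no context switches seen in the window */
	HINT_BUSY,		/* everything advanced: probably just heavy load */
};

/* Hypothetical switch for question (2): lets slow-console setups opt out. */
bool stall_diag_enabled = true;

/* Events that occurred between the snapshot and the stall report. */
struct cpu_counters counters_delta(const struct cpu_counters *snap,
				   const struct cpu_counters *now)
{
	struct cpu_counters d = {
		.nr_hardirqs = now->nr_hardirqs - snap->nr_hardirqs,
		.nr_softirqs = now->nr_softirqs - snap->nr_softirqs,
		.nr_csw = now->nr_csw - snap->nr_csw,
	};

	return d;
}

/* Very rough first guess, checked in order; an assumption, not a diagnosis. */
enum stall_hint rough_hint(const struct cpu_counters *d)
{
	if (d->nr_hardirqs == 0)
		return HINT_IRQ_DISABLED;
	if (d->nr_softirqs == 0)
		return HINT_BH_DISABLED;
	if (d->nr_csw == 0)
		return HINT_PREEMPT_DISABLED;
	return HINT_BUSY;
}

/* Print the deltas and the hint; returns false if diagnostics are off. */
bool print_stall_diag(const struct cpu_counters *snap,
		      const struct cpu_counters *now)
{
	static const char *const msg[] = {
		[HINT_IRQ_DISABLED] = "possible loop with local_irq disabled",
		[HINT_BH_DISABLED] = "possible loop with local_bh disabled",
		[HINT_PREEMPT_DISABLED] = "possible loop with preempt disabled",
		[HINT_BUSY] = "counters advanced, possibly heavy load",
	};
	struct cpu_counters d;

	if (!stall_diag_enabled)
		return false;

	d = counters_delta(snap, now);
	printf("hardirqs: %lu softirqs: %lu csw: %lu (%s)\n",
	       d.nr_hardirqs, d.nr_softirqs, d.nr_csw, msg[rough_hint(&d)]);
	return true;
}
```

In this sketch the snapshot would be taken at half of the stall timeout and
print_stall_diag() called when the stall is reported, so the deltas cover
only the second half of the window.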
Thoughts?
Thanx, Paul
> include/linux/kernel_stat.h | 12 +++++++++++
> kernel/rcu/tree.h | 11 ++++++++++
> kernel/rcu/tree_stall.h | 40 +++++++++++++++++++++++++++++++++++++
> kernel/sched/core.c | 5 +++++
> 4 files changed, 68 insertions(+)
>
> --
> 2.25.1
>