Message-ID: <20221107152935.167-1-thunder.leizhen@huawei.com>
Date: Mon, 7 Nov 2022 23:29:35 +0800
From: Zhen Lei <thunder.leizhen@...wei.com>
To: "Paul E . McKenney" <paulmck@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
"Josh Triplett" <josh@...htriplett.org>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Joel Fernandes <joel@...lfernandes.org>, <rcu@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: Zhen Lei <thunder.leizhen@...wei.com>,
Robert Elliott <elliott@....com>
Subject: [PATCH] rcu: Illustrate the stall information of CONFIG_RCU_CPU_STALL_CPUTIME=y
Describe how to quickly determine the type of an RCU stall from the
extra information that is printed when CONFIG_RCU_CPU_STALL_CPUTIME=y.
Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
---
Documentation/RCU/stallwarn.rst | 56 +++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index dfa4db8c0931eaf..40748bff8b8186e 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -390,3 +390,59 @@ for example, "P3421".
It is entirely possible to see stall warnings from normal and from
expedited grace periods at about the same time during the same run.
+
+RCU_CPU_STALL_CPUTIME
+=====================
+If CONFIG_RCU_CPU_STALL_CPUTIME=y or rcupdate.rcu_cpu_stall_cputime=1,
+the stall warning additionally shows statistics related to interrupts
+and tasks, as follows:
+rcu: hardirqs softirqs csw/system
+rcu: number: 624 45 0
+rcu: cputime: 69 1 2425 ==> 2500(ms)
+
+These statistics are collected during the second half of the RCU stall
+timeout. The values in the "number:" row are the number of hard interrupts,
+the number of soft interrupts, and the number of context switches. The
+values in the "cputime:" row are the cputime of hard interrupts, the
+cputime of soft interrupts, the cputime of tasks, and the sampling period.
+Because user-mode execution does not cause RCU stalls, only time spent in
+the kernel matters, which is why only the system cputime is considered.
+
+The following describes four typical scenarios:
+1. A CPU looping with interrupts disabled.
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 0 0 0
+ rcu: cputime: 0 0 0 ==> 2500(ms)
+ The start time of interrupt processing is recorded when the handler is
+ entered, and the end time is recorded when the handler exits. The
+ cputime of hard interrupts is zero because the time spent in the
+ current interrupt has not yet been accounted. Because interrupts are
+ disabled, all other counts must be zero in the second half of the RCU
+
+2. A CPU looping with bottom halves disabled.
+ Similar to the previous scenario, but the number and cputime of hard
+ interrupts are non-zero.
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 0 0
+ rcu: cputime: 49 0 2446 ==> 2500(ms)
+ The system cputime is non-zero, which indicates that the current task
+ called local_bh_disable(). Otherwise, the cputime of softirqs would be
+ non-zero. Note that in this case the number of soft interrupts is always zero.
+
+3. A CPU looping with preemption disabled.
+ The number and cputime of hard interrupts and soft interrupts are all
+ non-zero. Only the number of context switches is zero.
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 45 0
+ rcu: cputime: 69 1 2425 ==> 2500(ms)
+
+4. No looping, but massive hard and soft interrupts.
+ rcu: hardirqs softirqs csw/system
+ rcu: number: xx xx 0
+ rcu: cputime: xx xx 0 ==> 2500(ms)
+ The number and cputime of hard interrupts are both non-zero. The number
+ of context switches and the system cputime are both zero. The number and
+ cputime of soft interrupts depend on the cputime of hard interrupts and
+ are either both zero or both non-zero.
+ If the stall can be reproduced, inspect /proc/interrupts or write code
+ that traces each interrupt, using show_interrupts() as a reference.
--
2.25.1
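
The four scenarios documented in the patch reduce to a small decision table over the "number:" and "cputime:" rows. The following is a minimal userspace sketch of that table, not part of the patch or of the kernel. The function names `parse_stall_stats` and `classify_stall` are hypothetical, and the regular expression assumes the row layout shown in the sample output above.

```python
import re

# Assumption: rows look like "rcu: number: 624 45 0" and
# "rcu: cputime: 69 1 2425 ==> 2500(ms)", as in the samples in the patch.
ROW = re.compile(r"rcu:\s*(number|cputime):\s*(\d+)\s+(\d+)\s+(\d+)")


def parse_stall_stats(log_text):
    """Return the three leading integers of the 'number' and 'cputime' rows."""
    rows = {}
    for match in ROW.finditer(log_text):
        rows[match.group(1)] = tuple(int(g) for g in match.groups()[1:])
    if set(rows) != {"number", "cputime"}:
        raise ValueError("expected both a 'number:' and a 'cputime:' row")
    return rows


def classify_stall(log_text):
    """Map the statistics onto the four scenarios described in the patch."""
    rows = parse_stall_stats(log_text)
    hardirqs, softirqs, csw = rows["number"]
    _hardirq_ms, _softirq_ms, system_ms = rows["cputime"]

    if csw != 0:
        # All four documented scenarios show zero context switches.
        return "not one of the four documented scenarios"
    if hardirqs == 0 and softirqs == 0:
        return "looping with interrupts disabled"
    if system_ms == 0:
        # Interrupts arrive, but no task time was accounted.
        return "massive hard and soft interrupts"
    if softirqs == 0:
        return "looping with bottom halves disabled"
    return "looping with preemption disabled"
```

For example, passing the three-line sample from scenario 3 (`number: 624 45 0`, `cputime: 69 1 2425`) yields "looping with preemption disabled". The order of the checks matters: the all-zero case has to be tested before the zero system cputime case, because scenario 1 also reports a system cputime of zero.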
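
For scenario 4, the patch suggests looking at /proc/interrupts. One way to do that is to compare two snapshots and rank the interrupt lines by how much their counts grew. The sketch below is illustrative only. The helper names `parse_interrupts` and `busiest_irqs` are hypothetical, and the parser assumes the usual layout: a header row of CPU names followed by one `label: count count ... description` row per interrupt.

```python
def parse_interrupts(text):
    """Map each interrupt label to its count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        total = 0
        # Rows such as "ERR:" carry fewer columns than there are CPUs,
        # so stop at the first field that is not a plain count.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            total += int(field)
        totals[label.strip()] = total
    return totals


def busiest_irqs(before, after, top=3):
    """Return up to `top` (label, delta) pairs with the largest growth."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    deltas = {irq: count - old.get(irq, 0) for irq, count in new.items()}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return [(irq, delta) for irq, delta in ranked[:top] if delta > 0]
```

In practice, the two arguments would be the contents of /proc/interrupts read a few seconds apart while the stall is being reproduced. The labels at the top of the result are the first candidates to trace.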