Message-ID: <PH0PR11MB5880E088CB975674DB791B3BDAF99@PH0PR11MB5880.namprd11.prod.outlook.com>
Date: Sun, 24 Apr 2022 03:19:15 +0000
From: "Zhang, Qiang1" <qiang1.zhang@...el.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "frederic@...nel.org" <frederic@...nel.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"urezki@...il.com" <urezki@...il.com>
Subject: RE: [PATCH v2] rcu: Dump all rcuc kthreads status for CPUs that not
report quiescent state
On Tue, Apr 19, 2022 at 01:34:26PM +0800, Zqiang wrote:
> If rcutree.use_softirq=0 is configured, then when an RCU CPU stall
> occurs, dump the status of the rcuc kthreads of all CPUs that have not
> reported a quiescent state, because starvation of those kthreads can
> prevent the grace period from ending.
>Please accept my apologies for the delay, and please let me try
>again. ;-)
>
>Your earlier patch added at most one line and one stack backtrace to
>the RCU CPU stall warning text, which is OK. Sort of, anyway. I was
>relying on the fact that the people who have (rightly) complained about
>RCU CPU stall-warning verbosity never run with !use_softirq. But it is
>only a matter of time. Yes, we could argue that they should use faster
>console serial lines, faster management-console hardware, faster networks,
>faster mass storage, and so on, but I would expect them to in turn ask
>us if we were volunteering to pay for all that.
>
>In contrast, this patch can add one line per stalled CPU on top of the
>existing output. Which is better than your earlier patch, which could
>add a line plus a stack trace per stalled CPU. But that can still be
>a lot of added output, and that added output can cause problems.
>
>So, could you please merge this rcuc-stalled information into the
>existing per-CPU line printed by print_cpu_stall_info()? Right now,
>each such line looks something like this:
>
>rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2
>
>One approach would be to add the number of jiffies that the rcuc
>task was stalled to this line, maybe something like this:
>
>rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2 rcuc=15384
>
>Of course, this "rcuc=" string should be output only if the stall lasted
>for longer than (say) one eighth of the stall timeout.
>
>Any "(false positive?)" needs to remain at the end of the line:
>
>rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2 rcuc=15384 (false positive?)
>
>Thoughts?
Thanks for the suggestion, I will resend v3.
>
> Thanx, Paul
>
> Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> ---
> v1->v2:
> rework rcuc_kthread_dump function
>
> kernel/rcu/tree_stall.h | 32 ++++++++++++++++++++++++++------
> 1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index d7956c03edbd..fcf0b2e1a71c 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -465,11 +465,13 @@ static void print_cpu_stall_info(int cpu)
>  	       falsepositive ? " (false positive?)" : "");
>  }
>  
> -static void rcuc_kthread_dump(struct rcu_data *rdp)
> +static void __rcuc_kthread_dump(int cpu)
>  {
> -	int cpu;
> -	unsigned long j;
> +	struct rcu_data *rdp;
>  	struct task_struct *rcuc;
> +	unsigned long j;
> +
> +	rdp = per_cpu_ptr(&rcu_data, cpu);
>  
>  	rcuc = rdp->rcu_cpu_kthread_task;
>  	if (!rcuc)
> @@ -488,6 +490,21 @@ static void rcuc_kthread_dump(struct rcu_data *rdp)
>  	dump_cpu_task(cpu);
>  }
>  
> +static void rcuc_kthread_dump(void)
> +{
> +	int cpu;
> +	struct rcu_node *rnp;
> +	unsigned long flags;
> +
> +	rcu_for_each_leaf_node(rnp) {
> +		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +		for_each_leaf_node_possible_cpu(rnp, cpu)
> +			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
> +				__rcuc_kthread_dump(cpu);
> +		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +	}
> +}
> +
>  /* Complain about starvation of grace-period kthread. */
>  static void rcu_check_gp_kthread_starvation(void)
>  {
> @@ -597,6 +614,9 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (ndetected) {
>  		rcu_dump_cpu_stacks();
>  
> +		if (!use_softirq)
> +			rcuc_kthread_dump();
> +
>  		/* Complain about tasks blocking the grace period. */
>  		rcu_for_each_leaf_node(rnp)
>  			rcu_print_detail_task_stall_rnp(rnp);
> @@ -659,11 +679,11 @@ static void print_cpu_stall(unsigned long gps)
>  	rcu_check_gp_kthread_expired_fqs_timer();
>  	rcu_check_gp_kthread_starvation();
>  
> -	if (!use_softirq)
> -		rcuc_kthread_dump(rdp);
> -
>  	rcu_dump_cpu_stacks();
>  
> +	if (!use_softirq)
> +		rcuc_kthread_dump();
> +
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	/* Rewrite if needed in case of slow consoles. */
>  	if (ULONG_CMP_GE(jiffies, READ_ONCE(rcu_state.jiffies_stall)))
> --
> 2.25.1
>