Message-ID: <20220422152647.GY4285@paulmck-ThinkPad-P17-Gen-1>
Date:   Fri, 22 Apr 2022 08:26:47 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Zqiang <qiang1.zhang@...el.com>
Cc:     frederic@...nel.org, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org, urezki@...il.com
Subject: Re: [PATCH v2] rcu: Dump status of all rcuc kthreads for CPUs that
 do not report a quiescent state

On Tue, Apr 19, 2022 at 01:34:26PM +0800, Zqiang wrote:
> When the rcutree.use_softirq kernel parameter is set to zero, upon
> an RCU stall event, dump the status of the rcuc kthreads whose
> starvation is preventing the grace period from ending on CPUs that
> have not reported a quiescent state.

Please accept my apologies for the delay, and please let me try
again.  ;-)

Your earlier patch added at most one line and one stack backtrace to
the RCU CPU stall warning text, which is OK.  Sort of, anyway.  I was
relying on the fact that the people who have (rightly) complained about
RCU CPU stall-warning verbosity never run with !use_softirq.  But it is
only a matter of time.  Yes, we could argue that they should use faster
console serial lines, faster management-console hardware, faster networks,
faster mass storage, and so on, but I would expect them to in turn ask
us if we were volunteering to pay for all that.

In contrast, this patch can add one line per stalled CPU on top of the
existing output.  Which is better than your earlier patch, which could
add a line plus a stack trace per stalled CPU.  But that can still be
a lot of added output, and that added output can cause problems.

So, could you please merge this rcuc-stalled information into the
existing per-CPU line printed by print_cpu_stall_info()?  Right now,
each such line looks something like this:

rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2

One approach would be to add the number of jiffies that the rcuc
task was stalled to this line, maybe something like this:

rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2 rcuc=15384

Of course, this "rcuc=" string should be printed only if the stall
lasted for longer than (say) one eighth of the stall timeout.

Any "(false positive?)" needs to remain at the end of the line:

rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2 rcuc=15384 (false positive?)
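
For concreteness, here is a minimal sketch of the logic that
print_cpu_stall_info() might gain (untested, and the rdp->rcuc_activity
timestamp is an illustrative new field that the rcuc kthread would need
to update, not something that exists today):

	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
	char buf[32] = "";
	unsigned long j;

	/* Flag rcuc starvation only past 1/8 of the stall timeout. */
	if (!use_softirq && rdp->rcu_cpu_kthread_task) {
		j = jiffies - READ_ONCE(rdp->rcuc_activity);
		if (j > rcu_jiffies_till_stall_check() / 8)
			sprintf(buf, " rcuc=%lu", j);
	}

	/* Append buf to the existing pr_err() format string, keeping
	 * any trailing " (false positive?)" at the end of the line. */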

Thoughts?

							Thanx, Paul

> Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> ---
>  v1->v2:
>  rework rcuc_kthread_dump function
> 
>  kernel/rcu/tree_stall.h | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index d7956c03edbd..fcf0b2e1a71c 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -465,11 +465,13 @@ static void print_cpu_stall_info(int cpu)
>  	       falsepositive ? " (false positive?)" : "");
>  }
>  
> -static void rcuc_kthread_dump(struct rcu_data *rdp)
> +static void __rcuc_kthread_dump(int cpu)
>  {
> -	int cpu;
> -	unsigned long j;
> +	struct rcu_data *rdp;
>  	struct task_struct *rcuc;
> +	unsigned long j;
> +
> +	rdp = per_cpu_ptr(&rcu_data, cpu);
>  
>  	rcuc = rdp->rcu_cpu_kthread_task;
>  	if (!rcuc)
> @@ -488,6 +490,21 @@ static void rcuc_kthread_dump(struct rcu_data *rdp)
>  		dump_cpu_task(cpu);
>  }
>  
> +static void rcuc_kthread_dump(void)
> +{
> +	int cpu;
> +	struct rcu_node *rnp;
> +	unsigned long flags;
> +
> +	rcu_for_each_leaf_node(rnp) {
> +		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +		for_each_leaf_node_possible_cpu(rnp, cpu)
> +			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
> +				__rcuc_kthread_dump(cpu);
> +		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +	}
> +}
> +
>  /* Complain about starvation of grace-period kthread.  */
>  static void rcu_check_gp_kthread_starvation(void)
>  {
> @@ -597,6 +614,9 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (ndetected) {
>  		rcu_dump_cpu_stacks();
>  
> +		if (!use_softirq)
> +			rcuc_kthread_dump();
> +
>  		/* Complain about tasks blocking the grace period. */
>  		rcu_for_each_leaf_node(rnp)
>  			rcu_print_detail_task_stall_rnp(rnp);
> @@ -659,11 +679,11 @@ static void print_cpu_stall(unsigned long gps)
>  	rcu_check_gp_kthread_expired_fqs_timer();
>  	rcu_check_gp_kthread_starvation();
>  
> -	if (!use_softirq)
> -		rcuc_kthread_dump(rdp);
> -
>  	rcu_dump_cpu_stacks();
>  
> +	if (!use_softirq)
> +		rcuc_kthread_dump();
> +
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	/* Rewrite if needed in case of slow consoles. */
>  	if (ULONG_CMP_GE(jiffies, READ_ONCE(rcu_state.jiffies_stall)))
> -- 
> 2.25.1
> 
