linux-kernel - RE: [PATCH v2] rcu: Dump all rcuc kthreads status for CPUs that not report quiescent state


Message-ID: <PH0PR11MB5880E088CB975674DB791B3BDAF99@PH0PR11MB5880.namprd11.prod.outlook.com>
Date:   Sun, 24 Apr 2022 03:19:15 +0000
From:   "Zhang, Qiang1" <qiang1.zhang@...el.com>
To:     "paulmck@...nel.org" <paulmck@...nel.org>
CC:     "frederic@...nel.org" <frederic@...nel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "urezki@...il.com" <urezki@...il.com>
Subject: RE: [PATCH v2] rcu: Dump all rcuc kthreads status for CPUs that not
 report quiescent state


On Tue, Apr 19, 2022 at 01:34:26PM +0800, Zqiang wrote:
> If rcutree.use_softirq is disabled, then when an RCU stall occurs,
> dump the status of all rcuc kthreads whose starvation prevented the
> grace period from ending on CPUs that have not reported a quiescent
> state.

>Please accept my apologies for the delay, and please let me try
>again.  ;-)
>
>Your earlier patch added at most one line and one stack backtrace to
>the RCU CPU stall warning text, which is OK.  Sort of, anyway.  I was
>relying on the fact that the people who have (rightly) complained about
>RCU CPU stall-warning verbosity never run with !use_softirq.  But it is
>only a matter of time.  Yes, we could argue that they should use faster
>console serial lines, faster management-console hardware, faster networks,
>faster mass storage, and so on, but I would expect them to in turn ask
>us if we were volunteering to pay for all that.
>
>In contrast, this patch can add one line per stalled CPU on top of the
>existing output.  Which is better than your earlier patch, which could
>add a line plus a stack trace per stalled CPU.  But that can still be
>a lot of added output, and that added output can cause problems.
>
>So, could you please merge this rcuc-stalled information into the
>existing per-CPU line printed by print_cpu_stall_info()?  Right now,
>each such line looks something like this:
>
>rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2
>
>One approach would be to add the number of jiffies that the rcuc
>task was stalled to this line, maybe something like this:
>
>rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2 rcuc=15384
>
>Of course, this "rcuc=" string should be printed only if the stall lasted
>for longer than (say) one eighth of the stall timeout.
>
>Any "(false positive?)" needs to remain at the end of the line:
>
>rcu: 0-....: (4 ticks this GP) idle=1e6/1/0x4000000000000002 softirq=12470/12470 fqs=2 rcuc=15384 (false positive?)
>
>Thoughts?

Thanks for the suggestion, I will resend v3.
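
For reference, a rough userspace sketch of the formatting rule described
above (this is not the v3 patch). The helper names format_rcuc_field() and
format_stall_line(), and passing the rcuc stall duration and the stall
timeout in as plain jiffies counts, are assumptions made for illustration.
The real change would go into print_cpu_stall_info() in
kernel/rcu/tree_stall.h:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only, not kernel code. All names are hypothetical.
 *
 * Write " rcuc=<jiffies>" into buf only when the rcuc kthread has been
 * stalled for more than one eighth of the stall timeout; otherwise write
 * an empty string. Returns the number of characters written.
 */
static int format_rcuc_field(char *buf, size_t len,
			     unsigned long rcuc_stall_jiffies,
			     unsigned long stall_timeout_jiffies)
{
	assert(buf && len > 0);

	if (rcuc_stall_jiffies <= stall_timeout_jiffies / 8) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, " rcuc=%lu", rcuc_stall_jiffies);
}

/*
 * Build the per-CPU line: existing text, then the optional rcuc field,
 * then " (false positive?)" so that it stays at the end of the line.
 */
static int format_stall_line(char *buf, size_t len, const char *base,
			     unsigned long rcuc_stall_jiffies,
			     unsigned long stall_timeout_jiffies,
			     int falsepositive)
{
	char rcuc[32];

	format_rcuc_field(rcuc, sizeof(rcuc), rcuc_stall_jiffies,
			  stall_timeout_jiffies);
	return snprintf(buf, len, "%s%s%s", base, rcuc,
			falsepositive ? " (false positive?)" : "");
}
```

With this shape, a CPU whose rcuc kthread was not starved for long prints
exactly the same line as today, so the extra output stays at zero added
lines and only a few extra characters per stalled CPU.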

>
>							Thanx, Paul
>
> Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> ---
>  v1->v2:
>  rework rcuc_kthread_dump function
> 
>  kernel/rcu/tree_stall.h | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index d7956c03edbd..fcf0b2e1a71c 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -465,11 +465,13 @@ static void print_cpu_stall_info(int cpu)
>  	       falsepositive ? " (false positive?)" : "");
>  }
>  
> -static void rcuc_kthread_dump(struct rcu_data *rdp)
> +static void __rcuc_kthread_dump(int cpu)
>  {
> -	int cpu;
> -	unsigned long j;
> +	struct rcu_data *rdp;
>  	struct task_struct *rcuc;
> +	unsigned long j;
> +
> +	rdp = per_cpu_ptr(&rcu_data, cpu);
>  
>  	rcuc = rdp->rcu_cpu_kthread_task;
>  	if (!rcuc)
> @@ -488,6 +490,21 @@ static void rcuc_kthread_dump(struct rcu_data *rdp)
>  		dump_cpu_task(cpu);
>  }
>  
> +static void rcuc_kthread_dump(void)
> +{
> +	int cpu;
> +	struct rcu_node *rnp;
> +	unsigned long flags;
> +
> +	rcu_for_each_leaf_node(rnp) {
> +		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +		for_each_leaf_node_possible_cpu(rnp, cpu)
> +			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
> +				__rcuc_kthread_dump(cpu);
> +		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +	}
> +}
> +
>  /* Complain about starvation of grace-period kthread.  */
>  static void rcu_check_gp_kthread_starvation(void)
>  {
> @@ -597,6 +614,9 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	if (ndetected) {
>  		rcu_dump_cpu_stacks();
>  
> +		if (!use_softirq)
> +			rcuc_kthread_dump();
> +
>  		/* Complain about tasks blocking the grace period. */
>  		rcu_for_each_leaf_node(rnp)
>  			rcu_print_detail_task_stall_rnp(rnp);
> @@ -659,11 +679,11 @@ static void print_cpu_stall(unsigned long gps)
>  	rcu_check_gp_kthread_expired_fqs_timer();
>  	rcu_check_gp_kthread_starvation();
>  
> -	if (!use_softirq)
> -		rcuc_kthread_dump(rdp);
> -
>  	rcu_dump_cpu_stacks();
>  
> +	if (!use_softirq)
> +		rcuc_kthread_dump();
> +
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	/* Rewrite if needed in case of slow consoles. */
>  	if (ULONG_CMP_GE(jiffies, READ_ONCE(rcu_state.jiffies_stall)))
> -- 
> 2.25.1
>