linux-kernel - Re: [RESEND PATCH 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination.

Message-ID: <8a30005f-e87c-40bf-a49f-c9f049cfbdb2@paulmck-laptop>
Date:   Thu, 1 Jun 2023 09:21:00 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Imran Khan <imran.f.khan@...cle.com>
Cc:     peterz@...radead.org, jgross@...e.com, vschneid@...hat.com,
        yury.norov@...il.com, tglx@...utronix.de,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org
Subject: Re: [RESEND PATCH 2/2] smp: Reduce NMI traffic from CSD waiters to
 CSD destination.

On Tue, May 30, 2023 at 11:24:00AM +1000, Imran Khan wrote:
> Hello Paul,
> 
> On 16/5/2023 10:09 pm, Paul E. McKenney wrote:
> > On Tue, May 09, 2023 at 08:31:24AM +1000, Imran Khan wrote:
> >> On systems with hundreds of CPUs, if a few hundred or most of the CPUs
> >> detect a CSD hang, then all of these waiters end up sending an NMI to
> >> the destination CPU to dump its backtrace.
> >> Depending on the number of such NMIs, the destination CPU can spend
> >> a significant amount of time handling them, making it more
> >> difficult for this CPU to address the pending CSDs in a timely manner.
> >> In the worst case, by the time the destination CPU is done
> >> handling all of these backtrace NMIs, the CSD wait time
> >> may have elapsed, all of the waiters start sending backtrace
> >> NMIs again, and this behaviour continues in a loop.
> >>
> >> To avoid this scenario, issue the backtrace NMI only from the
> >> first waiter. The other waiters on the same CSD destination can make
> >> use of the backtrace obtained via the first waiter's NMI.
> >>
> >> Signed-off-by: Imran Khan <imran.f.khan@...cle.com>
> > 
> > Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
> 
> Thanks a lot for reviewing this and [1]. Could you kindly let me know
> if you plan to pick these up in your tree at some point?

I have done so, and they should make it to -next early next week,
assuming testing goes well.

							Thanx, Paul

> Thanks,
> Imran
> 
> [1]:
> https://lore.kernel.org/all/088edfa0-c1b7-407f-8b20-caf0fecfbb79@paulmck-laptop/
> 
> >> ---
> >>  kernel/smp.c | 10 +++++++++-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/smp.c b/kernel/smp.c
> >> index b7ccba677a0a0..a1cd21ea8b308 100644
> >> --- a/kernel/smp.c
> >> +++ b/kernel/smp.c
> >> @@ -43,6 +43,8 @@ static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
> >>  
> >>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
> >>  
> >> +static DEFINE_PER_CPU(atomic_t, trigger_backtrace) = ATOMIC_INIT(1);
> >> +
> >>  static void __flush_smp_call_function_queue(bool warn_cpu_offline);
> >>  
> >>  int smpcfd_prepare_cpu(unsigned int cpu)
> >> @@ -242,7 +244,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
> >>  			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
> >>  	}
> >>  	if (cpu >= 0) {
> >> -		dump_cpu_task(cpu);
> >> +		if (atomic_cmpxchg_acquire(&per_cpu(trigger_backtrace, cpu), 1, 0))
> >> +			dump_cpu_task(cpu);
> >>  		if (!cpu_cur_csd) {
> >>  			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
> >>  			arch_send_call_function_single_ipi(cpu);
> >> @@ -423,9 +426,14 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> >>  	struct llist_node *entry, *prev;
> >>  	struct llist_head *head;
> >>  	static bool warned;
> >> +	atomic_t *tbt;
> >>  
> >>  	lockdep_assert_irqs_disabled();
> >>  
> >> +	/* Allow waiters to send backtrace NMI from here onwards */
> >> +	tbt = this_cpu_ptr(&trigger_backtrace);
> >> +	atomic_set_release(tbt, 1);
> >> +
> >>  	head = this_cpu_ptr(&call_single_queue);
> >>  	entry = llist_del_all(head);
> >>  	entry = llist_reverse_order(entry);
> >> -- 
> >> 2.34.1
> >>
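The gating in the quoted diff (the first waiter flips a per-CPU flag from 1 to 0 with an acquire cmpxchg, and the destination CPU re-arms it with a release store when it starts flushing its queue) can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's `atomic_cmpxchg_acquire()` and `atomic_set_release()`, and a plain array in place of the `DEFINE_PER_CPU` variable. The names `NR_CPUS_SKETCH`, `trigger_backtrace_init()`, `csd_waiter_should_dump()` and `csd_flush_rearm()` are hypothetical and are not part of `kernel/smp.c`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CPU count for this sketch; the kernel uses per-CPU data. */
#define NR_CPUS_SKETCH 8

/*
 * One flag per destination CPU, modelled on the patch's per-CPU
 * trigger_backtrace: 1 means "a backtrace NMI may be requested".
 */
static atomic_int trigger_backtrace[NR_CPUS_SKETCH];

/* Arm every flag, like ATOMIC_INIT(1) in the patch. */
static void trigger_backtrace_init(void)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        atomic_init(&trigger_backtrace[cpu], 1);
}

/*
 * Called by a waiter whose CSD wait timed out on destination 'cpu'.
 * Only the caller that swaps the flag from 1 to 0 wins and should
 * request the backtrace (dump_cpu_task() in the patch); all later
 * callers see 0 and skip it until the flag is re-armed.
 */
static bool csd_waiter_should_dump(int cpu)
{
    int expected = 1;

    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    return atomic_compare_exchange_strong_explicit(
        &trigger_backtrace[cpu], &expected, 0,
        memory_order_acquire, memory_order_relaxed);
}

/*
 * Called on the destination CPU when it begins flushing its CSD queue
 * (the patch does this in __flush_smp_call_function_queue()), so that
 * a later hang can trigger a fresh backtrace.
 */
static void csd_flush_rearm(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    atomic_store_explicit(&trigger_backtrace[cpu], 1, memory_order_release);
}
```

For example, if 100 waiters time out on CPU 5 before it flushes its queue, only the first call to `csd_waiter_should_dump(5)` returns true; after `csd_flush_rearm(5)`, the next timed-out waiter may request a backtrace again. Other destination CPUs are unaffected because each has its own flag.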