linux-kernel - Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu

Message-ID: <00ace254f9085aad12684185b77504bd911aed63.camel@mediatek.com>
Date: Fri, 1 Nov 2024 07:41:27 +0000
From: Cheng-Jui Wang (王正睿)
	<Cheng-Jui.Wang@...iatek.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "sumit.garg@...aro.org" <sumit.garg@...aro.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"dianders@...omium.org" <dianders@...omium.org>, "rostedt@...dmis.org"
	<rostedt@...dmis.org>, "frederic@...nel.org" <frederic@...nel.org>,
	wsd_upstream <wsd_upstream@...iatek.com>,
	Bobule Chang (張弘義) <bobule.chang@...iatek.com>,
	"mark.rutland@....com" <mark.rutland@....com>, "kernel-team@...a.com"
	<kernel-team@...a.com>, "joel@...lfernandes.org" <joel@...lfernandes.org>,
	"rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in
 rcu_dump_cpu_stacks()

On Wed, 2024-10-30 at 06:54 -0700, Paul E. McKenney wrote:
> > > Alternatively, arm64 could continue using nmi_trigger_cpumask_backtrace()
> > > with normal interrupts (for example, on SoCs not implementing true NMIs),
> > > but have a short timeout (maybe a few jiffies?) after which it returns
> > > false (and presumably also cancels the backtrace request so that when
> > > the non-NMI interrupt eventually does happen, its handler simply returns
> > > without backtracing).  This should be implemented using atomics to avoid
> > > deadlock issues.  This alternative approach would provide accurate arm64
> > > backtraces in the common case where interrupts are enabled, but allow
> > > a graceful fallback to remote tracing otherwise.
> > > 
> > > Would you be interested in working this issue, whatever solution the
> > > arm64 maintainers end up preferring?
> > 
> > The 10-second timeout is hard-coded in nmi_trigger_cpumask_backtrace().
> > It is shared code and not architecture-specific. Currently, I haven't
> > thought of a feasible solution. I have also CC'd the authors of the
> > aforementioned patch to see if they have any other ideas.
> 
> It should be possible for arm64 to have an architecture-specific hook
> that enables them to use a much shorter timeout.  Or, to eventually
> switch to real NMIs.
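The "short timeout, then cancel" idea quoted above depends on the requester and a late, non-NMI interrupt handler racing for ownership of a single backtrace request, with exactly one of them winning. A minimal user-space sketch of that handshake using C11 atomics follows. `struct bt_request`, the `BT_*` states and all three functions are hypothetical names for illustration only, not existing kernel APIs; kernel code would use `atomic_cmpxchg()` on an `atomic_t` rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU backtrace-request state (not a kernel structure). */
enum bt_state { BT_IDLE, BT_PENDING, BT_DONE };

struct bt_request {
	atomic_int state;
};

/* Requester: publish a new request. One outstanding request at a time. */
static void bt_request_post(struct bt_request *req)
{
	assert(atomic_load(&req->state) != BT_PENDING);
	atomic_store(&req->state, BT_PENDING);
}

/*
 * Interrupt-handler side: claim the request. Returns true only if it was
 * still pending; a late interrupt that finds it cancelled does nothing.
 */
static bool bt_handler_claim(struct bt_request *req)
{
	int expected = BT_PENDING;

	return atomic_compare_exchange_strong(&req->state, &expected, BT_DONE);
}

/*
 * Requester side, after its short timeout expires: try to cancel.
 * Returns true if the cancel won (fall back to a remote trace), false
 * if the handler had already claimed the request.
 */
static bool bt_request_cancel(struct bt_request *req)
{
	int expected = BT_PENDING;

	return atomic_compare_exchange_strong(&req->state, &expected, BT_IDLE);
}
```

Because both sides use compare-and-exchange on the same word, no lock is needed, which is what avoids the deadlock concern mentioned above. A real implementation would also have to stop a stale interrupt from claiming a newer request (for example by tagging requests with a sequence number).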

There is already another thread discussing the timeout issue, but I
still have some questions about RCU. To avoid mixing the two
discussions, I am starting this separate thread for RCU.

> > Regarding the rcu stall warning, I think the purpose of acquiring
> > `rnp->lock` is to protect the rnp->qsmask variable rather than to protect
> > the `dump_cpu_task()` operation, right?
> 
> As noted below, it is also to prevent false-positive stack dumps.
> 
> > Therefore, there is no need to call dump_cpu_task() while holding the
> > lock.
> > When holding the spinlock, we can store the CPUs that need to be dumped
> > into a cpumask, and then dump them all at once after releasing the
> > lock.
> > Here is my temporary solution used locally based on kernel-6.11.
> > 
> > +     cpumask_var_t mask;
> > +     bool mask_ok;
> > 
> > +     mask_ok = zalloc_cpumask_var(&mask, GFP_ATOMIC);
> >       rcu_for_each_leaf_node(rnp) {
> >               raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >               for_each_leaf_node_possible_cpu(rnp, cpu)
> >                       if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
> >                               if (cpu_is_offline(cpu))
> >                                       pr_err("Offline CPU %d blocking current GP.\n", cpu);
> > +                             else if (mask_ok)
> > +                                     cpumask_set_cpu(cpu, mask);
> >                               else
> >                                       dump_cpu_task(cpu);
> >                       }
> >               raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >       }
> > +     if (mask_ok) {
> > +             if (!trigger_cpumask_backtrace(mask)) {
> > +                     for_each_cpu(cpu, mask)
> > +                             dump_cpu_task(cpu);
> > +             }
> > +             free_cpumask_var(mask);
> > +     }
> > 
> > After applying this, I haven't encountered the lockup issue for five
> > days, whereas it used to occur about once a day.
> 
> We used to do it this way, and the reason that we changed was to avoid
> false-positive (and very confusing) stack dumps in the surprisingly
> common case where the act of dumping the first stack caused the stalled
> grace period to end.
> 
> So sorry, but we really cannot go back to doing it that way.
> 
>                                                         Thanx, Paul
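The false-positive problem Paul describes can be reproduced in a toy, single-threaded model of the snapshot-then-dump approach from the cpumask patch quoted above. This is a sketch under stated assumptions: `struct toy_gp`, `toy_dump()` and `dump_snapshot_false_positives()` are hypothetical, and `toy_dump()` simply assumes that dumping one stack is enough to let the stalled grace period end, as in the scenario Paul reports.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code: one qsmask bit per CPU blocking the GP. */
struct toy_gp {
	unsigned long gp_seq;   /* incremented when the grace period ends */
	uint32_t qsmask;        /* CPUs that have not yet reported a QS */
};

/*
 * Hypothetical stand-in for dump_cpu_task(). Assumption being modelled:
 * the act of dumping a stack lets the stalled grace period end.
 */
static void toy_dump(struct toy_gp *gp, int cpu)
{
	(void)cpu;
	if (gp->qsmask) {
		gp->qsmask = 0;
		gp->gp_seq++;
	}
}

/*
 * Snapshot-then-dump: the set of CPUs to dump is fixed before any dump
 * happens. Returns how many dumped CPUs no longer blocked any GP.
 */
static int dump_snapshot_false_positives(struct toy_gp *gp, int ncpus)
{
	uint32_t mask = gp->qsmask;     /* snapshot taken "under the lock" */
	int false_pos = 0;

	assert(ncpus <= 32);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!(mask & (UINT32_C(1) << cpu)))
			continue;
		if (!(gp->qsmask & (UINT32_C(1) << cpu)))
			false_pos++;    /* dumped although it no longer blocks */
		toy_dump(gp, cpu);
	}
	return false_pos;
}
```

With four stalled CPUs, the first dump ends the grace period and the remaining three dumps are all false positives.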

Let me clarify: the problem you describe arises because that approach
pre-determines all the CPUs to be dumped before the dump process starts.
Dumping the first stack then causes the stalled grace period to end, and
many CPUs that no longer need to be dumped (false positives) are dumped
anyway.

So, to prevent false positives, what matters is excluding the CPUs that
no longer need to be dumped, right? Therefore, what actually helps is
"releasing the lock after each dump (allowing other CPUs to update
qsmask) and rechecking gp_seq and qsmask to confirm whether to continue
dumping".

I think holding the lock while dumping CPUs does not help prevent false
positives; it only blocks the CPUs waiting for the lock (e.g., CPUs
about to report a quiescent state). For CPUs that do not interact with
this lock, holding it should have no effect. Did I miss anything?

-Cheng-Jui