Message-ID: <CAEXW_YQSBwGME1+vKHSM8+svtosunk-QO2oMygFKgapPE3b45w@mail.gmail.com>
Date: Wed, 9 Oct 2024 14:23:15 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Boqun Feng <boqun.feng@...il.com>, 
	Neeraj Upadhyay <neeraj.upadhyay@....com>, "Paul E . McKenney" <paulmck@...nel.org>, 
	Uladzislau Rezki <urezki@...il.com>, Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>, 
	kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq

Hi Frederic,

On Wed, Oct 2, 2024 at 10:57 AM Frederic Weisbecker <frederic@...nel.org> wrote:
>
> After a CPU has set itself offline and before it eventually calls
> rcutree_report_cpu_dead(), there are still opportunities for callbacks
> to be enqueued, for example from an IRQ. When that happens on NOCB, the
> rcuog wake-up is deferred through an IPI to an online CPU in order not
> to call into the scheduler and risk arming the RT-bandwidth after
> hrtimers have been migrated out and disabled.
>
> But performing a synchronized IPI from an IRQ is buggy as reported in
> the following scenario:
>
>         WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
>         Modules linked in: rcutorture torture
>         CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
>         Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
>         RIP: 0010:smp_call_function_single
>         <IRQ>
>         swake_up_one_online
>         __call_rcu_nocb_wake
>         __call_rcu_common
>         ? rcu_torture_one_read
>         call_timer_fn
>         __run_timers
>         run_timer_softirq
>         handle_softirqs
>         irq_exit_rcu

This call stack seems a bit confusing given the context of the first
paragraph. At the beginning of the changelog you mention that the issue
happens from IRQ, but the call stack here is actually from softirq,
right? The IRQ issue should already be resolved in the current code
AFAICS.

>         ? tick_handle_periodic
>         sysvec_apic_timer_interrupt
>         </IRQ>
>
> The periodic tick must be shut down when the CPU is offline, just as is
> done for the oneshot tick. This must be fixed, but it is not enough:
> softirqs can happen on any hardirq tail and reproduce the above scenario.
>
> Fix this by introducing a special deferred rcuog wake-up mode for when
> the CPU is offline. This deferred wake-up doesn't arm any timer and
> simply waits for rcutree_report_cpu_dead() to be called in order to
> flush any pending rcuog wake-up.
[...]
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index a9a811d9d7a3..7ed060edd12b 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -290,6 +290,7 @@ struct rcu_data {
>  #define RCU_NOCB_WAKE_LAZY     2
>  #define RCU_NOCB_WAKE          3
>  #define RCU_NOCB_WAKE_FORCE    4
> +#define RCU_NOCB_WAKE_OFFLINE   5
>
>  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
>                                         /* For jiffies_till_first_fqs and */
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 2fb803f863da..8648233e1717 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -295,6 +295,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>         case RCU_NOCB_WAKE_FORCE:
>                 if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
>                         mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> +               fallthrough;
> +       case RCU_NOCB_WAKE_OFFLINE:
>                 if (rdp_gp->nocb_defer_wakeup < waketype)
>                         WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
>                 break;
> @@ -562,8 +564,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>         lazy_len = READ_ONCE(rdp->lazy_len);
>         if (was_alldone) {
>                 rdp->qlen_last_fqs_check = len;
> -               // Only lazy CBs in bypass list
> -               if (lazy_len && bypass_len == lazy_len) {
> +               if (cpu_is_offline(rdp->cpu)) {
> +                       /*
> +                        * Offline CPUs can't call swake_up_one_online() from IRQs. Rely
> +                        * on the final deferred wake-up rcutree_report_cpu_dead()
> +                        */
> +                       rcu_nocb_unlock(rdp);
> +                       wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_OFFLINE,
> +                                          TPS("WakeEmptyIsDeferredOffline"));
> +               } else if (lazy_len && bypass_len == lazy_len) {
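
If I'm reading the two hunks together correctly, the offline path only
records the pending wake-up on the rcuog leader and never arms
nocb_timer, relying on rcutree_report_cpu_dead() to flush it later.
Roughly, as a simplified sketch of my understanding of the patch
(locking and tracing omitted, rdp_gp being the leader's rcu_data):

               if (cpu_is_offline(rdp->cpu)) {
                       /* Mark the deferred wake-up, but don't arm nocb_timer. */
                       if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE_OFFLINE)
                               WRITE_ONCE(rdp_gp->nocb_defer_wakeup,
                                          RCU_NOCB_WAKE_OFFLINE);
                       /* rcutree_report_cpu_dead() will flush it later. */
               }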

Since in this call stack softirqs are disabled (we are running from
softirq context), would an alternative fix be (pseudocode):

Change the following in the "if (was_alldone)" block:

               if (!irqs_disabled_flags(flags)) {

to:

               if (!irqs_disabled_flags(flags) && !in_softirq()) {

?

That way perhaps an additional RCU_NOCB flag is not needed.
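
Just to make that concrete, here is a rough, untested sketch of how the
"was_alldone" path could look with the extra check. This is only my
simplified reading of the surrounding code, not an actual patch:

               } else if (!irqs_disabled_flags(flags) && !in_softirq()) {
                       /* IRQs enabled and neither serving softirq nor with
                        * BH disabled: safe to wake the rcuog kthread now. */
                       rcu_nocb_unlock(rdp);
                       wake_nocb_gp(rdp, false);
                       trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                           TPS("WakeEmpty"));
               } else {
                       /* IRQ/softirq context (including the offline softirq
                        * case above): defer the wake-up as done today. */
                       rcu_nocb_unlock(rdp);
                       wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
                                          TPS("WakeEmptyIsDeferred"));
               }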

Or does that not work for some reason?

thanks,

 - Joel
