linux-kernel - Re: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221201001307.GF4001@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 30 Nov 2022 16:13:07 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Zqiang <qiang1.zhang@...el.com>
Cc:     frederic@...nel.org, quic_neeraju@...cinc.com,
        joel@...lfernandes.org, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU
 hotplug

On Thu, Dec 01, 2022 at 07:45:33AM +0800, Zqiang wrote:
> Currently, invoke rcu_tasks_rude_wait_gp() to wait one rude
> RCU-tasks grace period, if __num_online_cpus == 1, will return
> directly, indicates the end of the rude RCU-task grace period.
> suppose the system has two cpus, consider the following scenario:
> 
>         CPU0                                   CPU1 (going offline)
>                                           migration/1 task:
>                                       cpu_stopper_thread
>                                        -> take_cpu_down
>                                           -> _cpu_disable
>                                            (dec __num_online_cpus)
>                                           ->cpuhp_invoke_callback
>                                                 preempt_disable
>                                                 access old_data0
>            task1
>  del old_data0                                  .....
>  synchronize_rcu_tasks_rude()
>  task1 schedule out
>  ....
>  task2 schedule in
>  rcu_tasks_rude_wait_gp()
>      ->__num_online_cpus == 1
>        ->return
>  ....
>  task1 schedule in
>  ->free old_data0
>                                                 preempt_enable
> 
> when CPU1 dec __num_online_cpus and __num_online_cpus is equal one,
> the CPU1 has not finished offline, stop_machine task(migration/1)
> still running on CPU1, maybe still accessing 'old_data0', but the
> 'old_data0' has freed on CPU0.
> 
> In order to prevent the above scenario from happening, this commit
> remove check for __num_online_cpus == 0 and add handling of calling
> synchronize_rcu_tasks_generic() during early boot(when the
> rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE, the scheduler
> not yet initialized and only one boot-CPU online).
> 
> Signed-off-by: Zqiang <qiang1.zhang@...el.com>

Very good, thank you!  I did the usual wordsmithing, including to that
error message, so as usual please check to make sure that I didn't mess
something up.

							Thanx, Paul

------------------------------------------------------------------------

commit 033ddc5d337984e20b9d49c8af4faa4689727626
Author: Zqiang <qiang1.zhang@...el.com>
Date:   Thu Dec 1 07:45:33 2022 +0800

    rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
    
    The synchronize_rcu_tasks_rude() function invokes rcu_tasks_rude_wait_gp()
    to wait one rude RCU-tasks grace period.  The rcu_tasks_rude_wait_gp()
    function in turn checks if there is only a single online CPU.  If so, it
    will immediately return, because a call to synchronize_rcu_tasks_rude()
    is by definition a grace period on a single-CPU system.  (We could
    have blocked!)
    
    Unfortunately, this check uses num_online_cpus() without synchronization,
    which can result in too-short grace periods.  To see this, consider the
    following scenario:
    
            CPU0                                   CPU1 (going offline)
                                              migration/1 task:
                                          cpu_stopper_thread
                                           -> take_cpu_down
                                              -> _cpu_disable
                                               (dec __num_online_cpus)
                                              ->cpuhp_invoke_callback
                                                    preempt_disable
                                                    access old_data0
               task1
     del old_data0                                  .....
     synchronize_rcu_tasks_rude()
     task1 schedule out
     ....
     task2 schedule in
     rcu_tasks_rude_wait_gp()
         ->__num_online_cpus == 1
           ->return
     ....
     task1 schedule in
     ->free old_data0
                                                    preempt_enable
    
    When CPU1 decrements __num_online_cpus, its value becomes 1.  However,
    CPU1 has not finished going offline, and will take one last trip through
    the scheduler and the idle loop before it actually stops executing
    instructions.  Because synchronize_rcu_tasks_rude() is mostly used for
    tracing, and because both the scheduler and the idle loop can be traced,
    this means that CPU0's prematurely ended grace period might disrupt the
    tracing on CPU1.  Given that this disruption might include CPU1 executing
    instructions in memory that was just now freed (and maybe reallocated),
    this is a matter of some concern.
    
    This commit therefore removes that problematic single-CPU check from the
    rcu_tasks_rude_wait_gp() function.  This dispenses with the single-CPU
    optimization, but there is no evidence indicating that this optimization
    is important.  In addition, synchronize_rcu_tasks_generic() contains a
    similar optimization (albeit only for early boot), which also splats.
    (As in exactly why are you invoking synchronize_rcu_tasks_rude() so
    early in boot, anyway???)
    
    It is OK for the synchronize_rcu_tasks_rude() function's check to be
    unsynchronized because the only times that this check can evaluate to
    true is when there is only a single CPU running with preemption
    disabled.
    
    While in the area, this commit also fixes a minor bug in which a
    call to synchronize_rcu_tasks_rude() would instead be attributed to
    synchronize_rcu_tasks().
    
    [ paulmck: Add "synchronize_" prefix and "()" suffix. ]
    
    Signed-off-by: Zqiang <qiang1.zhang@...el.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 4dda8e6e5707f..d845723c1af41 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -560,8 +560,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 {
 	/* Complain if the scheduler has not started.  */
-	WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
-			 "synchronize_rcu_tasks called too soon");
+	if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
+			 "synchronize_%s() called too soon", rtp->name))
+		return;
 
 	// If the grace-period kthread is running, use it.
 	if (READ_ONCE(rtp->kthread_ptr)) {
@@ -1064,9 +1065,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
 // Wait for one rude RCU-tasks grace period.
 static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
 {
-	if (num_online_cpus() <= 1)
-		return;	// Fastpath for only one CPU.
-
 	rtp->n_ipis += cpumask_weight(cpu_online_mask);
 	schedule_on_each_cpu(rcu_tasks_be_rude);
 }