Date:   Mon, 4 Apr 2022 12:49:07 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] rcu-tasks : should take care of sparse cpu masks

On Thu, Mar 31, 2022 at 03:54:02PM -0700, Eric Dumazet wrote:
> On Thu, Mar 31, 2022 at 3:42 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Thu, Mar 31, 2022 at 02:45:25PM -0700, Eric Dumazet wrote:
> > > Hi Paul
> > >
> > > It seems you assume per cpu ptr for arbitrary indexes (< nr_cpu_ids) are valid.
> >
> > Gah!  I knew I was forgetting something...
> >
> > But just to check, is this a theoretical problem or something you hit
> > on real hardware?  (For the rest of this email, I am assuming the latter.)
> 
> Code review really...
> 
> >
> > > What do you think of the (untested) following patch ?
> >
> > One issue with this patch is that the contention could be unpredictable,
> > or worse, vary among CPUs, especially if the cpu_possible_mask was oddly
> > distributed.
> >
> > So might it be better to restrict this to all on CPU 0 on the one hand
> > and completely per-CPU on the other?  (Or all on the boot CPU, in case
> > I am forgetting some misbegotten architecture that can run without a
> > CPU 0.)
> 
> If I understand correctly, cblist_init_generic() could setup
> percpu_enqueue_shift
> to something smaller than order_base_2(nr_cpu_ids)
> 
> Meaning that we could reach a non-zero idx in (smp_processor_id() >>
> percpu_enqueue_shift)
> 
> So even if CPU0 is always present (I am not sure this is guaranteed,
> but this seems reasonable),
> we could still attempt a per_cpu_ptr(PTR,  not_present_cpu), and get garbage.

And the problem with my wish to provide load balancing is that a
sparse cpumask could be sparse any which way that it wants to be.
Another problem is that, unlike TREE SRCU, Tasks RCU doesn't have an
efficient way to find all the CPUs with callbacks queued.  Yes, I could
add that information, but the benefit does not seem worth the complexity.

So I took your patch after all, but changed from cpu_online_mask to
cpu_possible_mask.  Thank you for bearing with me on this one!

Are you OK with your Signed-off-by on this patch as shown below?

						Thanx, Paul

------------------------------------------------------------------------

commit b77b2981bb22c4449a0a6e86eeb9fbab36a2beae
Author: Eric Dumazet <edumazet@...gle.com>
Date:   Mon Apr 4 12:30:18 2022 -0700

    rcu-tasks: Handle sparse cpu_possible_mask
    
    If the rcupdate.rcu_task_enqueue_lim kernel boot parameter is set to
    something greater than 1 and less than nr_cpu_ids, the code attempts to
    use a subset of the CPUs' RCU Tasks callback lists.  This works, but only
    if the cpu_possible_mask is contiguous.  If there are "holes" in this
    mask, the callback-enqueue code might attempt to access a non-existent
    per-CPU ->rtcpu variable for a non-existent CPU.  For example, if only
    CPUs 0, 4, 8, 12, 16 and so on are in cpu_possible_mask, specifying
    rcupdate.rcu_task_enqueue_lim=4 would cause the code to attempt to
    use callback queues for non-existent CPUs 1, 2, and 3.  Because such
    systems have existed in the past and might still exist, the code needs
    to gracefully handle this situation.
    
    This commit therefore checks to see whether the desired CPU is present
    in cpu_possible_mask, and, if not, searches for the next CPU.  This means
    that the systems administrator of a system with a sparse cpu_possible_mask
    will need to account for this sparsity when specifying the value of
    the rcupdate.rcu_task_enqueue_lim kernel boot parameter.  For example,
    setting this parameter to the value 4 will use only CPUs 0 and 4, with
    CPU 4 getting three times the callback load of CPU 0.
    
    This commit assumes that bit (nr_cpu_ids - 1) is always set in
    cpu_possible_mask.
    
    Link: https://lore.kernel.org/lkml/CANn89iKaNEwyNZ=L_PQnkH0LP_XjLYrr_dpyRKNNoDJaWKdrmg@mail.gmail.com/
    Signed-off-by: Eric Dumazet <edumazet@...gle.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 65d6e21a607a..44977c6a1cb8 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -273,7 +273,9 @@ static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
 static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 				   struct rcu_tasks *rtp)
 {
+	int chosen_cpu;
 	unsigned long flags;
+	int ideal_cpu;
 	unsigned long j;
 	bool needadjust = false;
 	bool needwake;
@@ -283,8 +285,9 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
 	rhp->func = func;
 	local_irq_save(flags);
 	rcu_read_lock();
-	rtpcp = per_cpu_ptr(rtp->rtpcpu,
-			    smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift));
+	ideal_cpu = smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
+	chosen_cpu = cpumask_next(ideal_cpu - 1, cpu_possible_mask);
+	rtpcp = per_cpu_ptr(rtp->rtpcpu, chosen_cpu);
 	if (!raw_spin_trylock_rcu_node(rtpcp)) { // irqs already disabled.
 		raw_spin_lock_rcu_node(rtpcp); // irqs already disabled.
 		j = jiffies;
